Introduction and Overview
Little Blue Linux (variously referred to here as LB Linux, LBL, or just Little Blue) is a basic but usable GNU/Linux operating system distribution that includes enough software to rebuild itself from source code. It is most suitable for use on servers rather than desktop or laptop computers, because it lacks a graphical user interface, but it is as flexible and extensible as any other basic GNU/Linux system with C and C++ development tools installed.
Cross-Building Linux, or CBL, is this book. It outlines a step-by-step process you can follow to build the Little Blue Linux system entirely from source code. If you follow it precisely, what you wind up with is a Little Blue Linux system. If you modify the process, what you wind up with will be a variant or derivative of Little Blue Linux — perhaps you’ll like the result enough to give it a name and make it available to other people!
The aspect of all this that I focus the most on — the thing that is most important to me — is the process: the narrative that describes every necessary component of a GNU/Linux system, how they fit together, and how to bootstrap them all, starting from an existing GNU/Linux system, to create a new complete, minimal, self-hosted GNU/Linux system, entirely from source code.
The most important design goal is that the entire process should be as clear and transparent as possible. Ideally, it should be difficult to read through CBL without understanding how the resulting system works and how it was put together. (I realize that’s a pretty lofty objective, but you’ve got to have dreams.)
A secondary goal, almost as important as the first, is that I want to be sure that every piece of the final system was built from source code. That is, I want to be confident that none of the binary code from the initial system — where you start the build — winds up simply being copied to the resulting system. I want to be certain that everything has been rebuilt from the ground up. I’ll talk more about why that matters to me later.
It’s also desirable for the resulting system, Little Blue Linux, to be useful. But that utility is only a tertiary consideration! Mostly what I’m interested in is telling a story about how you can create a GNU/Linux system — the programs and libraries and configuration elements that comprise it — and how all those pieces fit together.
As it turns out, I do find that Little Blue Linux is an outstanding server system: since it has only the software packages that I really need, it has a minimal attack surface, and it’s trivial to rebuild everything with specific compiler optimizations for whatever hardware platform I want to run it on because the CBL process is explicitly designed to be automated.
It should not be a surprise that I find Little Blue Linux to be ideal for my purposes — after all, I’m making all the policy decisions about what components to use and how to configure them! That leads to yet another design goal for CBL: I would like to make it easier for people who want to build their ideal systems to do that — perhaps by starting with CBL and modifying it to suit their own tastes, or perhaps by doing something entirely different.
If you have ideas about what your own ideal computer system should look like, how you want it to work, maybe CBL will serve as a starting point for that. Even if it does not, the most important thing about CBL is that it is a demonstration that there is no deep mystery or ancient magical lore involved in how GNU/Linux systems work — this book is not exactly short, but there is nothing really hard in it. Anyone who wants to build a custom system exactly to their own taste can do it! You just have to do some work.
This section has further discussion of some of the design features and principles of CBL, and describes the high-level build plan.
1. Noteworthy Aspects of Little Blue Linux
Before we get into the details of how to build the system, let’s talk a little bit about what you’ll wind up with once it’s complete. All GNU/Linux systems have a fair amount in common with each other, as well as a handful of differentiating factors; this is a brief overview of what those factors are in LB Linux. All of these are discussed more fully later on in the narrative.
1.1. S6-Based Init System
The init framework — which bootstraps the userspace environment and manages all the background "daemon" processes that are needed in a healthy GNU/Linux system — is based on the s6 suite of programs by Laurent Bercot. This uses a bunch of tiny little programs to manage the base userspace of the system, rather than a few really big ones — looking at the process table on a basic LBL system, there are 38 s6-related processes running, adding up to a total of 6,725 pages of RAM.[1] In contrast, looking at a systemd-managed system (booted into single-user mode so that an absolutely minimal number of programs is running), there are only seven processes running, but they occupy a total of 23,480 pages of memory.
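If you want to check the corresponding numbers on a system of your own, something like this sketch works (it counts anything whose process name starts with s6-, which is an approximation; field 2 of /proc/PID/statm is a process’s resident memory, in pages):

    # count the s6-related processes
    ps -e -o comm= | grep -c '^s6-'
    # sum their resident memory, in pages
    pgrep '^s6-' | while read pid; do
        awk '{ print $2 }' "/proc/$pid/statm"
    done | awk '{ sum += $1 } END { print sum, "pages" }'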
1.2. Package Users
Rather than using a centralized database of installed packages, and providing specialized package-management tools to query that database to discover things like what files are part of a package or what package a specific file belongs to, LBL simply creates a separate user account for each package. The files installed by each package are owned by the package-specific user, so standard system utilities can be used to determine package information. To find out what package is responsible for a file, for example, you can just use ls -l; or you can run find / -user … to list all the files and directories owned by a package.
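For example, assuming a package user named xz owns the files installed by the xz package (the name is just an illustration), the standard tools answer both questions:

    # what package is responsible for this file? look at its owner
    ls -l /usr/bin/xz
    # what files belong to the xz package? list everything it owns
    find / -user xz 2>/dev/null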
1.3. Configuration File Version Control
I have, for years, had the habit of maintaining a version control repository of configuration files for the programs I use a lot — especially programs like bash and vim and tmux, whose configurations I have extensively modified away from the defaults. A lot of people I know do this — it’s a good way of setting up a new fresh-out-of-the-box operating system installation so it’s comfortable and easy to use.
After I’d been doing that for a while, it occurred to me that exactly the same considerations apply for system configuration files — things like the configuration files for sshd and sudo and other such programs. Why not put all of those configuration files in a version control repository as well, so you have a record of who changed files, and when, and for what reason?
No reason I can see! And so that’s part of what all Little Blue Linux systems have: a git repository for tracking changes to system configuration and policy files, accessed using a cfggit wrapper script.
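Since cfggit is a wrapper around git, using the repository looks like ordinary git usage. As a sketch (the file name and commit message are illustrative, and the wrapper itself is described later):

    cfggit add /etc/ssh/sshd_config
    cfggit commit -m 'restrict sshd to key-based logins'
    cfggit log /etc/ssh/sshd_config    # who changed it, when, and why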
1.4. Modern Versions of Everything
Sometimes, when I’ve been interested in using a program but am using a distribution like Debian or CentOS, I have run up against version constraints on dependencies. The current version of QEMU, say, has a feature I’d like to use, so I try to install it; but then it turns out it needs four or five libraries with a version later than what is provided in the Debian or CentOS repositories, and building modern versions of those libraries reveals other updates that they need — it’s frustrating!
Little Blue Linux is not always completely up-to-date, since new versions of packages are released all the time, but it’s reasonably close. Every few months I go through the packages that make up the base LBL system and update the blueprints to use the latest stable version of everything. So I’ve never run into a situation where I have to upgrade a bunch of other packages to be able to use a modern version of something else.
(To be completely honest, this isn’t as much of a selling point as it could be — because although there are not ancient and obsolete versions of any package in LBL, there are a huge number of packages for which LBL contains no blueprints at all, and you’ll have some work to do to use them at all. But it suits my purposes just fine!)
2. The Cross-Architecture Build Process
The best way I’ve thought of to ensure that the result of the CBL process has really been constructed entirely from source code, eliminating the possibility that any binary code has been simply copied in from the original computer system, is to make sure that none of the code from the host system will actually work on the final CBL system. We do that by starting the CBL build with the construction of a cross-toolchain: a compiler and related tools that run on one type of computer — by convention, this is called the "host" system — but create programs and libraries that will work on some completely different, and incompatible, type of computer architecture: the "target" system.
The CBL process itself breaks down naturally into two different parts: the part that you run on the host system, and the part that you run on the target system. I sometimes refer to those as the two "sides" of the CBL process, the "host side" and the "target side."
The idea of using a cross-toolchain, to start with one kind of computer and use it to build a system that works on a different kind of computer, is so fundamental to CBL that I named the process after it. Within that constraint, though, there is a lot of flexibility in how to perform the CBL build. The host can be a physical machine, like an Intel x86-architecture notebook computer or ARM-architecture chromebook, or it can be a virtual machine emulated by a program like QEMU. There are advantages and disadvantages to both options. The target, similarly, can be a physical computer or a QEMU virtual machine; and, again, there are benefits and drawbacks to both. Any of those combinations can work — you can use a physical computer as the host system, then move all the pieces you built there to a different physical computer to finish the build, or you can use a virtual machine as the host system and a different virtual machine as the target system, or anything else you can think of.
The main benefits of using QEMU — for the host or target or both — are, first, that you don’t need to have a real computer with whatever architecture you want to use for that side of the CBL process; and, second, that it’s possible to automate the entire build process — since you don’t have to move a physical storage device from one computer to another, or press any actual power buttons, or anything like that.
The main disadvantages of using QEMU are that emulated systems are generally a lot slower than real computer systems, so the build process takes a long time; and QEMU is sometimes not as stable as a real computer. When running an ARM64-target emulator on a 64-bit Intel notebook computer, QEMU sometimes crashes during the final system glibc build, for example. In many cases this appears to be caused by the limited system resources (especially memory) available on the emulated system. When using QEMU to emulate a computer (as opposed to using it to run a virtual machine of the same architecture as the host system), I primarily emulate ARM systems, since QEMU for ARM can emulate a machine with any amount of RAM by using the virt machine type.
In theory, you can follow the CBL process with any kind of computer as the host or target platform, as long as they are supported by the GNU toolchain programs and the Linux kernel, but it seems as though every different architecture presents idiosyncrasies that require additional work to support. That means that if you go outside the host/target pairings that we use for developing and testing CBL, you will probably need to do some additional work.
The physical computer systems that I use are 64-bit x86 (aka Intel- or AMD-architecture) computers, 32- and 64-bit ARM systems, and 32-bit MIPS systems — because those are the types of computers I have handy.
3. About This Specific CBL Build
The canonical form of Cross-Building Linux is a set of "blueprint" files that are available in a publicly-accessible git repository. If you’re reading this as a book or web page, it was produced from those blueprint files by the litbuild program.
Every time litbuild produces the CBL book, it configures it for a specific type of build, with a particular kind of host system and target system, includes instructions on how to launch the target-side build in QEMU if those are relevant, and so on. If this book does not describe the type of build you want to do, all you need to do is obtain the CBL blueprint files and the litbuild program, set some environment variables as described in the Configuration Parameters and Default Values section, and use litbuild to generate a new version of the book.
For this CBL build, the host system is x86_64-unknown-linux-gnu and the target system is aarch64-cbl-linux-gnu. The final system will be called cbl.lblinux.org. Most of the work will be done in the /home/lbl/work/build directory — so the storage device where that directory lives should have a few dozen GB of free space.
If any of that looks wrong — or if any of the other parameters in Configuration Parameters and Default Values are not set the way you want them to be — change the configuration and generate a new book!
4. Configuration Parameters and Default Values
4.1. How configuration parameters work
The CBL build can be adjusted and tuned in a variety of ways using the configuration parameters described in this section. To override any parameter from its default value, you can set an environment variable with its name to a new value (or simply modify this blueprint).
The default values here are appropriate for a build taking place on an Intel or AMD 64-bit computer, building a 64-bit ARM target system, using a QEMU-emulated virtual machine for the target system, and possibly using a QEMU virtual machine for the host system as well.
Each section below discusses a different (but related) set of configuration parameters.
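For example, a hypothetical build targeting little-endian 32-bit MIPS might override the architecture-related parameters like this before generating the scripts and book (the exact litbuild invocation is described in the litbuild documentation, and the other target-related parameters would need matching adjustments):

    export TARGET=mipsel-cbl-linux-gnu
    export KERNEL_ARCH=mips
    export TARGET_QEMU_ARCH=mipsel
    litbuild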
4.2. Setting The Host And Target Architectures
These parameters determine what type of build is done and what options are used to control aspects of that build.
- Parameter: HOST
- Value: x86_64-unknown-linux-gnu (default)

As described in the overview, the CBL build process always involves using a cross-toolchain. The HOST parameter should be set to the "triplet" of the system where the host side of CBL (that is, the cross-toolchain itself and the target-system scaffolding built using it) will be constructed. You can read more about "triplets" in the Constructing a GNU Cross-Toolchain section or in the documentation of GNU Autoconf (as of Autoconf 2.69, triplets are described in section 14).

The easiest way to get the correct value here is simply to run the script config.guess, found in the GCC sources, on whatever computer or virtual machine you’re going to use for the host side of the CBL build.
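For example, from the top level of an unpacked GCC source tree (the output shown is what a typical 64-bit Intel machine reports; yours may differ):

    $ ./gcc-11.2.0/config.guess
    x86_64-pc-linux-gnu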
- Parameter: TARGET
- Value: aarch64-cbl-linux-gnu (default)

TARGET is the other parameter used to control cross-compilation. It should be set to the triplet of the target system, whatever computer or virtual machine will be booted into using the scaffolding. The sample configuration files provided as part of CBL may be helpful in figuring out what you should set this to. The second component of the triplet is conventionally set to cbl for CBL systems.
- Parameter: TARGET_GCC_CONFIG
- Value: --enable-fix-cortex-a53-835769 --enable-fix-cortex-a53-843419 --with-cpu=cortex-a72.cortex-a53 (default)
When configuring GCC for the target system, it may be useful or necessary to specify some set of configuration flags. You can consult the GCC installation instructions and the gcc section for more details about the options available to you. The default shown here is suitable for the Rockchip RK3399 SOC, which has two Cortex-A72 CPU cores and four Cortex-A53 CPU cores in what’s called a "big.LITTLE" architecture. (That default value is used simply because I happen to have such a system.)
- Parameter: HOST_GCC_CONFIG
- Value: not set (default)

It’s sometimes useful or necessary to specify some set of configuration flags when configuring GCC for the host system (that is, for the GCC build done in Trustworthy Host-System Programs, if those are being built). This works just like TARGET_GCC_CONFIG, but for the initial native GCC.
- Parameter: TARGET_GMP_CONFIG
- Value: not set (default)
Similarly to GCC, the GMP library may need to have some extra configuration flags specified — so that it knows what ABI to build for, for example.
- Parameter: KERNEL_ARCH
- Value: arm64 (default)
Different packages or programs have different ways of referring to CPU architectures. As mentioned earlier, the GNU toolchain refers to "triplets," which you can read about in Constructing a GNU Cross-Toolchain; the CPU architecture is the first component of the triplet. The Linux kernel has its own naming convention for CPU architectures, which in many (but not all) cases is the same as the CPU field in the triplet.
The default value for the KERNEL_ARCH parameter is an example where the naming convention differs. The GNU toolchain refers to the 64-bit ARM architecture as aarch64, but the Linux kernel calls it arm64. Another example is MIPS. The Linux kernel has a single architecture, mips, that is used for all MIPS variants (big endian and little endian, with both 32-bit and 64-bit word lengths), but each variant has a different value for the CPU component of the triplet: "mips", "mipsel", "mips64", and "mips64el".
- Parameter: TARGET_EXPECTED_MACHINE_NAME
- Value: AArch64 (default)
To verify that the cross-toolchain is working as expected, CBL compiles a simple program and then inspects the resulting binary to see whether it is built for the correct kind of target machine. If the binary doesn’t indicate that it’s built for the machine type specified by this parameter, the build will halt so you can inspect the situation and see what’s going on.
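You can perform the same sort of check manually with the binutils tools; a sketch (CBL’s automated check may inspect the binary differently, and the readelf output is abbreviated here):

    $ aarch64-cbl-linux-gnu-gcc -o hello hello.c
    $ readelf -h hello | grep Machine
      Machine:                           AArch64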
- Parameter: KERNEL_CONFIG
- Value: defconfig (default)
The Linux kernel has approximately a jillion [2] different configuration elements. These determine which hardware and features will be supported by linux kernel that results from the build. Starting from scratch isn’t necessarily a good idea; luckily, we don’t have to do that, because the kernel is distributed with a set of default configuration files for every supported CPU architecture. This parameter sets the default configuration file that will be used as a starting point for the CBL build.
This parameter has some relation to the QEMU machine type, when targeting a QEMU emulated machine.
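For example, this is roughly how the default configuration is produced when preparing a 64-bit ARM kernel build (the CROSS_COMPILE prefix is only needed when cross-compiling):

    make ARCH=arm64 CROSS_COMPILE=aarch64-cbl-linux-gnu- defconfig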
- Parameter: KERNEL_TARGET
- Value: Image.gz (default)
Another element where different CPU architectures are inconsistent with each other is the name of the kernel file that is produced by the build. It is sometimes vmlinux, sometimes vmlinuz, sometimes bzImage, sometimes Image.gz… As far as I can tell, it’s completely at the discretion of whoever is maintaining that architecture within the kernel, and of course everyone has their own preferences.
- Parameter: TARGET_SWAP_DEVICE
- Value: /dev/vdb (default)

If the target-side build has a storage device available for memory swapping, it can be specified as TARGET_SWAP_DEVICE and will be used if such a device is found. If the device doesn’t exist, it won’t cause any problems, though; and if the device does exist but has a filesystem or partition table or anything like that on it, it won’t be touched.
- Parameter: TARGET_SYSTEM_CFLAGS
- Value: -O2 -fno-omit-frame-pointer -march=native (default)

When building the programs and libraries that will comprise the final system, it is generally desirable to set the CFLAGS (for C) and CXXFLAGS (for C++) environment variables to a common value so that optimization flags are used consistently across the entire system. TARGET_SYSTEM_CFLAGS provides a way to do that.
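In effect, the value of this parameter gets used the same way you would use it in a manual build, along these lines:

    export CFLAGS='-O2 -fno-omit-frame-pointer -march=native'
    export CXXFLAGS="$CFLAGS"
    ./configure --prefix=/usr
    make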
The presence of the -fno-omit-frame-pointer flag deserves some additional comment. A frame pointer is a pointer to a stack frame; if GCC is told to store frame pointers, it uses a specific CPU register to store a pointer to the stack frame when making function calls. That register is then unavailable for other purposes, which can make code larger and less efficient, but facilitates "unwinding" the stack; this can be useful when trying to diagnose exceptions. The default behavior of GCC was to include frame pointers until GCC 8, and then changed to omit frame pointers.

Since the default build style for CBL targets the 64-bit ARM architecture, it is important to set the default CFLAGS to include frame pointers: the AArch64 ABI was designed with the presumption that frame pointers would always be present. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521 for some detail on this.

If you’re doing a CBL build for a different target, and you don’t plan to debug programs using gdb or something similar, you might want to omit -fno-omit-frame-pointer (and possibly add -fomit-frame-pointer) so that the resulting programs and libraries are a little bit smaller and faster.
- Parameter: TARGET_SYSTEM_MAKEFLAGS
- Value: -j8 (default)

The make program is used by most projects to automate their build and installation processes. Among many other things, make supports parallelism in builds: if you are using a computer with multiple CPUs and sufficient memory to support several simultaneous compiler processes, you can run make with a -jN command line option, or set the environment variable MAKEFLAGS to include such an option, and make will run up to N processes concurrently. This can speed up builds enormously!
For CBL, the degree of parallelism in build processes should not necessarily be the same on the host system and the target system, because they may have very different hardware resources available. So to configure the number of concurrent build processes on the host system, simply set MAKEFLAGS as normal when running the host-side scripts. To configure the number of concurrent build processes on the target system, use this parameter!
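A sketch of the two knobs side by side:

    export MAKEFLAGS=-j4                  # parallelism for host-side builds
    export TARGET_SYSTEM_MAKEFLAGS=-j8    # parallelism for target-side builds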
- Parameter: TARGET_ROOTFS_LABEL
- Value: lblroot (default)
This parameter is used as the filesystem label for the target’s root filesystem.
- Parameter: ENABLE_TARGET_NETWORK
- Value: true (default)

The default presumptions made by CBL are that the target system has a wired ethernet interface, that there is a DHCP server available, and that networking should be enabled when the target system is booted. If any of those isn’t the case, set ENABLE_TARGET_NETWORK to any value other than true — in that case, you’ll need to set up networking yourself after the build completes.
- Parameter: TARGET_BRIDGE
- Value: manual (default: qemu)

There’s more than one way to get from the host side of the CBL build process to the target side. Each of these is defined in a blueprint named target-bridge-NAME (where NAME can be whatever you like); the one that is used for a particular CBL build is the one named in this parameter.

The default CBL process uses the real host computer system for the first part of the build, and an emulated QEMU virtual machine for the target. That’s what the qemu target bridge does. Another option is to use QEMU virtual machines for both the host and target systems; the automated-qemu-to-qemu-build blueprint describes one way to do that.
4.3. Target System QEMU-related Parameters
Different parameters are required depending on whether the target system is a real computer or a virtual machine. The parameters in this section are used whenever the target system is a QEMU virtual machine.
If you’re not using a QEMU virtual machine for the target, most of these are irrelevant — with one exception, TARGET_QEMU_ARCH, as noted in its description.
- Parameter: TARGET_QEMU_ARCH
- Value: aarch64 (default)
The most fundamental of the QEMU-related parameters is QEMU’s name for the target CPU architecture. Like the Linux kernel, the way that QEMU refers to machine architectures is often the same as the CPU field in the triplet — this is the case for 64-bit ARM machines, which both the GNU toolchain and QEMU call "aarch64" (for "ARM Architecture, 64-bit").
Even when the target system is going to be a physical computer rather than an emulated one, it’s important to specify the correct value for TARGET_QEMU_ARCH — QEMU is always used to validate that the cross-toolchain works properly.
- Parameter: TARGET_QEMU_MACHINE
- Value: virt (default)

For most architectures, QEMU can emulate a variety of different machines. This parameter lets you select from those. This selection relates to the kernel configuration you start with, which is specified with the KERNEL_CONFIG parameter.

The best documentation for what machine types are supported for different architectures is on the QEMU wiki: https://wiki.qemu.org/Documentation/Platforms is the top-level URL as of October 2019. You can also run the QEMU full-system-emulator program (like qemu-system-x86_64 or qemu-system-aarch64 or whatever) with the argument -machine help to get a list of the machines it supports.
- Parameter: TARGET_QEMU_CPU
- Value: cortex-a57 (default)

Similarly to the machine argument, QEMU can emulate a variety of CPUs; and you can get a list of the options here with -cpu help. In many cases you don’t really need to specify a CPU because there will be a default value that works fine, but this is not the case with the virt machine that is used in the default configuration.

If the target QEMU system will be run as a native virtual machine rather than an emulator — that is, if the actual computer is an x86_64 machine, and you’re doing a build in an x86_64 virtual machine so you can use the computer for other things while the build is running — you can specify -cpu host to tell QEMU not to emulate a processor at all, and simply act as a hypervisor.
- Parameter: TARGET_QEMU_CPUCOUNT
- Value: 8 (default)

The QEMU full-system emulators can provide multiple CPU cores to the guest virtual machine. This may or may not actually be helpful in terms of performance — historically, this has only been helpful when QEMU is running as a hypervisor, not emulating a different machine architecture, because the TCG code generator that converts guest CPU instructions into host system instructions (I think it stands for "Tiny Code Generator") only ever operated in a single thread. Recent versions of QEMU have supported multi-threaded code generation (the "MTTCG" feature) for some architectures, which provides real parallelism within emulated machines. This speeds up builds dramatically for some packages, up to the limit of parallelism that QEMU emulated machines will accept. As of QEMU 4.1.0, the virt machine supported by the ARM emulator will accept up to eight CPU cores, so that’s the value used by default.
You should set the level of parallelism used by make on the target system — that’s the TARGET_SYSTEM_MAKEFLAGS parameter — to be the same as the number of CPU cores provided to the target virtual machine here.
Obviously, it’s a bad idea to set this parameter higher than the number of CPU cores that the host system actually has!
- Parameter: TARGET_QEMU_RAM_MB
- Value: 32768 (default: 8192)

QEMU allows you to define the amount of RAM that will be made available to a virtual machine. The CBL process puts a fair amount of stress on the target system, and the amount of memory available to the compiler — especially for C++ builds — has a huge impact on the reliability of the process as a whole. The default value of 8 GiB works pretty well, but when the host system has more memory I always give the target as much as I can, up to about 24 GiB.

The virt machine type, available when using the ARM emulators, allows as much memory as you’d like to allocate.
- Parameter: TARGET_QEMU_DRIVE_PREFIX
- Value: vd (default)

The emulated hardware in QEMU virtual machines is not the same for all architectures and machines. Depending on what hardware is emulated, storage devices might show up as sd (SCSI) devices, hd (IDE or ATAPI) devices, vd (virtual) devices, or possibly something else entirely. This driver-defined prefix must be used when launching QEMU so that the Linux kernel can find the root filesystem, and is also used in QEMU builds when creating partition tables and filesystems.
The way drive prefixes are used in CBL corresponds to a historical convention for the way that device files have been named: for SCSI disks, for example, the disks are referred to as /dev/sda, /dev/sdb, and so on; partitions on the disks are referenced as /dev/sda1, /dev/sda2, etc. This convention is not always used, though; the convention for NVMe storage is for the devices to be named /dev/nvme0n1, /dev/nvme1n1, and so on; and for partitions on those devices to be /dev/nvme0n1p1, /dev/nvme0n1p2, and so on.
That means that if you’re doing a CBL build using NVMe storage, or some other type of storage that uses a different naming convention than the historical one, you’ll need to tweak the blueprints that use QEMU_DRIVE_PREFIX parameters.
- Parameter: TARGET_SERIAL_DEV
- Value: ttyAMA0 (default)

When building the target system in a QEMU virtual machine, the normal graphics console provided by QEMU is not used. Instead, we take advantage of QEMU’s ability to map the standard input and standard output of the virtual machine process to a simulated serial device. The first serial device on most Linux systems is /dev/ttyS0, but for ARM computers it might show up as /dev/ttyAMA0 instead.
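Putting the parameters from this section together, the command used to boot the target virtual machine looks roughly like this sketch (the actual generated command includes more options, and the disk image and kernel file names here are hypothetical):

    qemu-system-aarch64 -machine virt -cpu cortex-a57 -smp 8 -m 32768 \
        -drive file=lblroot.img,format=raw,if=virtio \
        -kernel Image.gz -append 'console=ttyAMA0 root=/dev/vda' \
        -nographic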
4.4. Host System QEMU-related Parameters
It’s possible that the host system — perhaps in addition to the target system — will be a QEMU virtual machine. As mentioned previously, this is the case when using the automated-qemu-to-qemu-build blueprint. In that case we need to specify additional parameters that will control how QEMU is run for the host system.
As with the target QEMU parameters, if you’re not using a QEMU virtual machine for the host, these directives are irrelevant and you can ignore them.
4.5. Target Boot Configuration
Similarly to TARGET_BRIDGE, there’s more than one way to make a GNU/Linux system bootable, and so there are multiple blueprints for doing that. These parameters are used to select which blueprint to use, and (for those that need additional configuration parameters) configure how it should work.
- Parameter: BOOTLOADER
- Value: manual (default)

The BOOTLOADER parameter selects the blueprint that will be used to set up the boot loader for the target system. The actual blueprint that will be used for this is setup-bootloader-manual.

As with the target bridge, the manual option means you’re on your own when it comes to making the target system bootable — the setup-bootloader-manual blueprint does not do anything.
- Parameter: BOOT_DEVICE
- Value: not set (default)
If a boot loader is being installed, it’s a good idea to set BOOT_DEVICE to the name of the block special device that will be used by the boot loader. For example, the GRUB boot loader is typically installed on the first sector (also known as the "boot sector") of a storage device, where it can be found and loaded by the BIOS.
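For example, a bootloader blueprint that installs GRUB would end up running something along these lines (a sketch, not necessarily the exact command):

    grub-install /dev/sda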
4.6. Target system name and login details
A non-root user account is always created on a CBL system. These parameters control the details that will be used for that account. It’s almost certainly a good idea to override these parameters with values you prefer!
- Parameter: LOGIN
- Value: lbl (default)

This parameter controls the login name for the non-root user. It’s a good idea to change this to your preferred login name. (The parameter name USER would be more idiomatic, but litbuild uses environment variables to override the default value for configuration parameters, and the bash shell always sets USER to the current user name. Using USER here would conflict with that behavior of the bash shell.)
- Parameter: LOGIN_FULL_NAME
- Value: A Little Blue User (default)
The UNIX user database has a "comment" field that, for accounts used by actual human users, is conventionally used for the full name of the user. Again, it’s a good idea to change this to your own full name.
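On a conventional GNU/Linux system, these two parameters correspond to the arguments you would pass to a command like useradd (a sketch; not necessarily the exact command CBL runs):

    useradd -c 'A Little Blue User' -m lbl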
- Parameter: DOMAIN_NAME
- Value: lblinux.org (default: example.org)

A domain name — preferably one that you control, defaulted here to example.org — will be used in various places throughout system setup and configuration.
- Parameter: HOST_NAME
- Value: cbl (default)

The final target system will set its hostname to whatever is specified by this parameter (in the DOMAIN_NAME domain).
4.7. Directories Used For the Build Process
The rest of the parameters all govern where different parts of the build artifacts will reside. You can set these however you like.
- Parameter: QEMU_IMG_DIR
- Value: /home/lbl/work (default: /tmp/cblwork)

When targeting a virtual machine, the disk image files used by the QEMU emulator will be created in this directory.
- Parameter: CROSSTOOLS
- Value: /home/lbl/work/crosstools (default: /tmp/cblwork/cross-tools)

This sets the directory into which the cross-toolchain will be installed.
- Parameter: HOSTTOOLS
- Value: /usr (default)

This sets the directory into which the Trustworthy Host-System Programs will be installed, if they are being built. If these are not needed, this can be left at the default value of /usr — or, if the host system has QEMU installed in a different location, whatever location that is.
- Parameter: SYSROOT
- Value: /home/lbl/work/sysroot (default: /tmp/cblwork/sysroot)

The "sysroot" framework is used for the cross-toolchain, and will contain the root filesystem that will be used by the target system. You can read much more about this in the various GCC sections. Initially, during the host stage of the CBL build, this will only contain a single subdirectory, /scaffolding. Everything else will be created in the target stage of the build.
- Parameter: TARFILE_DIR
- Value: /home/lbl/materials (default: /tmp/cbl-materials)

The source code for the software packages that make up the CBL system is distributed in files created by the tar program. tar stands for "Tape ARchive" — a term left over from bygone days, when magnetic tapes were the primary storage format used to move large amounts of data from one system to another. Even though tapes aren’t commonly used any more, this is still used as the primary distribution format for source code on UNIX-ish systems.
This parameter sets the location where CBL will look for all of the source packages needed during the build.
- Parameter: PATCH_DIR
- Value: /home/lbl/materials (default: /tmp/cbl-materials)

Sometimes, the source code package that is distributed by a project needs to be modified or adjusted before it is built. This is generally done using the patch utility, and the files that contain descriptions of the modifications that need to be made are called "patch files."
This parameter sets the location where CBL will look for all of the patch files needed during the build.
- Parameter: WORK_SITE
- Value: /home/lbl/work/build (default: /tmp/build)

When building software from source code, you need to unpack the source code somewhere and then configure, build, and (sometimes) test it before installing it to its final destination directory. The WORK_SITE parameter specifies where all that activity will be done — the name comes from the construction metaphor that litbuild uses. It should be considered a transient or temporary directory, and can be deleted after the build is complete. The full CBL process can use twenty or thirty gigabytes of storage space; to be on the safe side, define WORK_SITE as some location with at least that much free space available.
- Parameter: LOGFILE_DIR
- Value: /home/lbl/work/logs (default: /tmp/cblwork/logs)

Everything printed to standard output and standard error throughout the CBL build process will be written to log files; if something goes wrong, the primary way to figure out what happened is to look at the log files.

This parameter specifies the location where log files are written during the host side of the CBL process, and the first part of the target side. (Once the package-users package is installed, log files are written to the logs subdirectory of the package users' home directories.)
- Parameter: SCRIPT_DIR
- Value: /home/lbl/work/scripts (default: /tmp/build/scripts)

SCRIPT_DIR specifies the location where litbuild will write the bash scripts that automate the build story — the "tangle" side of the literate build system.
- Parameter: DOCUMENT_DIR
- Value: /home/lbl/work/docs (default: /tmp/build/docs)

DOCUMENT_DIR specifies the location where litbuild will write an AsciiDoc document that tells the build story — the "weave" side of the literate build system.
5. An Overview of Package Setup
The GNU system is a collection of packages.
This is a pretty basic concept and if you already understand how these packages work, you should feel free to skip ahead!
When people talk about a Linux system (or, equivalently, a GNU/Linux system), they’re talking about a collection of software packages that have been assembled in a particular way, using some set of policy decisions about how to fit those things together. There are some elements that are common to all of these systems: they use the Linux kernel to manage hardware resources and provide services to userspace programs; there is some init program that runs and sometimes manages those userspace programs; there is a core set of userspace programs that you can reasonably expect to find on the system, like the bash shell and the core GNU utilities… Aside from those basic elements, though, there is considerable variation: which packages are available, the mechanism used to set up additional programs, what init program is used, and how the filesystem is arranged all can vary widely from one system to another.
Many people and organizations provide easy-to-install distributions (sometimes called "distros") of those packages, policy decisions, and so on, and this is how the vast preponderance of GNU/Linux systems are set up — it’s so ubiquitous that people talk about these systems in terms of which distribution was installed: RedHat, or Debian, or one of the hundreds of derivatives of those systems, or one of the newer independent distributions like Arch or Void Linux. There are a surprisingly large number of others! You can find a timeline of many of the distributions and the relationships between them on the Internet, perhaps here; if that link doesn’t work, try doing an Internet search for "GNU/Linux distribution timeline." Alternatively, spend some time looking at the https://distrowatch.com/ site — they track releases and other activity on a lot of distributions.
The CBL process defines a GNU/Linux distribution, as well: if you follow the CBL process, you wind up with a Little Blue Linux system. If you modify the CBL process, then you wind up with your own distribution, which is a derivative of Little Blue just as Ubuntu is a derivative of Debian GNU/Linux.
But I digress. All of these systems are, primarily, a collection of software packages, each of which is maintained by and released by a person or project team. Generally, these packages are released by their respective project teams as tar archive files containing the package source code and other files. Most of the work that goes into making a GNU/Linux distribution is in taking those release files and setting them up as part of the system, then repackaging the result so that it’s easy for users of the distribution to set it up as well.
That process — setting up a new package so you can use it on your system — generally consists of four steps:
- You configure the package for your specific system, with some set of configuration settings to control how it will be built and where its files will eventually wind up;
- then you compile the package source code into executable programs;
- then you run a suite of automated tests to verify that the program was built successfully; and, finally,
- you install the package files into the system so that it is available for use.
Sometimes one or more of these stages is missing for a package — for example, a program may not have a test suite, or a package might be written in a language that is primarily interpreted, like perl or python, so there may not be a compile step — but that sequence of build stages is common enough that these instructions always frame the process of setting up packages in terms of those steps.
5.1. The GNU Build System
Many of the packages that constitute the basic Little Blue GNU/Linux
system — including almost all of the most fundamental components, like
the C library and software construction tools — are part of the GNU
system created and maintained by the Free Software Foundation. These
packages are generally designed to be constructed using the GNU Build
System, which includes the autotools for configuration and make
for
running all the commands that actually compile, test, and install the
package.
So many of the packages built during the CBL process use this build system that it forms the default sequence of steps used to set up software packages. If you look at the source blueprints that define the CBL process, you may notice that package blueprints often omit the commands used to set up the package. That’s because the default sequence of steps can be used for them:
- The package is configured using ./configure --prefix=/usr, to use default configuration settings for everything except the location where the package files will be installed (the default for this is usually /usr/local, for reasons we won’t go into here);
- the package is compiled by simply running make, which causes the default target to be built;
- the test suite is run with make check; and
- the package files are installed with make install.
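In other words, when a blueprint gives no explicit commands, the package is set up with exactly this sequence:

    ./configure --prefix=/usr
    make
    make check
    make install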
Of course, it’s also common for one or two of those commands to differ from the default set, so sometimes you’ll see that there’s an explicit definition for the configuration commands, or the test commands, or something like that.
5.2. Following the CBL Process Manually
CBL is designed and intended for automation — if you take the source blueprints for CBL, set environment variables for all the configuration parameters you want to override, and run the litbuild program on them, you’ll wind up with a shell script you can run to kick off at least the host side of the build; depending on which "target bridge" you use, the conclusion of that process might automatically kick off the entire second half of the build as well. Automation is ideal for any situation you want to be able to repeat several times without mistakes, which is definitely what I want with CBL!
On the other hand, perhaps you want something different. If, for whatever reason, you prefer to type all the commands yourself, or copy and paste them from a web browser, you can do that. This section has a few tips that might help.
Several parts of CBL set environment variables — those environment variables are all scoped by the section structure of the process, so (for example) when you start the Constructing a GNU Cross-Toolchain section you should set all the environment variables defined in that section (and explicitly unset the variables that say they should not be defined at all) before you start building any of the packages there. But when you’ve finished that part of the process, you should start a new shell process without those variables set.
Aside from those environment variables, every section in CBL either has some commands to run — which is hopefully pretty straightforward — or sets up a package. Here’s how to set up a package manually:
- Unpack the source tarfile. The archive file should always unpack into a new directory; cd to that directory.
- If the blueprint specifies any In-Tree Packages, you should unpack the source tarfile for each of those packages, and move the resulting directory to the location specified in the blueprint.
- If the blueprint specifies any Patches, you should apply those to the source tree using patch -p1.
- If the blueprint specifies a Build Directory, you should create it and cd to it before proceeding.
- Supposing all of that worked without errors, proceed by running all the Configuration commands, then the Compilation commands, then (optionally) the Test commands, and finally the Installation commands, as sketched below.
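For a hypothetical package that needs all of those steps, the whole sequence might look like this (the package and patch names are illustrative, and most packages skip one or more of these steps):

    tar -xf mypackage-1.0.tar.lz
    cd mypackage-1.0
    patch -p1 < /home/lbl/materials/mypackage-1.0-fix-1.patch
    mkdir build && cd build    # only if the blueprint specifies a build directory
    ../configure --prefix=/usr
    make
    make check
    make install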
If anything goes wrong at any stage… well, that’s an example of why it’s a lot of work to create a GNU/Linux distribution. Things break all the time, and you have to spend time and effort figuring out whether it’s a real issue that has to be addressed or an ephemeral problem that will go away if you just restart whatever process failed.
This process is also what the litbuild-generated scripts do automatically, up until the package-users framework is installed. (At that point, the process you should use to build packages manually also changes.)
6. Patches in Cross-Building Linux
In CBL, we strongly prefer to stay as close as possible to the latest stable released version of every package. Sometimes that’s not feasible or practical, though, for various reasons; in case you’re not familiar with the idea of "patch" files, we’ll talk a bit about what we do in those circumstances.
If you know all about patches, you might want to skip ahead!
UNIX systems have, since time immemorial,[3] included a userspace program called diff, whose purpose is to find differences between two files or two directory trees. This is really handy in all kinds of circumstances, as you can probably imagine! Any time you want to know what changed in a file, as long as you have a copy of the original version, you can use diff to find exactly what lines are different between the old and new versions. diff also provides options to include lines of context around the changes, and… really, lots of other things; you can read all about its capabilities with man diff or info diff.
Larry Wall, best known for creating the Perl programming language, wrote a program that is kind of the reciprocal or inverse of diff: the patch program. The idea of patch is that you feed it the output from a diff command and it applies the changes described there to a file or a directory tree of files, transforming them into the other version.
This is really handy when you want to distribute modifications to source code efficiently! Rather than creating an archive file with the entire modified source code directory, you can use diff to capture all of the differences between the original and modified versions of the source code; then you can distribute the output of the diff program however you need to. Anyone who has both the original version of the source code and the diff output can use patch to reproduce your modified version of the code.
Since the patch program is used to apply these changes, it’s common to refer to the output of diff as "patches" or "patch files," and it’s common to refer to the process of applying those files as "patching."
In CBL, we use a few different types of patches, described below. In most cases, we consider these patches to be a part of the CBL project itself, so they are maintained in a git repository you can find at http://git.freesa.org/freesa/cbl-patches, as well as being available in the CBL file repository at http://repo.freesa.org.
6.1. Miscellaneous tweaks or fixes
Sometimes, packages just don’t work the way that we would like them to. For example, when cross-compiling the GNU binutils for some host/target pairs, GCC issues a string truncation warning when compiling gas/config/tc-i386.c. This causes the binutils build to crash, since it’s set to treat all warnings as errors. Modifying a snprintf call to use the correct formatting spec (%hhx rather than %x) resolves the problem, but the binutils maintainers don’t want to simply make that change because of concerns that it might have a negative impact on some of the (many) architectures that binutils supports.
In this kind of situation, CBL applies a patch to make the necessary change.
6.2. Kernel configurations
The Linux kernel has to be configured to include support for whatever features and hardware device drivers are needed, and omit support for features and device drivers that are not desired. This is a pretty complicated process and can be hard to automate.
The configuration system provides a way to start with a named configuration rather than the default settings for an architecture; if this feature is used, a file from arch/*/configs is used to override the default settings.
CBL provides kernel configurations for some specific cases — for example, a configuration for kernels intended to run on EC2 instances in the Amazon Web Services cloud. These default configuration files are added to the Linux source tree via patches.
All of this is described more thoroughly in the Linux sections, so you can consult those for more information.
6.3. Gnulib updates
The GNU project maintains a repository of source code that is intended to be used in other software packages. This repository is called "Gnulib." However, unlike other packages referred to as "libraries," like libxml2 or libgcrypt, Gnulib is not compiled into a library of functions that are then dynamically linked into other programs at runtime. Rather, the expectation is that files from the Gnulib repository will be copied into the source tree of other projects.
This sometimes presents challenges.
An example of this is release 2.28 of the glibc package. This release of glibc removes some obsolete and deprecated header files — an example is libio.h — that are not a part of the C standard library but were part of earlier glibc releases. These header files are not referenced by Gnulib — but until sometime in 2018, they were. Since other packages, like m4 and gzip, still include those old versions of the Gnulib files, those packages won’t build on systems that use glibc 2.28 or later. At least, not until new versions of those packages, using modern versions of the Gnulib components, are released! And some packages are not released very often: the most recent version of m4 was released nearly two years ago as of this writing.
When I encounter this situation, my practice is simply to compose a patch by copying in the newest version of whatever Gnulib source files have compilation issues.
6.4. Branch-update patches
The most common — and least objectionable — type of patch used in CBL is the "branch-update" patch. All large and complicated software packages, like GCC and the GNU C library, have bugs. Some of those bugs are, inevitably, severe. A common practice for project teams is to maintain bugfix branches in their source repositories for at least the most recent few releases of their package(s); as bugs are found and fixed, these changes are back-ported from the current main line of development to these bugfix branches.
It’s my practice to pull in all the updates from these bugfix branches from time to time, and apply those as patches so that the CBL system is as stable and bug-free as possible.
Any time you see a package with a patch called branch-update- along with a year-month-day datestamp in its name, that’s what it is: a compilation of all changes from the upstream project’s bugfix branch, as of whatever date is indicated by the patch file name.
Unlike other CBL patches, these branch-update patches are not tracked in the cbl-patches git repository, although they are present in the CBL file repository. The rationale for this is that it’s trivial to reproduce these patches from the upstream project’s version control repository; all you have to do is obtain that repository and then use a command like git diff glibc-2.28..remotes/origin/glibc-2.28/master to produce a current branch-update patch for glibc 2.28.
The Host-Side Build
7. Preparing For The Build
Before we can start the CBL build process, there are a couple of things we need to make sure of.
7.1. Required Host-System Packages
First, and most importantly, it’s important to make sure you have all the tools you need on the host system. Many GNU/Linux systems are missing one or more of the programs necessary for CBL, or provide a version of those programs that won’t work properly for one reason or another. So, although it’s not an intrinsic part of the CBL process per se, CBL includes the Trustworthy Host-System Programs appendix for building a trustworthy set of the programs that we’ve found to cause problems later on.
The basic set of requirements from the host system include modern versions of the GNU toolchain (GCC, binutils) and build system (make, the autotools, and so on), the QEMU emulator, and the lzip compression program. The file program is also necessary and must be the same version that is set up in the CBL process. If you’re unsure of whether you have everything you need, please do check the Trustworthy Host-System Programs appendix and make sure you have the things built there.
If you’re following the CBL process on an LB Linux system, this is generally unnecessary — all of the packages built in the host-prerequisites appendix are also built in the CBL process and are part of the basic LB Linux system. The only gotcha in that case is the file package, which can only be cross-compiled on a system that has the same version of file installed already. So if you’re using an LB Linux system but the version of file installed there is older than the one that the CBL process currently uses, you’ll need to upgrade that package before you can proceed.
7.2. Final Preparations For The Build
The whole first part of the CBL process — the part that runs on the host system — will need to find the trusted host system programs (if they were built) and the cross-toolchain programs, so we should make sure they are on the PATH. We also need to ensure that shared libraries installed as part of those packages can be found by those programs — if you’re not clear on what that last part means and don’t feel like being patient, you can skip ahead to the A Word About The Dynamic Linker section where shared libraries are discussed.
- Environment variable: PATH
- /home/lbl/work/crosstools/bin:/usr/bin:$PATH
- Environment variable: LD_LIBRARY_PATH
- /usr/lib:$LD_LIBRARY_PATH
Litbuild provides a feature that allows the scripts it generates to be re-run if the build crashes partway through, by making note of which scripts have completed successfully and skipping them in future runs. To activate this feature, we define an environment variable, LITBUILDDBDIR.
- Environment variable: LITBUILDDBDIR
- /home/lbl/work/crosstools/litbuilddb
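In a shell, the settings above amount to:

    export PATH=/home/lbl/work/crosstools/bin:/usr/bin:$PATH
    export LD_LIBRARY_PATH=/usr/lib:$LD_LIBRARY_PATH
    export LITBUILDDBDIR=/home/lbl/work/crosstools/litbuilddb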
Now, at last, we’re ready to start the CBL process per se.
8. Constructing a GNU Cross-Toolchain
This section describes how to build a GNU cross-toolchain that runs on computers with one CPU architecture — for example, x86_64 — and constructs programs that will run on a different CPU architecture, like MIPS or ARM. The result is composed of current stable versions of all toolchain components as of August 2021. That means:
- binutils 2.37
- gcc 11.2.0
- glibc 2.34
- gmp 6.2.1
- isl 0.24
- linux 5.13.11
- mpc 1.2.1
- mpfr 4.1.0
8.1. Toolchain Basics
A "toolchain" is the set of all of the programs and libraries required to transform source code into executable programs that will actually work. It’s called a "chain" because there are multiple programs involved: each program takes some kind of input file and produces some kind of output file, which then becomes the input for the next program in the chain. Each program is like a link in the chain. (When you think about it that way, it’s not really the best metaphor. It’s really a lot more like an assembly line! But "toolassemblyline" sounds awful.)
In this case, we are building a C and C++ toolchain: it will be able to construct programs from C and C++ source code.
The toolchain being built here consists of: a preprocessor, which handles include directives and macro calls and things like that; a compiler, which takes C or C++ source code and translates it into assembly language code; an assembler, which takes that assembly code and translates it into binary object code; and a linker, which combines object code produced by the assembler with additional object code contained in libraries and produces executable programs. The process also requires an implementation of the C and C++ standard libraries, which contain a large collection of functions that the linker uses when producing programs. In many cases, the implementation of those functions involves making system calls, which are basically functions provided by the operating system kernel; because the standard libraries make use of system calls, the process of building a toolchain also requires the kernel header files that specify what system calls are available and how they work.
In the GNU toolchain, the preprocessor is cpp and the compilers are gcc (for C source code) and g++ (for C++ source code), all of which are contained in the GNU Compiler Collection (gcc) distribution, along with the standard C++ library. The assembler and linker are as and ld, and are contained (along with other programs that operate on object code files) in the GNU binutils ("binary utilities") distribution. The C standard library is distributed separately as the glibc (GNU libc) package, and the kernel header files are part of the Linux kernel distribution.
Since glibc is a rather large library — over a hundred megabytes of source code, producing shared library files that are several megabytes in size — it is not very well-suited to building programs that must fit in a compact space, such as the flash chip that holds the firmware in wireless routers. For those programs, an alternative C library, such as musl or uClibc, is more appropriate. Toolchains using those C libraries can be produced using a variant of these instructions.
8.2. Cross-Toolchains
A "cross-toolchain" is much like a normal, or "native," toolchain. The difference is that a cross-toolchain runs on a computer of one type (the "host" system) but builds programs that will run on a different type (the "target" system). For the CBL project, for example, we use common Intel Core or Pentium computers to build software that will run on an ARM-architecture CPU. That way we can be reasonably certain that the final system really is entirely built from source, and no binary code was simply copied from the original build system to the target system: code from the build system simply can’t run on the target system, so if any binary code from the host system winds up on the target system, it won’t work at all.
To build a GNU cross-toolchain, we pass --host, --target, and --build options to the configure scripts (produced using the autotools programs of the GNU build system) of the toolchain components. The values assigned to these options are called "target triplets," a term that also comes from the GNU build system (see section 14 of the Autoconf manual if you’d like to learn more). A triplet is a string with multiple components that are separated by hyphens. Historically, the triplet has had fields for CPU, manufacturer, and operating system (e.g., mipsel-pc-gnu for a little-endian MIPS CPU, PC hardware, and the GNU operating system); more commonly, these days, the OS field is subdivided into two fields, "kernel" and "system," so for all intents and purposes the target triplet winds up having four components: cpu-manufacturer-kernel-os (e.g., mipsel-pc-linux-gnu). That means that the term "target triplet" is kind of obsolete and misleading, and it would make more sense to refer to them as "target quadruplets." But sometimes history wins over accuracy and clarity.
The manufacturer field is basically freeform; in many cases it’s just set as pc for IBM PC-architecture systems, or left as unknown or none. In CBL we set the manufacturer to cbl, because it’s shorter than unknown and is distinctive: any time you see a triplet like arm-cbl-linux-gnu, you can be fairly confident it was produced using the CBL procedure.
The manufacturer field can be omitted entirely, but that makes the whole situation much more complex and ambiguous: for example, in the triplet arm-linux-gnu, is linux the manufacturer or is it part of the os field? The GNU build system includes a script called config.sub specifically to take triplet strings and figure out what they mean. My advice is: always specify triplets as cpu-manufacturer-kernel-os.
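If you want to see how config.sub canonicalizes a triplet, you can run it by hand; it ships with every GCC and binutils source tree. The exact output depends on the version of the script, so treat the comments here as illustrative:

# Canonicalize some triplets (run from the top of a GCC source tree):
./config.sub arm-linux-gnu          # fills in the manufacturer: arm-unknown-linux-gnu
./config.sub aarch64-cbl-linux-gnu  # already a full quadruplet, echoed back unchanged
./config.sub i686-pc-linux          # expands the OS field: i686-pc-linux-gnu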
The other components of the triplet — cpu, kernel, and OS — sometimes trigger specific behavior, especially during GCC builds. It’s a good idea to review the "Host/Target specific installation notes for GCC" section of the GCC build instructions when choosing a triplet for your system.
The GNU build system includes a script, config.guess, that tries to figure out what the host triplet is. Generally, this should be used as the HOST configuration parameter in CBL. You can always find an up-to-date copy of it, and the related script config.sub mentioned above, in the GCC source code distribution.
Different combinations of those configure directives, host, target, and build, are used for different toolchain components and at different points in the CBL process. What they mean is:
- build: the system where the toolchain components are built
- host: the system where the toolchain components will run
- target: the system where the resulting artifacts will run
When build, host, and target are all different, it’s called a "Canadian Cross." We’re not sure why. In CBL, we don’t build any Canadian Crosses. During the CBL process, we:
- build a native compiler unless we already have one that we can definitely trust; for this one, of course, we don’t need to specify host, build, or target;
- use the trusted native compiler to build a cross-toolchain (at this point, build and host are both that of the initial system, and target is the target system type);
- use that cross-toolchain to build a target-native toolchain as well as a collection of programs that will be needed on the target system (at this point, build is the initial system, and host and target are both the target system type); and finally
- boot into the minimal target system userspace we constructed in the previous step, and use the target-native toolchain built there to construct a new, testable, native toolchain (for which we again don’t need to specify build, host, or target).
Once the cross-toolchain is built, most packages built using it will be configured with build set to the host computer and host set to the target computer.
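To make that concrete, here is a schematic summary of how the three directives are set at each stage of the CBL build, using the triplets from this build; these are not the literal configure commands, which appear in the relevant sections later on:

# Stage: cross-toolchain (runs on the host, produces ARM code):
./configure --build=x86_64-unknown-linux-gnu \
    --host=x86_64-unknown-linux-gnu --target=aarch64-cbl-linux-gnu
# Stage: target-native toolchain (cross-compiled, runs on the target):
./configure --build=x86_64-unknown-linux-gnu \
    --host=aarch64-cbl-linux-gnu --target=aarch64-cbl-linux-gnu
# Ordinary packages cross-compiled with the new toolchain need no --target:
./configure --build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu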
8.3. The CBL Sysroot Toolchain
The cross-toolchain built here uses the sysroot framework. The idea behind a sysroot toolchain is simple: a directory on the host system (the "sysroot" directory) is set up to contain a subset of what will eventually become the root filesystem of the target system: /bin, /lib, /usr/lib, /etc, that sort of thing. Header files and libraries will be used only from the sysroot location, not from the normal host system locations.
This is fine for most cross-compiling purposes, but it’s not quite perfect for our purposes in CBL.
Remember that the whole point of this cross-toolchain is to build a bootable GNU/Linux system — the kernel, and a minimal set of userspace programs that we’ll then use to construct the final system. Those components, the "scaffolding," won’t be used any longer than is necessary. This is because we don’t want to rely on the programs produced by the cross-toolchain. By preference, for any program that is distributed with a test suite, we want to run the test suite before we assume it works properly! When you’re cross-compiling programs, you can’t easily run the tests.
So once we have the target system booted, we only use the scaffolding to build the first parts of the final system. As we build those, we install them into the canonical filesystem locations where they belong: the standard directories /bin, /lib, /usr, and so on. To make that process as straightforward as possible, and avoid any interference between the scaffolding and the final system components, we want all the scaffolding to wind up in a /scaffolding subdirectory of the root filesystem. If all of the ephemeral stuff is self-contained within /scaffolding, then obviously it won’t conflict with the final system programs as we build and install them.
Setting up everything so that it’s self-contained within a non-standard location also makes it easier to ensure that our final system doesn’t still rely on any of the components there: once the full system build is complete, we can just delete the /scaffolding directory and we’ll be left with just the Little Blue Linux system.
When building a toolchain, it’s important to simplify the execution environment as much as possible; any unnecessary compiler or linker flags can cause things to break. Don’t worry about optimizing anything for this stage, either: remember, everything we’re doing at this point is throw-away work.
- Environment variable: PATH = /home/lbl/work/crosstools/bin:$PATH
- Environment variable: LC_ALL = POSIX
- Environment variable: CFLAGS = (should not be set)
- Environment variable: CXXFLAGS = (should not be set)
- Environment variable: LDFLAGS = (should not be set)
- Environment variable: LD_LIBRARY_PATH = /usr/lib
This is as good a point as any to discuss the dynamic linker and the way it works.
8.4. A Word About The Dynamic Linker
In most cases, executable programs on Linux systems are linked against shared libraries rather than static libraries. That means that programs don’t contain a copy of the binary code for library functions they invoke; instead, programs contain references to those library functions, along with a list of the shared libraries that are expected to contain the implementations of those functions.
Whenever a program is executed, the references to library functions obviously must be resolved — that is, the shared libraries are searched for all the functions needed by the program (and, since those functions might call other library functions as well, libraries are searched for those functions too, and so on recursively), and linked together so that all of the necessary functions are available. This resolution is done by the dynamic linker, also known as the dynamic loader and, sometimes, as the program interpreter. This is a program included with the GNU C library — or whatever other C library is being used, like musl — and is conventionally installed at /lib/ld.so or /lib/ld-linux.so.
The way that the dynamic linker finds shared library files is a bit complicated, and that complexity is at the root of a lot of the issues that can come up during the CBL build process, so let’s talk about it a bit!
8.4.1. The tl;dr
Here’s a summary of the basics, for anyone who doesn’t need all the grueling details:
- There are a bunch of standard system directories that are always used when looking for shared library files — directories like /lib and /usr/lib. If you add a directory to /etc/ld.so.conf, it basically becomes one of those system directories.
- If you have library files in a different directory, you can get the dynamic linker to look there by setting an environment variable, LD_LIBRARY_PATH.
- If, when you’re building a program, you know that the program will need some shared libraries and those libraries will not be in one of the standard system locations, you can make those directories a part of the library search path for that program by giving ld the argument -rpath /whatever/dir:/another/dir. This sets an RPATH for the program, which overrides LD_LIBRARY_PATH: the RPATH will be used before the directories in LD_LIBRARY_PATH are checked.
- If you want to do something like RPATH, but you want to make it easy for people who run the program to override the library search path, you can add the ld option --enable-new-dtags (in addition to the -rpath option mentioned above). This will cause ld to set a RUNPATH rather than an RPATH; RUNPATH is checked for shared library files after LD_LIBRARY_PATH is checked.
This all matters for CBL because the build process for some of the necessary packages will result in programs that can’t find their shared libraries unless we use an RPATH or RUNPATH; and the build process for other packages sets an incorrect RPATH or RUNPATH that can cause problems unless we remove it.
Usually, ld isn’t executed directly by build processes, but is instead invoked by the gcc driver program. To get gcc to pass an option along to ld, you can give it the option -Wl, which should be followed by additional words separated by commas; gcc will replace the commas with spaces and pass the resulting arguments on to ld. So to set an RPATH of /some/dir and /another/dir, you can give gcc the argument -Wl,-rpath,/some/dir:/another/dir.
An alternative that might work is to set the environment variable LD_RUN_PATH to the desired RPATH before linking — the ld documentation suggests this will work, but I haven’t tried it.
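Here’s what that looks like in practice. Assume main.c links against a hypothetical library libdemo.so that lives in /some/dir; the readelf output lets you verify which dynamic tag was actually set:

# Link with an RPATH only:
gcc main.c -o prog-rpath -L/some/dir -ldemo -Wl,-rpath,/some/dir
readelf -d prog-rpath | grep -i path    # shows an RPATH entry
# Link with --enable-new-dtags as well:
gcc main.c -o prog-runpath -L/some/dir -ldemo \
    -Wl,-rpath,/some/dir,--enable-new-dtags
readelf -d prog-runpath | grep -i path  # shows a RUNPATH entry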
8.4.2. The grueling details
This might make your eyes glaze over a bit.
A lot of this discussion pertains only to programs in the Executable and Linkable Format, commonly abbreviated as ELF. This is the only program format used in modern GNU/Linux systems, but you should be aware that there are other executable program formats, like a.out and COFF, and a lot of the details here don’t apply to those formats.
ELF programs consist of an ELF header followed by some number of segments of various types. You can see the full structure of ELF programs by using the program readelf, which is part of the GNU binutils package.
When a program is executed, the Linux kernel looks in its .interp section to find the path of its interpreter — in our case, this will be the full path of ld.so. That interpreter then looks in dynamic segments for the names of the shared library files that are needed by the program and tries to find them. If any shared library can’t be found, the dynamic linker will terminate with an error.
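You can see both pieces of metadata for yourself with readelf; any dynamically-linked program will do, and /bin/ls is a convenient victim:

# Show the interpreter (dynamic linker) path from the .interp section:
readelf -l /bin/ls | grep interpreter
# Show the shared libraries the program declares that it needs:
readelf -d /bin/ls | grep NEEDED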
There’s a bit of additional complexity: shared library files are also in ELF format and can also declare that they are dependent on other shared libraries. This can result in a chain of library dependencies, potentially a lengthy one! When the dynamic linker is trying to find a library, its behavior is partly determined by which object is having its references resolved: the original program, or one of the libraries it depends on, or one of their libraries, and so on. Whichever ELF file is having its references resolved is called the "loading object" here.
The rest of this section describes the procedure used by the dynamic linker to locate shared library files.
Shared library names can contain slash characters, although they usually do not. (I don’t even know how to get ld to create a program with library dependencies that specify a full path.) If a shared library name does contain any slash characters, it is treated as a relative or absolute path, and the dynamic linker will only look for the library at that path.
In the common case, when a shared library is specified without any slash characters, the dynamic linker looks for it in a variety of locations. The algorithm it uses is poorly documented, and there’s a lot of contradictory information about it on the web. After looking around for quite a while, I found a helpful blog post on qt.io that asserted that the relevant code is _dl_map_object in the glibc file elf/dl-load.c, and a review of that code revealed the following:
- If the loading object has a RUNPATH, skip ahead to the LD_LIBRARY_PATH step. Otherwise:
  - Look in the RPATH of the loading object, if any.
  - Consider the thing that loaded the loading object. If it has a RUNPATH, skip ahead to the LD_LIBRARY_PATH step. If not, look in its RPATH.
  - Continue doing this recursively up the loading chain (skipping ahead if you find a RUNPATH, looking for libraries in RPATH) until you reach the end of the loading chain. This will normally be the program being executed, but could also be a shared library loaded using the dlopen function.
- Look in the LD_LIBRARY_PATH environment variable.
- Look in the RUNPATH of the loading object, and once again look up through the loading chain until you reach the end.
- Look in the locations found in /etc/ld.so.cache, which is generated from /etc/ld.so.conf using the ldconfig program.
- Finally, look in the default directories defined when glibc was compiled; when using the standard build process, this means /lib and /usr/lib.
As soon as the shared library is found in any of those locations, that’s the version that will be used. If it’s not found in any of those locations, the dynamic linker will give up and crash with an error message. And, in case this was not clear, in all the things that look like PATH — LD_LIBRARY_PATH, RPATH, RUNPATH — you can specify any number of directories separated by colons, just like the PATH environment variable.
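If you ever want to watch the dynamic linker actually perform this search, glibc’s loader will narrate its work when the LD_DEBUG environment variable is set. This works with any dynamically-linked program:

# Trace the library search for one program run:
LD_DEBUG=libs /bin/true
# LD_DEBUG=help lists the other tracing categories that are available:
LD_DEBUG=help /bin/true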
A historical note: RPATH has been around longer than RUNPATH. RUNPATH was implemented, along with all the logic about skipping RPATH when a RUNPATH is present, because people realized that when someone specifies an LD_LIBRARY_PATH it’s usually because they really do know what they want the dynamic linker to do, and it’s rude to override that desire at program compilation time.
When you give ld the -rpath option by itself, it just creates an RPATH in the resulting program or library. When you add the option --enable-new-dtags, it still creates an RPATH (in case you run the program with an old dynamic linker that doesn’t understand RUNPATH), but it also creates a RUNPATH so that modern dynamic linkers will ignore the RPATH.
As a reward for reading this far, here’s one last option you can use if none of the above suits your purpose: before it does anything else, the dynamic linker loads any .o object files, or .so or .a library files, that are named in the environment variable LD_PRELOAD or in the /etc/ld.so.preload file. Any functions defined in those files are used in preference to any other function definitions found later in the process. That lets you override individual function definitions, if you want to replace some part of a program’s behavior without replacing an entire library with a different version.
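As a small demonstration of the preloading mechanism, here’s a shim that replaces the C library’s getpid with a version that always returns a fixed number. The file names and the choice of getpid are arbitrary example choices; only the LD_PRELOAD mechanics are the point:

# Build a one-function shim as a shared library:
cat > fakepid.c << 'EOF'
#include <sys/types.h>
/* The dynamic linker will resolve calls to getpid() here first. */
pid_t getpid(void) { return 12345; }
EOF
gcc -shared -fPIC fakepid.c -o libfakepid.so

# Build a trivial program that calls getpid():
cat > showpid.c << 'EOF'
#include <stdio.h>
#include <unistd.h>
int main(void) { printf("pid: %d\n", (int)getpid()); return 0; }
EOF
gcc showpid.c -o showpid

./showpid                                  # prints the real process ID
LD_PRELOAD=$PWD/libfakepid.so ./showpid    # prints "pid: 12345"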
You can read more about all this stuff in the ld and ld.so info and man pages, and in the "Program Library HOWTO" from the Linux Documentation Project.
8.5. binutils
| Name | GNU binary utilities |
| Version | 2.37 |
| Project URL | |
| SCM URL | git://sourceware.org/git/binutils-gdb.git |
| Download URL | |
| Patches | |
8.5.1. Overview
The GNU binary utilities package contains a plethora of programs and libraries that can be used to produce, manipulate, and otherwise operate on (compiled and assembled) object files. I’ll briefly describe them all here, but don’t worry! Not only will there not be a test on any of this, you won’t usually need to invoke any of these programs manually. The gcc driver program will invoke as and ld as necessary to do its work, and several of the other utility programs are similarly used internally during the build processes of other system components, but you won’t need to use them yourself.
The most important binutils programs are as, the assembler, which transforms assembly source code (.s files) into binary or "object" code (.o files); and ld, the link editor (ld is usually just called the linker, but it’s hard to see why it’s called ld without knowing the other term), which combines multiple object files into an executable program.
There are actually two different ld programs in current GNU binutils: the original version uses the binary file descriptor (BFD) library and can always be found at ld.bfd. There’s also a newer program, "gold," that doesn’t use BFD and only works for binaries that are in the "Executable and Linkable Format," aka "ELF"; it’s always available at ld.gold. One or the other program is linked so it can be executed as ld.
The other programs are (a few of them are demonstrated just after this list):
- addr2line translates program address locations to filenames and line numbers, which can be helpful during debugging.
- ar can be used to create, modify, and extract files from archives.
- c++filt converts the mangled function names found in compiled C++ programs back to the original un-mangled names.
- gprof lets you run programs with instrumentation that tells you how much time is spent in different parts of the code, which can be helpful when optimizing programs.
- nm lists symbols found in object files.
- objcopy can translate object files to various alternative formats.
- objdump is a disassembler; it can convert binary files into a canonical assembly language.
- ranlib generates indexes for archive files.
- readelf shows information about ELF-format object files.
- size displays the sections of an object or archive file, along with their sizes.
- strings prints out printable character sequences found in binary files.
- strip discards symbols or other unnecessary data from object, library, or program files, which reduces their size but makes them much harder to debug.
There are a couple of shared libraries used by those programs and available to others, as well: libbfd and libopcodes. All of the utility programs are documented much more thoroughly in the man pages and the binutils info file.
As I mentioned above, you don’t need to run any of those programs directly, because the gcc driver program streamlines the simple case of building executables — to compile a "Hello, World" program, you just need to run gcc hello.c, and let the gcc driver program run as to assemble compiled source into object files, ld to link multiple object files together to produce an executable, and maybe other programs if it needs to for some reason.
The downside of using a driver program is that it can make complex builds (when, for example, specific options need to be passed to the assembler, compiler, and linker) a lot more complicated and fussy — as you’ll see, from time to time, during the CBL process.
When configuring binutils, we always specify --enable-64-bit-bfd. This is needed to enable 64-bit support, which is important when the host or target system has a 64-bit userspace. It’s unimportant for entirely 32-bit builds (for example, an i686-to-mipsel CBL build), but it doesn’t cause any problems when it’s used in one of those builds.
- binutils-2.37-fix-gcc7-warning-messages-1.patch

When cross-compiling, a bug introduced in GCC 7 (which was bug 81840 in its bugzilla bug-tracking system, but then bugzilla crashed and that bug report was lost) causes an incorrect warning when compiling tc-i386.c in the gas source tree, in at least some cross-compilation scenarios. That’s not a big deal, except that the binutils build treats all warnings as errors and terminates the build when it sees them. This patch works around the GCC bug so it doesn’t produce the problematic warnings.
8.5.2. binutils (gnu-cross-toolchain phase)
Binutils, like some other parts of the toolchain, should be built in a separate directory from the source. As with other components that we build several times, CBL puts each distinct build in a separate location to avoid any need to clean things up to a pristine state for the next build.
- Build Directory: ../build-binutils-2
We build a cross-binutils using the "sysroot" framework (which you’ll read more about shortly). That framework isn’t particularly well-documented, but the important thing at this point is that we need to specify the configure options --with-sysroot and --with-build-sysroot to inform the build machinery of the desired sysroot directory.
All of the toolchain components should be installed in the same filesystem location. The CROSSTOOLS parameter lets you specify that location.
${LB_SOURCE_DIR}/configure --prefix=/home/lbl/work/crosstools \
--build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --target=aarch64-cbl-linux-gnu \
--with-sysroot=/home/lbl/work/sysroot --with-build-sysroot=/home/lbl/work/sysroot \
--disable-nls --enable-shared --disable-multilib \
--with-lib-path=/home/lbl/work/sysroot/scaffolding/lib \
--enable-64-bit-bfd
make configure-host
Some of the warning messages present in GCC 8 and later present problems when compiling the binutils. We can tweak the generated Makefiles so those warnings won’t be converted to errors.
sed -i -e '/^WARN_CFLAGS/s@$@ -Wno-error=stringop-truncation@' bfd/Makefile
sed -i -e '/^WARN_CFLAGS/s@$@ -Wno-error=stringop-truncation@' gas/Makefile
sed -i -e '/^WARN_CFLAGS/s@$@ -Wno-error=format-overflow@' binutils/Makefile
make
(none)
For some reason, the binutils installation process doesn’t copy the libiberty header file. libiberty is a support library used by several components in the GNU toolchain, distributed both as part of binutils and gcc; later builds will want to use functions defined in it. So we install that ourselves.
make install
mkdir -p /home/lbl/work/sysroot/scaffolding/include
cp -v ${LB_SOURCE_DIR}/include/libiberty.h \
/home/lbl/work/sysroot/scaffolding/include
8.6. gmp
| Name | GNU Multiple Precision arithmetic library |
| Version | 6.2.1 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | (unknown) |
8.6.1. Overview
GMP, the GNU Multi-Precision arithmetic library, is a component in — or perhaps it is more properly called a dependency of — the GNU toolchain. It allows arithmetic operations to be performed with levels of precision other than the standard integer and floating-point types. Applications can use GMP to provide arithmetic with thousands or millions of digits of precision if that’s what they need. GMP also provides support for rational-number arithmetic, as well as integer and floating-point.
If you’re really interested in high-precision floating-point arithmetic, you might want to look into MPFR rather than GMP! The GMP people say it’s much more complete.
GMP has been needed by the Fortran GCC front-end for some time, but starting with release 4.3.0 of GCC it is needed for C (and C++) as well.
MPC, another dependency of GCC, requires a GMP built with C++ support, so we need to specify that at configure time.
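For reference, the configure option that turns on C++ support is --enable-cxx, so a standalone GMP build for this purpose looks something like the following (the prefix is just an example):

# Configure GMP with the C++ interface that MPC needs:
./configure --prefix=/usr --enable-cxx
make
make check
make install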
This package is often built in-tree as part of GCC, rather than separately — that’s especially true when the only reason you’re building GMP is that GCC requires it. When using an in-tree build, this blueprint is pretty much irrelevant. However, as of 2015-09-27, there’s an issue with in-tree builds of GMP in some cross-toolchain builds, so for CBL we build it separately.
8.6.2. gmp (gnu-cross-toolchain phase)
This is not necessary, because the system or host-prerequisites version of GMP can be used just fine by the cross-compiler. If things break down here and you haven’t built the Trustworthy Host-System Programs, you should probably do that.
(none)
(none)
(none)
(none)
8.7. mpfr
| Name | MPFR library |
| Version | 4.1.0 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Patches | |
| Dependencies | |
8.7.1. Overview
MPFR is a library for arbitrary-precision floating-point arithmetic. It stands for "Multiple-Precision Floating-point Rounding," I think, but that’s not really clear from their site so I might be wrong. It uses GMP internally, but provides any level of precision (including very small precision) and provides the four rounding modes from the IEEE 754-1985 standard.
Like GMP, MPFR is a component in, or dependency of, the GNU toolchain. It has been needed by the Fortran GCC front-end for some time, but starting with release 4.3.0 of GCC, MPFR is needed for C and C++ as well. GCC uses MPFR to pre-calculate the results of some mathematical functions when those functions have constant arguments, and it produces the same results regardless of the math library or floating-point engine used on the runtime system. This occurs in what is called the GCC "middle-end," which is kind of a silly name, since it’s not an end.
This package is often built in-tree as part of GCC, rather than separately. However, as of September 2015, in-tree builds of the dependencies have some issues in certain circumstances, so in CBL we step away from the in-tree build facility altogether.
8.7.2. mpfr (gnu-cross-toolchain phase)
As with GMP, this is not necessary because the system or host-prerequisites version of MPFR can be used.
(none)
(none)
(none)
(none)
8.8. mpc
| Name | GNU Multiple Precision Complex library |
| Version | 1.2.1 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
8.8.1. Overview
MPC is a C library for arbitrary-precision arithmetic on complex numbers, providing correct rounding. It can be thought of as an extension to the MPFR library.
Like GMP and MPFR, MPC is a component in, or dependency of, the GNU toolchain. I haven’t been able to find any description of what features of MPC are actually needed by the GNU toolchain, but MPC is a hard build-time dependency of GCC.
If your system already has MPC installed, this step can be skipped.
Like the other GCC library dependencies, this package is often built in-tree as part of GCC. As mentioned earlier, though, this sometimes introduces problems, so CBL doesn’t take advantage of the in-tree build machinery provided by GCC.
8.8.2. mpc (gnu-cross-toolchain phase)
As with GMP, this is not necessary if there is a system or host-prerequisites version of MPC available.
(none)
(none)
(none)
(none)
8.9. isl
| Name | Integer Set Library |
| Version | 0.24 |
| Project URL | |
| SCM URL | git://repo.or.cz/isl.git |
| Download URL | |
| Dependencies | |
8.9.1. Overview
The project homepage describes ISL as a library for manipulating sets and relations of integer points bounded by linear constraints. I’m not sure what that means. It sounds like math.
Unless you have some specific reason to want ISL for one of your own projects, the main benefit it provides is as an optional dependency of GCC: the "graphite loop optimizations" in GCC (whatever those are) require ISL to be available.
8.9.2. isl (gnu-cross-toolchain phase)
As with GMP, this is not necessary if there is a system or host-prerequisites version of ISL available.
(none)
(none)
(none)
(none)
8.10. linux
| Name | Linux kernel |
| Version | 5.13.11 |
| Project URL | |
| SCM URL | |
| Download URL | |
| Patches | |
8.10.1. Overview
The kernel is Linux per se — the foundation of the operating system. Often, when people say "Linux," they mean the entire operating system that lets them use a computer; properly, though, Linux is just the kernel. Many — perhaps most — of the other components that make up the full operating system are pieces of the Free Software Foundation’s GNU project, which stands for "GNU’s Not UNIX"; almost all of the program-construction tools that form the foundation of the system are GNU components.
The job of the kernel — this is a critically important point and I say it a lot — is to initialize and manage all of the hardware on the computer (including CPUs, memory, and I/O devices) and provide services to userspace processes. That’s a big job, but it’s also a limited one! Everything you do with your computer is the responsibility of userspace processes.
The way that Linux performs its job — more generally, the way that UNIX kernels work — is conceptually very simple: when it is executed on a computer, Linux does some hardware initialization, mounts the root filesystem, and then starts a single userspace process. That process has process ID (PID) 1, and is conventionally called init. The init process is responsible for starting all other userspace programs and getting the machine into a usable state; after the kernel starts init, it gets out of the way and waits for that process, or for other userspace programs, to request its services.
The complexity in the Linux kernel emerges mostly from the huge variety of hardware that it supports and the many layers of functionality that can be built into it. If you unpack the Linux source code, you’ll find that the whole thing adds up to about (as of the 5.1 kernel) 926 megabytes in total. Of that, about thirteen percent (127 megabytes) is specific to the twenty-six different CPU architectures that Linux supports, and well over half (547 megabytes) is device drivers. That means that for any given system, a large majority of the code that makes up the Linux kernel won’t ever be used.
In CBL, the kernel is kept as pristine as possible. The only patches applied here are intended to add new default configuration settings — which are more convenient than setting dozens of configuration options manually — and in some cases to add support for hardware devices and platforms that are not currently supported by the mainline kernel.
- linux-5.13.11-aws-ami-config-1.patch
8.10.2. linux (gnu-cross-toolchain phase)
The way that programs make requests of the kernel is by invoking kernel functions known as "system calls." The Linux kernel sources include header files that define all of the system calls it makes available to userspace programs.
Userspace programs don’t usually invoke system calls themselves (although they can, and some do). Instead, they invoke library functions — particularly the ones defined in the C standard library, which on GNU/Linux systems is usually the GNU libc, glibc, but might be musl or uClibc or something else altogether. Those library functions invoke system calls as needed to accomplish their work.
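To make that relationship concrete, here’s a trivial program that obtains its process ID both ways: through the glibc wrapper, and by invoking the system call directly with glibc’s syscall helper. Nothing in CBL does this; it’s purely illustrative:

cat > pid.c << 'EOF'
#define _GNU_SOURCE        /* for syscall() */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>   /* SYS_getpid, from the kernel headers */
int main(void)
{
    /* The usual route: a C library function wrapping the system call. */
    printf("via glibc:   %ld\n", (long)getpid());
    /* The direct route: ask the kernel ourselves. */
    printf("via syscall: %ld\n", (long)syscall(SYS_getpid));
    return 0;
}
EOF
gcc pid.c -o pid && ./pid   # both lines print the same PID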
In this step, we’re not actually building the Linux kernel; all we’re doing is installing the header files that specify those system calls. These header files are then used by the C library and other userspace libraries and programs to invoke the system calls as needed to perform their work.
The Linux source tree also includes a number of other, private, header files — these define data structures and functions that should be used only within the kernel itself, and are not intended to be visible to userspace programs. The Makefile target we use here, headers_install, installs only the public header files, not these private internal headers.
The makefile target mrproper puts the source tree into a completely pristine state. The name is a reference to the Procter & Gamble cleaning product that people in the United States know as "Mr. Clean"; in many parts of Europe, it is marketed under the brand name "Mr. Proper."
make mrproper
The makefile target headers_check presumably makes sure that the public header files are OK. I haven’t really looked at it, though. If you know what it really does, and that’s not it, please tell us.
make ARCH=arm64 headers_check
(none)
The headers_install target deletes the entire target directory before it does the installation. This is inconvenient, because we might possibly have header files there that we want to retain (such as those from the binutils build). To avoid any issues, we’re going to install the headers to a temporary location and copy them to the real target location from there.
Notice that we’re actually installing the headers into the scaffolding directory underneath the sysroot directory. That’s common to the entire host-side portion of CBL: everything that will be conveyed to the target system is under the scaffolding location. That way, when the final system is completed, we can get rid of /scaffolding and be fairly confident that everything remaining has been built entirely from source and (if it continues to work) has no dependencies on the scaffolding programs and libraries.
make ARCH=arm64 INSTALL_HDR_PATH=_dest headers_install
install -dv /home/lbl/work/sysroot/scaffolding/include
cp -rv _dest/include/* /home/lbl/work/sysroot/scaffolding/include
rm -rf _dest
8.11. gcc
| Name | GNU Compiler Collection |
| Version | 11.2.0 |
| Project URL | |
| SCM URL | git://gcc.gnu.org/git/gcc.git |
| Download URL | |
| Patches | |
| Dependencies | gmp (gnu-cross-toolchain phase), mpfr (gnu-cross-toolchain phase), mpc (gnu-cross-toolchain phase), isl (gnu-cross-toolchain phase), binutils (gnu-cross-toolchain phase) |
8.11.1. Overview
GCC is the GNU Compiler Collection. This is the single most important package in CBL: without a compiler, you can’t build any software. We use GCC to bootstrap everything, including itself.
Unfortunately, GCC is also probably the most complex package we need to build, and it has a very complex configuration and build process because it is tied so intricately to other packages. This is particularly true of glibc, the C standard library we use in CBL, which also has a complex build process of its own!
On the plus side, if you can get GCC built and working properly, you’re past the biggest hurdle of building a complete GNU/Linux system entirely from source.
The GCC installation process is documented in the "Installation" manual, found in the source distribution in the INSTALL directory. If you have problems getting GCC built, that’s an excellent resource for figuring out what’s going wrong. If you want to get a better understanding of the GCC configuration and build process — how it actually works — read the "Source Tree Structure and Build System" section of the GCC Internals document, found in the source distribution in gcc/doc/gccint.info.
The most important compilers in the collection, for purposes of bootstrapping a system, are the C and C++ compilers; but GCC also includes compilers for Fortran, Go, Ada, Objective C, and probably other languages as well. And it supports a huge number of machine architectures! That’s probably the most important aspect of GCC for our purposes.
8.11.2. gcc: The driver program
The program you usually invoke to build C programs, gcc, is not actually a compiler. It’s just a driver that knows how to invoke other programs, using rules called "spec strings" that tell it exactly what other programs it needs to invoke, and what command-line arguments it should provide, to turn source code into an executable program. (We’ll talk more specifically about spec strings in Adjusting the GCC specs.)
I find it really important to keep that in mind throughout toolchain construction, so I’ll expand on that point: gcc just invokes other programs. Some of those programs — the C preprocessor cpp, cc1 (the C compiler itself), and internal utility programs like collect2 (which is sort of a first-stage linker, used to set up calls to constructors and other initialization routines as a program starts to run) — are part of the GCC package. Others, like the as and ld programs, are distributed separately (in the case of as and ld, that package is GNU binutils).
To get from source code to an executable C program, gcc actually performs a series of steps:
- First, it invokes cpp to pre-process the source, include header files, resolve macros, and things like that;[4]
- then it invokes cc1 to transform the pre-processed source into assembly language code;
- then it invokes as to transform the assembly code into object code;
- and finally it invokes ld to combine all the separate bits of object code, along with code from libraries, into an executable program. (At least, conceptually, that’s what it does. In reality, it invokes collect2, which does some other stuff and then executes the actual ld program.)
You can find out exactly what commands gcc is running by giving it the -v (for "verbose") command-line argument. That’s a handy trick when things are going wrong and you’re not sure why!
The term "compilation" — which is the job of the compiler — actually refers only to the second of those steps: source code is compiled into assembly code. It’s not precisely correct to say that you’re "compiling" source code into an executable program! That’s a common conversational shorthand, but it masks the complete story about what’s going on. It would be more correct to say that you’re "pre-processing, compiling, assembling, and linking" source code to produce an executable program.
On the other hand, that’s a lot of words, and the complete story is seldom really something you need to keep in mind. This is probably why the shorthand term is so popular.
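Another easy way to observe those stages is to ask the driver to keep the intermediate files from each step; the -save-temps option does that. Using the hello.c from earlier (or any C source file):

# Keep the output of every stage of the build:
gcc -save-temps hello.c -o hello
ls hello*
# hello.i is the pre-processed source (cpp's output),
# hello.s is the assembly code (cc1's output),
# hello.o is the object code (as's output), and
# hello is the linked executable (ld's output, via collect2).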
A few external dependencies were added to gcc in release 4.3: the GMP, MPFR, and MPC libraries are required for all compiler builds, and a few optimizations — the graphite loop optimizations — are only available if the Integer Set Library (ISL) is available. The graphite loop optimizations are not critically important, but there’s no reason not to include them if it’s not difficult to do.
Sources for the required — and optional — libraries can be included in the GCC sources in directories named gmp, isl, mpc, and mpfr; if they are, then they’ll be built automatically along with gcc. There’s actually a fairly large number of packages that the build machinery for GCC detects and incorporates into builds automatically. This is convenient, but restrictive: in-tree builds are only reliable when specific versions of the dependency libraries are present, and for CBL we prefer to use the latest stable release of everything.
There are also some cross-compilation scenarios in which the in-tree library builds actually do the wrong thing and produce a compiler that will not work right; that’s another reason that we avoid them here.
- gcc-11.2.0-fix-relocation-headers-1.patch
There’s a problem introduced in GCC 6.1 where C++ header paths are hard-coded, which prevents them from being found by the scaffolding compiler (which lives at a different filesystem location when the target system is booted).
- gcc-11.2.0-fix-missing-rpath-1.patch
When specifying locations for the dependency libraries (GMP, MPFR, etc.), the specified library location should be used both at build time (with an -L linker directive) and at run time (with an -rpath linker directive). This is not done in all cases by the normal GCC build process, but we can patch it easily to add that behavior.
The next few paragraphs discuss issues related to the way the dynamic linker works. If you’re not familiar with its operation, it might be a good idea to review the section A Word About The Dynamic Linker — or just skip ahead a bit, if you’re not interested in the details of a linking problem and the reason we apply this patch to work around it.
Even though the math libraries needed by GCC (GMP, ISL, MPC, and MPFR) have just been built here, and GCC is configured to find them in their correct locations, some of the programs that make up GCC are built without an RPATH or RUNPATH, so the dynamic loader will look for those libraries at runtime in the normal host system library directories. This was filed as GCC bug 84153, but the GCC maintainers don’t consider it a problem.
To be fair, this really is not usually very much of a problem! The issue appears when the version of the library used in CBL is a major version later than the version installed on the host system. In that case, the GCC we’re building in this section won’t work because it won’t be able to find the version of the libraries it depends on.
This is the sort of situation that LD_LIBRARY_PATH is intended to resolve, but there’s a gotcha: in some of the host-scaffolding builds, the native toolchain is used to build some programs that are run as part of the build process itself, and the LD_LIBRARY_PATH set in the environment for the cross-build is unset when building those native programs.
That means that to get this native gcc to be reliable, we need to set a RUNPATH or RPATH to tell the dynamic loader where to look for shared libraries. The way you normally do this is to set an LDFLAGS environment variable with a value like -Wl,-rpath,/usr/lib when running the configure script, so that ld will be told to build programs with that RPATH; but it turns out the GCC build doesn’t use the LDFLAGS linker arguments when constructing some of the programs that need the dependency libraries, like cc1, cc1plus, and lto1.
We can work around that by patching the configure script so that, any time it’s given arguments like --with-gmp or --with-gmp-lib, the flags it collects will include a -Wl,-rpath option along with the -L option. That’s what this patch does.
- gcc-11.2.0-fix-rusage-include-1.patch

An issue I’ve found in one scenario — building a 64-bit ARM to 64-bit x86 compiler — is that a necessary header file doesn’t get included because the getrusage function is not available. A trivial patch works around that problem.
- gcc-11.2.0-workaround-bug-100017-1.patch

An issue introduced in GCC 11.1, and reported as both bug 80196 and bug 100017, is that a file in the C++ standard library (which is distributed as part of GCC, rather than separately) is not able to find included headers when built as part of a Canadian Cross compiler or a target-native compiler (like the host-scaffolding GCC). A workaround for this is provided on bug 100017, and incorporated here as a patch.
8.11.3. gcc (gnu-cross-toolchain-minimal phase)
Bootstrapping GCC as part of a cross-toolchain is tricky. You can build the compiler per se without a working C library (libc), but that compiler won’t be able to produce executable programs: GCC can only create programs if it has access to a set of C runtime object files that it expects to be provided by the C library — and, typically, it also needs a C standard library implementation to compile programs because essentially all C programs invoke functions also provided by that library.
Obviously, we can’t compile the C library without a compiler! So there’s a chicken-and-egg bootstrapping problem.
To work around that cycle of dependencies, we’re going to build just the parts of the compiler we really need at this point: a plain cross-compiler, and a minimal version of the libgcc support library that the compiler needs in order to function. (GCC needs libgcc because sometimes, while processing C code, the compiler generates references to functions defined in libgcc rather than generating assembler code.)
GCC can most easily be built as part of a cross-toolchain by using the "sysroot" framework — and, in fact, the GCC developers don’t support any other method for creating cross-toolchains. To perform a sysroot build, the configure options --with-sysroot and --with-build-sysroot must be specified; and when building GCC, the environment variables LDFLAGS_FOR_TARGET and CPPFLAGS_FOR_TARGET should be set to --sysroot=/home/lbl/work/sysroot.
- Environment variable: CPPFLAGS_FOR_TARGET = --sysroot=/home/lbl/work/sysroot
- Environment variable: LDFLAGS_FOR_TARGET = --sysroot=/home/lbl/work/sysroot
…At least, that’s what Carlos O’Donell said in a comment on GCC bug #35532. The documentation on sysroot builds is not particularly easy to find — or at least, it wasn’t when this was written. (If you know where sysroot builds are documented, please tell me!)
The sysroot concept is pretty clever, in fact. The basic idea is, when you build a cross-toolchain, you give it a local directory path — a directory that exists on the build system — and specify that that local directory will eventually become the root filesystem directory on the target system. The cross-toolchain knows about the sysroot location and knows to look there for system header files and the C library and so on. CBL uses the standard sysroot approach, in a slightly-nonstandard way: everything is installed not into the sysroot directory per se, but into a subdirectory of the sysroot called /scaffolding. That way, when we finally get booted into the target system, the root filesystem will be empty except for the /scaffolding directory: everything we create after that and install outside of /scaffolding will become part of the final system.
Depending on the target, GCC has a variety of options that control how it operates. Generally, these can all be specified with command-line arguments beginning with -m. For many of these options, default values can also be specified when GCC is being configured.
For example, for many targets, GCC can build programs with a variety of application binary interfaces (ABIs) — this is the machine-code-level interface between programs, libraries, and the operating system; it defines things like the way that registers are used when invoking functions.
An example of a CPU that supports multiple ABIs is the 64-bit x86 architecture, which is called x86_64 or amd64. These processors can run programs in 64-bit mode (with 64-bit pointers and all AMD64 processor features enabled), 32-bit mode (32-bit pointers and only i686 processor features enabled — in this mode, fewer CPU registers are available to programs, for example), or a hybrid "x32" mode (32-bit pointers but all AMD64 processor features enabled). You can specify the ABI to which GCC will compile by using an -m32, -m64, or -mx32 command-line argument.
You can also override the default ABI when configuring GCC by specifying a --with-abi configure directive; or, for some target architectures, other options, like --with-multilib-list or --enable-targets, or probably a combination of those things. Confusingly, even though the way you set an x86 GCC to generate code for the 64-bit ABI is to use the configuration directive --with-abi, the runtime command-line option -mabi doesn’t override that selection; it only selects between the sysv calling convention used by UNIX-ish systems and the ms convention used by Microsoft Windows.
This confusion is characteristic of the GCC configuration and usage options: a lot of the options that are available, and their meaning, depends on what architecture is targeted by the compiler, and the same options or terms can mean very different things for different targets.
There are a few ways to figure out what options are available for a specific target architecture: the installation manual lists some of them, mostly in the "configuration" section. The GCC manual has information in section 3.18 ("Machine-Dependent Options" — if you’ve built a set of trusted host tools specifically for the CBL build, the correct info file can be found at /usr/share/info/gcc.info). You can also look at the configuration script, gcc/config.gcc, to see what options are accepted for each type of target. And, finally, after building GCC for the target architecture, you can run gcc --help=target to see what options are available and what values they can have; you can also find the actual compiler, cc1 (for C) or cc1plus (for C++), under the libexec/gcc/aarch64-cbl-linux-gnu/$VERSION directory, and run it with the --help command-line argument to see what options are enabled and disabled.
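For example, once the cross-toolchain directory is on your PATH, you can interrogate the compiler directly; adding -Q also shows the value each option currently has:

# List the target-specific options this compiler understands:
aarch64-cbl-linux-gnu-gcc --help=target
# Also show the current (default or configured) value of each option:
aarch64-cbl-linux-gnu-gcc -Q --help=target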
In addition to the selection of ABI, GCC can be instructed to optimize for specific CPUs or CPU families for many target types. The command-line arguments that control this behavior are similarly target-specific — look for -mtune, -march, -mcpu, and things like that.
In CBL, any set of these configure directives can be specified for the cross-toolchain in the TARGET_GCC_CONFIG parameter. It’s generally a good idea to specify default values for all of the options supported by the target platform. (If you don’t want to set any options at all, you can set this to an empty string.)
GCC is generally built with a large number of libraries included. Some of those fail in some circumstances — for example, x86 CPUs can’t build the libquadmath library when the C library being used is uClibc (or, at least, that was the case some time ago, the last time I tried); and the libsanitizer library fails to build when compiling for 64-bit Sparc machines. The easiest way to work around those problems at this stage is simply to disable those libraries, and that seems like a fine approach considering that the cross-toolchain we’re building here is just going to be used to build the ephemeral scaffolding programs. So if any libraries cause the build to fail, try adding an appropriate --disable directive to TARGET_GCC_CONFIG.
- Build Directory: ../build-gcc-2
You might notice that, in this build, we’re using the same host-system builds of the various arithmetic dependency libraries as we used for the host-prerequisite GCC (or the same ones that are used for the native system GCC, if you’ve skipped the host-prerequisites). It’s totally unnecessary to build them again for this GCC — the cross-compiler we’re building here will only run on the host system, so the target architecture is irrelevant to it.
To build the minimal libgcc, we specify the configuration options --without-headers and --with-newlib. This is a bit sloppy — the first of those options is the only one that should be necessary — but the last time we tried building a static compiler without the newlib directive, it didn’t work. Then we’ll use that compiler and libgcc to build the C library; and then we can use the files provided by the C library to go back and build a full, functional GCC compiler.
${LB_SOURCE_DIR}/configure --prefix=/home/lbl/work/crosstools \
--build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --target=aarch64-cbl-linux-gnu \
--with-sysroot=/home/lbl/work/sysroot --with-build-sysroot=/home/lbl/work/sysroot \
--disable-decimal-float --disable-libgomp --disable-libmudflap \
--disable-libssp --disable-multilib --disable-nls --disable-shared \
--disable-threads --enable-languages=c,c++ --with-newlib \
--without-headers \
--with-gmp=/usr --with-mpfr=/usr \
--with-mpc=/usr --with-isl=/usr \
--enable-fix-cortex-a53-835769 --enable-fix-cortex-a53-843419 --with-cpu=cortex-a72.cortex-a53
The only targets we build at this point are all-gcc, which produces the plain compiler, and all-target-libgcc, which produces the minimal libgcc that compiler needs.
make all-gcc all-target-libgcc
(none)
make install-gcc install-target-libgcc
8.12. glibc
| Name | GNU C standard library |
| Version | 2.34 |
| Project URL | |
| SCM URL | git://sourceware.org/git/glibc.git |
| Download URL | |
| Patches | |
| Dependencies | |
8.12.1. Overview
glibc is the C standard library produced as part of the GNU project. It contains the implementations of all of the functions that are assumed to be available in C programs — like printf and so on. It also provides the dynamic loader, which almost all programs use to find shared libraries at runtime, and a few miscellaneous utility programs.
Since essentially all C and C++ userspace programs link against the C library (the Linux kernel itself is the notable exception, since it stands alone), glibc is by far the most deeply-embedded component of a GNU/Linux installation. Upgrading the Linux kernel is a relatively trivial operation compared to upgrading the C library installed on a computer.
There are alternative C libraries that can be used instead of glibc. However, glibc is the C standard library used by the vast majority of GNU/Linux systems; using an alternative library like musl or uClibc-ng as the primary C library may cause problems somewhere down the line.
8.12.2. glibc (gnu-cross-toolchain phase)
This builds a sysroot libc (for the target architecture), configured to install into the /scaffolding subdirectory of the sysroot.
That might be a little bit opaque, so let’s break it down a little bit. As mentioned earlier, the sysroot framework is all about setting up a path on the host system that will eventually become the root filesystem on the target system. And, also as mentioned earlier, the goal in the first stage of the CBL build process is to set up a kernel and minimal userspace for the target system, with the entirety of that userspace contained in a /scaffolding subdirectory rather than using the conventional filesystem paths (like /bin and /lib and the /usr directory structure and all the rest of that stuff you’re used to seeing).
What we’re building here is the libc that’s going to be used to build all the scaffolding programs and libraries — the stuff that we’re cross-compiling — so we want it to be contained in the scaffolding directory, and found by the scaffolding programs at runtime in that location. Hence, it’s a sysroot libc, with an extra prefix to move it from the usual /lib and /usr/lib directories to the /scaffolding/lib directory.
That means we configure it with a prefix of /scaffolding, but then when we install it we tell it that the root of the installation location is the sysroot directory /home/lbl/work/sysroot.
Simple, right?
If you ever want to set up a completely standard sysroot toolchain, by the way, it works pretty much the same way as this, but you specify --prefix=/usr and --with-headers=/home/lbl/work/sysroot/usr/include. There is some magic in the glibc configuration or build machinery related to the --prefix directive: if you specify a prefix of /usr, the bits of glibc that are conventionally installed in /lib will be put there, rather than in /usr/lib.
- Build Directory: ../build-glibc-1
Since this glibc is built for the target machine architecture, a number of tests run by the configure script won’t work right. The way we work around that — for glibc, as for everything else that uses the GNU build system — is by setting the correct values in a config.cache ahead of time.
An option that we’re not using here is --enable-kernel, which can limit the amount of compatibility code built into glibc to support old Linux kernel versions. Since the glibc being built here is temporary and will be discarded in its entirety, saving a little bit of space here is kind of pointless. That option also makes the build more fragile, since the kernel version that will be checked by the code is the host system kernel; we don’t want to make any more presumptions about the host system than we must.
echo "libc_cv_forced_unwind=yes" > config.cache
echo "libc_cv_c_cleanup=yes" >> config.cache
echo "libc_cv_gnu89_inline=yes" >> config.cache
echo "libc_cv_ctors_header=yes" >> config.cache
echo "libc_cv_ssp=no" >> config.cache
echo "libc_cv_ssp_strong=no" >> config.cache
BUILD_CC="gcc" CC="aarch64-cbl-linux-gnu-gcc" AR="aarch64-cbl-linux-gnu-ar" \
RANLIB="aarch64-cbl-linux-gnu-ranlib" CFLAGS="-g -O2" \
${LB_SOURCE_DIR}/configure --prefix=/scaffolding \
--host=aarch64-cbl-linux-gnu --build=x86_64-unknown-linux-gnu \
--disable-profile --enable-add-ons --with-tls --with-__thread \
--with-binutils=/home/lbl/work/crosstools/bin \
--with-headers=/home/lbl/work/sysroot/scaffolding/include \
--cache-file=config.cache
The glibc build process uses the makeinfo program to create the documentation, and the texinfo source file specifies a document encoding of UTF-8. With some versions of perl, this leads to problems — it’s unclear why; perhaps makeinfo only works right with UTF-8 documents when it is run with UTF-8 localization settings, or maybe something else is going wrong.
Regardless of exactly what triggers the issue, there’s an easy way to work around it: just remove the @documentencoding directive from the libc manual source file.
sed -i -e '/^@documentencoding UTF-8$/d' \
${LB_SOURCE_DIR}/manual/libc.texinfo
make
(none)
make install_root=/home/lbl/work/sysroot install
8.13. gcc (gnu-cross-toolchain phase)
For an overview of gcc, see gcc.
Now that we have a C library installed, we can finally do a full GCC build. So now we’ll enable multi-threaded code and some of the runtime libraries we turned off previously.
- Build Directory: ../build-gcc-3
- Environment variable CPPFLAGS_FOR_TARGET: --sysroot=/home/lbl/work/sysroot
- Environment variable LDFLAGS_FOR_TARGET: --sysroot=/home/lbl/work/sysroot
${LB_SOURCE_DIR}/configure --prefix=/home/lbl/work/crosstools \
--build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu --target=aarch64-cbl-linux-gnu \
--with-sysroot=/home/lbl/work/sysroot --with-build-sysroot=/home/lbl/work/sysroot \
--disable-multilib --disable-nls --enable-languages=c,c++ \
--enable-__cxa_atexit --enable-shared --enable-c99 \
--enable-long-long --enable-threads=posix \
--with-native-system-header-dir=/scaffolding/include \
--with-gmp=/usr --with-mpfr=/usr \
--with-mpc=/usr --with-isl=/usr \
--enable-fix-cortex-a53-835769 --enable-fix-cortex-a53-843419 --with-cpu=cortex-a72.cortex-a53
Fairly early in the build process, a cross-compiler version of gcc is built and installed as xgcc, for use in later build steps. Then later on, when building libgcc.so and various other libraries that are part of GCC, xgcc runs the cross-binutils ld. Unfortunately, xgcc doesn’t know to tell ld to look in the /scaffolding directory to find the startup files (crti.o and so on). xgcc also insists on looking for libraries and startup files in the variant lib directories that support the multilib scheme, even though we configure GCC with --disable-multilib; it doesn’t appear that there’s any way to coerce the GCC build machinery not to use those multilib directories.
A workaround that sometimes helps is to add the correct directory through LDFLAGS_FOR_TARGET and CFLAGS_FOR_TARGET. That’s no good in this case, though: many of the libraries in gcc use the libtool script to do all their compilation and linking, and libtool ignores the LDFLAGS and CFLAGS we set.
So to get this build completed, we use a kludge: we symlink /home/lbl/work/sysroot/scaffolding/lib to all the locations where xgcc might expect to find the libraries.
If the build crashes and gets restarted, the ln commands will fail. That doesn’t matter, so we temporarily tell the shell to proceed rather than terminating if an error occurs.
set +e
ln -s /home/lbl/work/sysroot/scaffolding/lib /home/lbl/work/sysroot/lib
ln -s /home/lbl/work/sysroot/scaffolding/lib /home/lbl/work/sysroot/lib32
ln -s /home/lbl/work/sysroot/scaffolding/lib /home/lbl/work/sysroot/lib64
ln -s /home/lbl/work/sysroot/scaffolding/lib /home/lbl/work/sysroot/libx32
set -e
make AS_FOR_TARGET="aarch64-cbl-linux-gnu-as" LD_FOR_TARGET="aarch64-cbl-linux-gnu-ld"
(none)
make install
rm -f /home/lbl/work/sysroot/lib*
8.14. Adjusting the GCC specs
8.14.1. Overview
As mentioned earlier, in the section about gcc: the gcc program we think of as a compiler is really a driver that just runs other programs, and it uses a bunch of directives called "spec strings" to determine what programs to run and what options to give them. The format of spec strings is documented in the GCC documentation in section 3.15, "Specifying Subprocesses and the Switches to Pass to Them." Spec strings don’t have the most readable structure — I find it helpful to think of them as being written in a domain-specific language, because to me they look as much like line noise as complicated regular expressions do — but sometimes there’s no better way to figure out what is going on than to read the specs, and there is often no better way to adjust the behavior of gcc than to modify the specs it is using.
We have to do this — modify the specs — a few times throughout the CBL process, primarily because we need to control how gcc runs ld to link programs.
You can see the spec strings that gcc will use by running gcc with the -dumpspecs option. The default specs are built in to gcc, but you can provide your own specs to override the default behavior by using the -specs= command-line argument or by creating a specs file and putting it in a specific location in the filesystem.
We’re going to do the latter. The basic process is to first dump the specs file to the location where gcc will look for it:
gcc -dumpspecs > $(dirname $(gcc -print-libgcc-file-name))/specs
Then we can modify it however we need to, and gcc will use the modified version.
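If you only want to experiment, the -specs= route avoids touching the installed toolchain at all. This is a hypothetical illustration, not a CBL build step:
# Dump the built-in specs somewhere convenient, edit them, then
# tell gcc to use the edited copy for a single compile.
gcc -dumpspecs > /tmp/specs
# ...edit /tmp/specs as desired, then:
gcc -specs=/tmp/specs -o hello hello.c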
8.14.2. Adjusting the GCC specs (gnu-cross-toolchain phase)
Caution: If you look at the structure of the toolchain directory, you’ll see that there are a couple of different ways you can refer to the programs in it. First, in the […] That means that unless you put the path to the directory containing […]
Not to belabor the point, but remember that the whole purpose of this cross-toolchain is to let us build the scaffolding that will then allow us to build the final target system entirely from source code.
While we’re using the scaffolding tools to build the final system, we want to do a native build of everything, including glibc, so we want the scaffolding to be independent of any filesystem locations that will still be present in the final system. In particular, this includes /lib and /usr/lib. That way, once we’re done doing the target system build, we can delete the scaffolding directory altogether and be confident that there are no lingering host-system artifacts or ephemera on it.
When the standard GNU toolchain builds an executable, it almost always links it against the dynamic link library (sometimes called the "dynamic loader" or "program interpreter"; this is something like ld-linux.so.2 or ld.so.1, and is conventionally found in a top-level directory called /lib or a multilib variant like /lib32 or /lib64).[5] That’s normally fine, but we want the scaffolding to be entirely independent of /lib. So we need to adjust our cross-toolchain so that the programs it builds look in the /lib directory under the scaffolding location for their libraries, including the dynamic link library. This is done by modifying the GCC specs file.
aarch64-cbl-linux-gnu-gcc -dumpspecs | \
sed -e 's@/lib/ld@/scaffolding/lib/ld@g' \
-e 's@/lib32/ld@/scaffolding/lib/ld@g' \
-e 's@/libx32/ld@/scaffolding/lib/ld@g' \
-e 's@/lib64/ld@/scaffolding/lib/ld@g' > \
$(dirname $(aarch64-cbl-linux-gnu-gcc --print-libgcc-file-name))/specs
There’s another problem we need to work around, as well: gcc doesn’t provide any good way to tell it where to find some object files that need to be linked into every program: crt1.o, crti.o, and crtn.o. These are provided as a part of glibc, so like the rest of glibc they were installed into /home/lbl/work/sysroot/scaffolding, specifically into its /lib subdirectory. But, of course, that’s not a location where gcc normally expects to find them. So at this point if you were to try compiling a "Hello World" program with your cross-toolchain, it would complain that the cross-ld can’t find crt1.o or crti.o.
You can tell exactly what is going wrong by repeating the compile with -v, to get gcc to print out all the commands it’s running: cc1 to compile the code, then as to assemble it, then collect2 to link it. And apparently collect2 is running ld, which is producing the error message. (That’s weird, but you get used to weird stuff when you’re trying to figure out toolchain problems. There’s also no explicit execution of cpp; that’s because the C pre-processor is actually implemented in the libcpp library and is invoked as function calls to that library by the cc1 compiler program.)
Once you have the actual command line that’s failing, you can try adjusting it to see if there’s an easy way to get it to work. For example, the command line just names the startup files without any path components at all. After fussing around for a while looking for pleasant alternatives, the best solution I found was to specify the filenames with absolute paths. So that’s what we’re going to do in the specs file: replace every occurrence of the bare filename with absolute paths, for all the object files that appear in the scaffolding/lib directory.
for FILE in crt1 crti crtn gcrt1 Mcrt1 Scrt1; do \
sed -i -e "s@\\b$FILE.o\\b@/home/lbl/work/sysroot/scaffolding/lib/$FILE.o@g" \
$(dirname $(aarch64-cbl-linux-gnu-gcc --print-libgcc-file-name))/specs; \
done
We can now verify that the cross-toolchain is able to build programs successfully, and is set up to link against the sysroot glibc in the /scaffolding directory, by compiling any simple program (like "Hello World") and then running readelf on it — there will be a line in the program headers section that says "Requesting program interpreter:" and it should contain the path to the dynamic link library in the scaffolding location.
8.15. Verify that a toolchain works properly
8.15.1. Overview
After building a significant toolchain component, it’s a good idea to make sure that it works as intended. This is a simple smoke-test: it just compiles a "Hello, World" program and then inspects it to make sure it was built as expected and runs properly.
8.15.2. Verify that a toolchain works properly (gnu-cross-toolchain phase)
- Environment variable LD_LIBRARY_PATH: /home/lbl/work/crosstools/lib:$LD_LIBRARY_PATH
- Environment variable PATH: /home/lbl/work/crosstools/bin:$PATH
- Dependencies: gcc
This verifies that the cross-toolchain and emulator work properly: look at the machine type and dynamic linker location for a compiled program, and then make sure that it runs in the userspace emulator. This proves that the cross-toolchain and QEMU were properly built with a compatible target architecture and so on.
#include <stdio.h>
int main(void)
{
printf("Hello, QEMU Emulated aarch64-cbl-linux-gnu World!\n");
return 0;
}
This is compiled with:
aarch64-cbl-linux-gnu-gcc /home/lbl/work/build/hello.c -o /home/lbl/work/build/hello
To verify that it’s linked properly, use readelf.
aarch64-cbl-linux-gnu-readelf -a /home/lbl/work/build/hello | tee \
/home/lbl/work/build/program_info
grep 'Machine:' /home/lbl/work/build/program_info | grep \
'AArch64'
grep 'interpreter: /scaffolding/lib' /home/lbl/work/build/program_info
The Machine line should indicate the target architecture, rather than the host architecture, and the program interpreter requested by the program should be under the /scaffolding/lib directory (which is where the dynamic loader will be found once we’re booted into the target system). If either of those is not the case, the grep commands will fail, causing the CBL build process to abort.
One last thing we can usefully do at this point is verify that the user-mode QEMU emulator can actually run programs for the target architecture.
This is a bit tricky when running the dynamically-linked program we just built, because it expects to find the program interpreter at /scaffolding/lib, but that location actually exists only under the sysroot directory. Luckily, the user-mode QEMU emulator can be told where to find library files, using the -L command line argument or the QEMU_LD_PREFIX environment variable.
qemu-aarch64 -L /home/lbl/work/sysroot /home/lbl/work/build/hello | \
grep 'Hello, QEMU Emulated aarch64-cbl-linux-gnu World'
This will produce a friendly greeting, or — if something goes wrong — will, again, cause the build process to abort.
(If you’d like, you can try running the hello program outside of QEMU as well; that should produce an error message like "cannot execute binary file." And, again, if it doesn’t, that means something is horribly wrong!)
8.15.3. Complete text of files
8.15.3.1. /home/lbl/work/build/hello.c
#include <stdio.h>
int main(void)
{
printf("Hello, QEMU Emulated aarch64-cbl-linux-gnu World!\n");
return 0;
}
Building a GNU/Linux system from the ground up is like constructing a building. At least, that’s the analogy we had in mind when we were designing the CBL process and writing this book.
When constructing a building, it’s sometimes useful to start by assembling a scaffolding around the build site. Then you can climb onto the scaffolding and use it as a support and framework while you construct the building you actually want. Once the final building is complete, you can tear down and discard the scaffolding — it’s not important in and of itself, only as a means to an end.
That’s what we do in the CBL process: we use the cross-toolchain to construct a set of programs and libraries that we can boot into, as an ephemeral "scaffolding" framework from which we can build the actual target system. We build that scaffolding in such a way that it sits alongside the final system as we build it; the scaffolding components won’t conflict with anything that will eventually form part of the final Little Blue Linux system, because everything is self-contained within a top-level /scaffolding directory. After the build is complete, we’ll delete /scaffolding.
9. Ensuring isolation from the host system
It’s possible for the build process of some of the scaffolding programs to find things on the host system and try to compile or link against them. This doesn’t work, of course, because the scaffolding programs are for the target machine architecture, and that’s incompatible with the host system architecture. Unfortunately, it still causes the build to fail. We can prevent that by setting up some programs that will isolate us from the host system.
This really should not be needed! Everything we’re building in the scaffolding is being cross-compiled, and packages should never try to compile or link against any host-system libraries when they’re being cross-compiled. But I ran into an issue with this at least once, and setting up a small guard against this kind of build-system bug is not hard to do.
9.1. pkgconf
Name | Improved pkg-config
---|---
Version | 1.8.0
Project URL |
SCM URL | (unknown)
Download URL |
Patches |
9.1.1. Overview
pkg-config is a program that makes it easy to find installed libraries and header files and things. Many programs use pkg-config in their build processes to find out if the libraries they depend on are present on the build system, and to find out what linker and include directives should be used to compile and link against them.
The original pkg-config program can be found at pkg-config.freedesktop.org. At some point in its development, its developers decided to use functions from the glib library; unfortunately, glib uses pkg-config to find its own dependencies (really, just zlib — glib doesn’t have a lot of dependencies), which introduced a cyclic dependency. That doesn’t cause any really intractable problems — a couple of environment variables can be defined when building glib so that it doesn’t have to use pkg-config to find zlib — but cyclic dependencies are always kind of horrifying: not to sound like a broken record, but the idea with CBL is to start with a minimal set of binaries and a pile of source code and turn that into a whole system. (Even worse than the glib dependency, pkg-config requires itself, so that it can find and link against the glib library, unless additional environment variables are provided.)
The situation has since been resolved, but it was resolved by bundling glib along with pkg-config. This isn’t a particularly elegant solution: the pkg-config distribution is about 11.5 megabytes of code, unpacked, and about 9.5 megabytes of that is glib.
A different approach was taken in a fork of pkg-config, "pkg-config-lite," which includes just the snippet of glib that is needed by pkg-config: that’s better, but still not ideal.
This brings us to pkgconf, a completely separate implementation of the pkg-config program. It has no external dependencies, doesn’t bundle any third-party code within its own distribution, and also has a design that I find preferable to the original pkg-config (it internally builds a directed acyclic graph of dependencies, rather than building an in-memory database of all known pkg-config files at runtime and then resolving dependencies from that database).
- pkgconf-1.8.0-run-autogen-1.patch
As distributed, the pkgconf program is missing a lot of files that are normally produced by the GNU autotools; the conventional way to build it is to start by running the autogen.sh script provided with the package distribution. However, one of the places the pkgconf program is built is as part of the host prerequisites; if that’s being done, it’s probably not a great idea to rely on the autotools being present at prerequisite build time. We can instead simply patch the source distribution to create the files that would normally be created via autogen.sh.
9.1.2. pkgconf (host-isolation phase)
This is built here because pkg-config is already present on the host system (as a prerequisite, if nothing else), and it probably knows about a lot of host system libraries. While building the scaffolding, we need to ensure that their build processes don’t find (and try to link against) any of the host system libraries. Since the host system libraries are for a different machine architecture, this would cause build failures. We can do that by putting a new version of pkg-config, configured only to look in the scaffolding directory, at the head of the PATH while building them.
We could probably get by with a simple shell script that always just says "I couldn’t find anything!" when asked for dependencies. On the other hand, it takes around ten seconds to build pkgconf and set it up, so why not just do that?
find . -exec touch -r README.md {} \;
./configure --prefix=/home/lbl/work/crosstools \
--with-pkg-config-dir=/home/lbl/work/sysroot/scaffolding/lib/pkgconfig \
--with-system-libdir=/home/lbl/work/sysroot/scaffolding/lib \
--with-system-includedir=/home/lbl/work/sysroot/scaffolding/include
make
(none)
make install
ln -sf pkgconf /home/lbl/work/crosstools/bin/pkg-config
In addition to the pkg-config symlink, we create a symbolic link aarch64-cbl-linux-gnu-pkg-config so that any build process that tries to find dependencies using a target-system pkg-config program will find ours.
ln -sf pkgconf /home/lbl/work/crosstools/bin/aarch64-cbl-linux-gnu-pkg-config
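A quick hypothetical spot-check (not a CBL build step): because this pkg-config searches only the scaffolding directories configured above, asking it about a library that so far exists only on the host should report nothing found:
# Should print "missing" at this point, since nothing has installed a
# .pc file into the scaffolding pkgconfig directory yet.
aarch64-cbl-linux-gnu-pkg-config --exists zlib && echo found || echo missing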
10. Construction of a minimal bootable userspace
In this section, we’re going to use our shiny new cross-toolchain to build all of the programs and libraries we’ll need to get to a working target-architecture userspace. We call these components, collectively, the "scaffolding" because we’re going to use them as a kind of staging area and framework from which we can construct the final CBL system.
It’s useful to keep that purpose in mind! These programs will provide just enough of a userspace environment that, once we’ve got the target system booted, we’ll be able to build the final system components using these as a foundation. That means we’ll need, basically, all the stuff that we’ve already been using from the host system — the programs that let us build programs — and we also need programs that will let us work with partition tables, filesystems, and other low-level operating system concerns.
It’s also useful to keep in mind that everything here is ephemeral. As soon as we get the target system booted, we’ll use these scaffolding programs to build all of the stuff that make up the final CBL system, and then we’re going to throw them away.
10.1. About the Scaffolding
To maintain a hard line of separation between the scaffolding and the final system components, we’re building all of this stuff so that it installs into a directory called /scaffolding. When we set up the root filesystem for the target, it’s only going to have that one top-level directory! Then, as the scaffolding programs are used to construct the final system components, those final-system programs will be installed to normal system directories — /bin, /usr, and so on — and will be used in preference to the scaffolding programs that they replace.
In most cases, the scaffolding components use the GNU build system. That means they can be configured to expect that they will live in /scaffolding, but then be installed with a DESTDIR of the sysroot directory. That way, they actually get installed to /home/lbl/work/sysroot/scaffolding (which is exactly where they need to be on the host system), but think they’re installed to a top-level /scaffolding directory — which is where they actually will live once we boot into the target device.
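In schematic form, then, most of the package builds in this section follow this pattern (the real invocations below add per-package flags):
# Configure for a runtime home of /scaffolding, cross-compile, and
# install under the sysroot via DESTDIR.
./configure --prefix=/scaffolding \
  --build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
make DESTDIR=/home/lbl/work/sysroot install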
Building everything with a very simple environment is still a good idea.
- Environment variable PATH: /home/lbl/work/crosstools/bin:$PATH
- Environment variable LC_ALL: POSIX
Some of the scaffolding pieces install libraries and headers. We want those to be visible to the rest of the scaffolding, so CFLAGS and LDFLAGS are not as empty as they have previously been.
- Environment variable CFLAGS: -I/home/lbl/work/sysroot/scaffolding/include
- Environment variable CXXFLAGS: -I/home/lbl/work/sysroot/scaffolding/include
- Environment variable LDFLAGS: -L/home/lbl/work/sysroot/scaffolding/lib
To use the cross-toolchain for these builds, we need to define a bunch of additional environment variables. Many of the scaffolding programs use the GNU build system, and therefore consult these environment variables to determine how to invoke toolchain programs.
- Environment variable CC: aarch64-cbl-linux-gnu-gcc
- Environment variable CXX: aarch64-cbl-linux-gnu-g++
- Environment variable AR: aarch64-cbl-linux-gnu-ar
- Environment variable AS: aarch64-cbl-linux-gnu-as
- Environment variable RANLIB: aarch64-cbl-linux-gnu-ranlib
- Environment variable LD: aarch64-cbl-linux-gnu-ld
- Environment variable STRIP: aarch64-cbl-linux-gnu-strip
We’re also going to use the cross-toolchain options build, host, and target for most of the components we’re building in this section. This time, --build is going to be the host system; --host and --target are going to refer to the target system.
10.2. Create Symbolic Links For Scaffolding Lib Directories
Some target architectures have a "multilib" feature, and the installation process for many packages insists on installing library files into a variety of different directories (like lib32 and lib64) to support this feature — even when multilib is disabled, as we try to do throughout the CBL process. The issue with this is that not all of the multilib directories are on the default library path known by the dynamic loader; this leads to errors in some package builds.
To ensure that all library files wind up in the lib directory per se, rather than in a multilib variant directory, we use an ugly but simple and effective kludge: we create symbolic links to ensure that all library files are placed directly into /scaffolding/lib. (In the target-side build, we’ll do something similar for the /lib and /usr/lib directories: this is done in Write the Scaffolding Init Scripts.)
The set of symbolic links needed here might expand as additional targets are added to the set that CBL can handle.
mkdir -p /home/lbl/work/sysroot/scaffolding/lib
ln -s lib /home/lbl/work/sysroot/scaffolding/lib32
ln -s lib /home/lbl/work/sysroot/scaffolding/lib64
ln -s lib /home/lbl/work/sysroot/scaffolding/libx32
10.3. attr
Name | Filesystem Extended Attributes programs
---|---
Version | 2.5.1
Project URL |
SCM URL | (unknown)
Download URL |
10.3.1. Overview
Most Linux filesystems support "extended attributes" — arbitrary name/value pairs that can be associated with files or directories. These can be used for any purpose; for example, you might attach a "user.file_encoding" extended attribute to a text file, if it’s encoded unusually.
One of the primary uses of extended attributes is to implement access control lists or capabilities.
The attr package provides programs that allow extended attributes to be viewed and modified.
10.3.2. attr (host-scaffolding-components phase)
Since the configuration repository preserves extended attribute metadata as well as basic owner and mode, and since the package-users build script sets a user.package_owner attribute on all files that are installed by packages, we need the getfattr and setfattr commands early in the target system build.
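For reference, those commands are used roughly like this (a hypothetical illustration; the file name and attribute value here are made up):
# Attach an owner attribute to a file, then read it back.
setfattr -n user.package_owner -v attr /tmp/example-file
getfattr -n user.package_owner /tmp/example-file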
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.4. bash
Name | GNU Bourne Again SHell
---|---
Version | 5.1.8
Project URL |
SCM URL | (unknown)
Download URL |
10.4.1. Overview
Bash is a shell — it provides a command line prompt, parses commands and pipelines entered on the command line, executes programs based on those commands and pipelines, and then does it all again. If you’re interacting with a Linux system, and you’re not using a graphical environment like Xorg, you’re using a shell like bash. And, probably, you’re using bash — there are other shells, but they are not nearly as common.
The bash maintainers don’t provide all patchlevels for convenient download; you may have to download the tarfile for the main version (like 4.3) and then separately download and apply all the update patches. The CBL repository, for convenience, has a tarfile that includes all of the patches. As of version 5.1.8, this may not be necessary — in addition to the 5.1 tarfile and the separate patches for it, I found a tarfile for 5.1.8 on the GNU FTP server.
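If you do end up applying the upstream patch levels yourself, the process is something like the following sketch (upstream patches for bash 5.1 are named bash51-001, bash51-002, and so on, and apply with -p0 from the top of the source tree):
# Apply every downloaded patch level, in order, to a bash 5.1 tree.
cd bash-5.1
for p in ../bash51-0??; do
  patch -p0 < $p
done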
10.4.2. bash (host-scaffolding-components phase)
Several of the tests run by the configure script don’t work right when bash is being cross-compiled. As with some other programs that use the GNU build system, we can short-circuit those tests by pretending that the configure script was run previously and knows the results from those tests.
Bash includes its own memory allocation routine, but historically it has not always been reliable. We use --without-bash-malloc to disable it so that the C standard library’s malloc is used instead. For the scaffolding version of bash, we also configure it to be linked statically — that reduces the number of pieces that are needed to get the minimal scaffolding-based system to boot.
echo "ac_cv_func_mmap_fixed_mapped=yes" > config.cache
echo "ac_cv_func_strcoll_works=yes" >> config.cache
echo "ac_cv_func_working_mktime=yes" >> config.cache
echo "bash_cv_func_sigsetjmp=present" >> config.cache
echo "bash_cv_getcwd_malloc=yes" >> config.cache
echo "bash_cv_job_control_missing=present" >> config.cache
echo "bash_cv_printf_a_format=yes" >> config.cache
echo "bash_cv_sys_named_pipes=present" >> config.cache
echo "bash_cv_ulimit_maxfds=yes" >> config.cache
echo "bash_cv_under_sys_siglist=yes" >> config.cache
echo "bash_cv_unusable_rtsigs=no" >> config.cache
echo "gt_cv_int_divbyzero_sigfpe=yes" >> config.cache
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--enable-static-link --without-bash-malloc --cache-file=config.cache
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.5. binutils (host-scaffolding-components phase)
For an overview of binutils, see binutils.
This is a new native set of the binutils, built using the cross-toolchain: they will run on the target architecture, and produce binaries that will also run on the target architecture. (But they’re being built on the initial system, so we specify build as such.)
- Build Directory: ../build-binutils-3
${LB_SOURCE_DIR}/configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu --target=aarch64-cbl-linux-gnu \
--disable-nls --enable-shared --disable-multilib \
--with-lib-path=/home/lbl/work/sysroot/scaffolding/lib \
--enable-64-bit-bfd
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.6. m4
Name | GNU M4
---|---
Version | 1.4.19
Project URL |
SCM URL | git://git.sv.gnu.org/m4
Download URL |
10.6.1. Overview
M4 is a macro processor: it copies its standard input to standard output, expanding macros as it goes. It has a set of built-in macros that it understands, and it’s possible to add user-defined ones as well. M4 is used extensively in the GNU build system, primarily by Autoconf.
10.6.2. m4 (host-scaffolding-components phase)
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.7. bison
Name | GNU Bison
---|---
Version | 3.7.6
Project URL |
SCM URL | (unknown)
Download URL |
Dependencies |
10.7.1. Overview
Bison is a parser generator. That means it takes a description of a language (in a specialized format called a "context-free grammar") and converts it into a parser for that language. This is mostly useful when writing compilers: the parser is the part of a compiler that takes a stream of tokens and figures out what syntax elements they represent. Parsers are fiddly and difficult to get correct, so it’s common to generate the code for parsers using tools like Bison.
The name "Bison" is kind of a joke. The original parser generator that was commonly available on UNIX systems is called "Yacc," which is an acronym for "Yet Another Compiler-Compiler." When the GNU project implemented their own parser generator, they called it "Bison" because "Yacc" sounds like "Yak."
Bison’s installation process also creates a program called yacc that simply runs bison in yacc emulation mode.
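The installed yacc is just a tiny wrapper, essentially equivalent to this:
#!/bin/sh
# Run bison in yacc-compatibility mode, passing along all arguments.
exec bison -y "$@"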
10.7.2. bison (host-scaffolding-components phase)
The absolute path to the m4 program is written into bison, so we need to make sure that the scaffolding bison will expect to find m4 in the scaffolding directory. This adds a little complexity, though: the bison configuration script checks to see whether lex is available and, if it is, whether it is flex. flex uses m4, so if we tell the bison configuration that m4 can be found in the scaffolding, flex will try to use that version of m4, which was built for the target system rather than the host system. As with other packages that use the GNU Build System, we can use config.cache to tell bison that flex is simply not available.
echo "ac_cv_prog_lex_is_flex=no" > config.cache
echo "ac_cv_prog_lex_root=no" >> config.cache
echo "ac_cv_lib_lex='none needed'" >> config.cache
M4=/scaffolding/bin/m4 ./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--cache-file=config.cache
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.8. bzip2
Name | Block-sorting compression utility
---|---
Version | 1.0.8
Project URL |
SCM URL | git://sourceware.org/git/bzip2.git
Download URL |
10.8.1. Overview
Bzip2 is a compression program like gzip, but it uses the Burrows-Wheeler block sorting algorithm and Huffman coding. It is a lot slower than gzip for both compressing and decompressing, but compresses data much better in most cases.
Bzip2 doesn’t use the GNU build system, so there isn’t a configure script.
10.8.2. bzip2 (host-scaffolding-components phase)
There isn’t really a configuration step for bzip2, but we’ll use the configuration stage to modify the build process so it won’t try to run the tests — they won’t work when building with a cross-toolchain.
The regular Makefile only creates a static version of the library, which is fine; however, it doesn’t compile that library with -fPIC, which causes problems later on when other programs try to use it. We can add that directive directly in the Makefile.
mv Makefile Makefile.orig
sed -e 's@^\(all:.*\) test@\1@g' -e 's@^CFLAGS=@CFLAGS=-fPIC @' \
Makefile.orig > Makefile
Because bzip2 doesn’t use the GNU build system, we need to specify the cross-tools as arguments to make.
make CC="${CC}" AR=${AR} RANLIB=${RANLIB}
(none)
make PREFIX=/home/lbl/work/sysroot/scaffolding install
10.9. coreutils
Name | GNU Core Utilities
---|---
Version | 8.32
Project URL |
SCM URL | (unknown)
Download URL |
Patches |
10.9.1. Overview
The Core Utilities are basic file, shell, and text manipulation utility programs: things like cat, chmod, chown, cp, … stuff like that. You use these all the time.
- coreutils-8.32-revert-removed-dir-error-1.patch
Version 8.32 of coreutils introduced a misfeature that breaks 64-bit ARM builds on Linux. Paul Eggert provided a patch that reverts that misfeature, so we apply that here.
10.9.2. coreutils (host-scaffolding-components phase)
A couple of the tests run by the configure script don’t work right when the coreutils are being cross-compiled. As with other programs that use the GNU build system, we can short-circuit those tests by pretending that the configure script was run previously and knows the results from those tests.
echo "fu_cv_sys_stat_statfs2_bsize=yes" > config.cache
echo "gl_cv_func_working_mkstemp=yes" >> config.cache
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--enable-install-program=hostname --cache-file=config.cache
make
(none)
sed -i 's@^cu_install_program =.*@cu_install_program = install@' Makefile
make DESTDIR=/home/lbl/work/sysroot install
10.10. diffutils
Name | GNU Diffutils
---|---
Version | 3.8
Project URL |
SCM URL | (unknown)
Download URL |
10.10.1. Overview
Diffutils is a bundle of programs that find differences between files and help you to merge them together: cmp, diff, diff3, and sdiff. The most important program here is diff, which finds differences between files. It can find differences between two files, in the simplest case; but, more commonly, diff is used to find all the differences between all the files in two entire directory trees.
You can save the output from diff when used this way as a "patch file", and then use the patch program to take one of the directory trees and make it look just like the other one.
This is handy when distributing modified versions of software packages: if you’ve made a minor change to GCC, for example, and you want to submit that change to the GCC mailing list for consideration by the GCC maintainers, you can use diff between the original GCC source tree and your modified GCC source tree, and then include the output from diff in your email.
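Concretely, that workflow might look like this (a hypothetical example; the directory and file names are made up):
# Record every difference between the pristine and modified trees.
diff -Naur gcc-orig gcc-modified > my-gcc-change.patch
# Anyone with a pristine tree can then reproduce the modified one.
cd gcc-orig
patch -p1 < ../my-gcc-change.patch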
10.10.2. diffutils (host-scaffolding-components phase)
When cross-compiling, the configure script guesses that the getopt function provided by GNU glibc is not available and tries to use an internal one instead. This doesn’t work, for some reason (possibly because of behavior changes in GCC 7). Since the glibc getopt is available, we can just tell the configure script not to guess about it.
echo "gl_cv_func_getopt_gnu='yes'" > config.cache
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--cache-file=config.cache
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.11. util-linux
Name | miscellaneous linux utilities
---|---
Version | 2.37
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
10.11.1. Overview
util-linux is a grab-bag of miscellaneous Linux utility programs, for all kinds of things: disk partitioning, filesystem creation and validation and mounting… all kinds of stuff like that. (I’d expect a lot of this stuff — like more, mount, and umount — to be a part of the GNU system, perhaps in the coreutils package, but that’s not the only thing I find surprising about the GNU/Linux world.)
10.11.2. util-linux (host-scaffolding-components phase)
We configure with an option that tells the build machinery not to chgrp the wall program to the tty group. This will probably make wall non-functional, but that’s perfectly okay — wall can be used to send messages simultaneously to all users logged on to the system, as for example if the system is going to be shut down or something of the sort, which is not very useful on single-user computers and certainly not necessary in the minimal scaffolding userspace.
You’ll notice that we’re configuring this package with a prefix of /home/lbl/work/sysroot/scaffolding instead of just /scaffolding (and installing without a DESTDIR). That’s because, when we configure and install util-linux the way we’re doing most of the scaffolding components, we’ve sometimes encountered problems where some components can’t find the libuuid library that is provided by this package. There are probably other ways to address that, and it’s not really clear whether those components are strictly necessary, but this non-standard configuration scheme doesn’t cause any problems and works fine.
The installation routine for util-linux changes the ownership and mode of some programs (like mount and umount) to be setuid to root. This doesn’t work when running the installation as a normal user, as is being done in this section, so we disable that.
./configure --prefix=/home/lbl/work/sysroot/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu --disable-use-tty-group \
--without-ncurses --without-ncursesw --disable-makeinstall-chown \
--disable-makeinstall-setuid
make
(none)
make install
10.12. e2fsprogs
Name | Ext2/3/4 Filesystem Utilities
---|---
Version | 1.46.3
Project URL |
SCM URL | git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
Download URL | https://mirrors.edge.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
Dependencies |
10.12.1. Overview
The e2fsprogs are utilities for managing the ext2 (and later) filesystems. ext2 was the first popular filesystem for Linux systems, and its latest incarnation, ext4, is the most common. This package contains programs for creating filesystems, reconfiguring them, checking them for problems, that sort of thing.
There’s some overlap between e2fsprogs and util-linux: both provide libuuid and libblkid libraries, a uuidd daemon, and a fsck script. We skip those from e2fsprogs.
- Build Directory: build
10.12.2. e2fsprogs (host-scaffolding-components phase)
There is a glitch in the configuration logic that creates a file outside of the DESTDIR-specified path when there is no /etc/cron.d directory. This can be avoided by explicitly specifying that there is no crond directory at all.
../configure --prefix=/scaffolding \
--enable-elf-shlibs --build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--disable-libblkid --disable-libuuid --disable-fsck --disable-uuidd \
--without-crond-dir
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
make DESTDIR=/home/lbl/work/sysroot install-libs
10.13. expat
Name | Expat XML parser
---|---
Version | 2.4.1
Project URL |
SCM URL | (unknown)
Download URL |
10.13.1. Overview
Expat is a library that provides a stream-oriented XML parser. It’s used by all sorts of other programs.
10.13.2. expat (host-scaffolding-components phase)
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu --without-docbook
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.14. file
Name | file type guesser
---|---
Version | 5.39
Project URL |
SCM URL |
Download URL |
10.14.1. Overview
file guesses file types by looking for particular characteristic byte values within them. If you’ve got a file and you’re not sure what it actually is, you can run file on it and be told things like "that’s a text file," or "that’s a JPEG image."
10.14.2. file (host-scaffolding-components phase)
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.15. findutils
Name | GNU Find Utilities
---|---
Version | 4.8.0
Project URL |
SCM URL | (unknown)
Download URL |
10.15.1. Overview
The Find Utilities are basic directory searching programs: mostly find and locate. This package also includes updatedb, which creates the file name database that locate uses; and xargs, which takes lists of file names produced by find and generates command lines that operate on all the files in those lists.
10.15.2. findutils (host-scaffolding-components phase)
A couple of the tests run by the configure script don’t work right when the findutils are being cross-compiled. As with some other programs that use the GNU build system, we can short-circuit those tests by pretending that the configure script was run previously and knows the results from those tests.
echo "gl_cv_func_wcwidth_works=yes" > config.cache
echo "ac_cv_func_fnmatch_gnu=yes" >> config.cache
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--cache-file=config.cache
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.16. gawk
Name | GNU awk
---|---
Version | 5.1.0
Project URL |
SCM URL | (unknown)
Download URL |
10.16.1. Overview
AWK is a special-purpose programming language that makes it easy to do simple text manipulations. (The name is an acronym of the surnames of the three people who designed the language: Alfred Aho, Peter Weinberger, and Brian Kernighan.) Gawk is the GNU implementation of the AWK language.
If you’re comfortable in perl or python or ruby, you probably won’t have much use for awk; on the other hand, if you’re working in a bash script, awk might turn out to be very handy.
10.16.2. gawk (host-scaffolding-components phase)
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.17. gmp (host-scaffolding-components phase)
For an overview of gmp, see gmp.
Although this is a cross-build of GMP, GMP doesn’t like having a target specified, just build and host. The manual explains that this is because target is for toolchain programs, and specifies the kind of machine where programs they produce will run; GMP doesn’t produce programs, it’s just a library, so target is meaningless for it.
Depending on the target architecture and ABI, it may be necessary to specify additional configure variables or arguments. (You’ll know that you need one if the build fails immediately.) The TARGET_GMP_CONFIG parameter is available for that purpose.
The configure for this and the other GCC dependencies doesn’t use the --prefix and DESTDIR trick to install to a different directory than the prefix, because that doesn’t work well with the layered dependencies of mpfr and mpc later. That means that some of the files installed by these dependencies refer to host-system directories that won’t exist on the target system; we’ll fix those up later.
- Build Directory: ../build-gmp-3
${LB_SOURCE_DIR}/configure \
--prefix=/home/lbl/work/sysroot/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu --enable-cxx
make
(none)
make install
10.18. mpfr (host-scaffolding-components phase)
For an overview of mpfr, see mpfr.
- Build Directory: ../build-mpfr-3
${LB_SOURCE_DIR}/configure --prefix=/home/lbl/work/sysroot/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu --target=aarch64-cbl-linux-gnu \
--with-gmp=/home/lbl/work/sysroot/scaffolding
make
(none)
make install
10.19. mpc (host-scaffolding-components phase)
For an overview of mpc, see mpc.
- Build Directory: ../build-mpc-3
${LB_SOURCE_DIR}/configure --prefix=/home/lbl/work/sysroot/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu --target=aarch64-cbl-linux-gnu \
--with-gmp=/home/lbl/work/sysroot/scaffolding \
--with-mpfr=/home/lbl/work/sysroot/scaffolding
make
(none)
make install
10.20. isl (host-scaffolding-components phase)
For an overview of isl, see isl.
- Build Directory: ../build-isl-3
${LB_SOURCE_DIR}/configure --prefix=/home/lbl/work/sysroot/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu --target=aarch64-cbl-linux-gnu \
--with-gmp-prefix=/home/lbl/work/sysroot/scaffolding
make
(none)
make install
10.21. zlib
Name | zlib compression library
---|---
Version | 1.2.11
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
10.21.1. Overview
Zlib is a compression library. It implements the same Lempel-Ziv compression algorithm as gzip and info-zip, which means it doesn’t compress data as effectively as most other algorithms (like the Burrows-Wheeler algorithm used by bzip2 or the LZMA algorithm used by xz-utils), but on the other hand it doesn’t use very much memory to compress, and it’s pretty fast.
There are a lot of programs that link against zlib to get basic compression capabilities. We’re not totally sure which components will fail to build if it’s not available, but zlib itself has no dependencies and is really fast and small to build, so making it available is no big deal.
10.21.2. zlib (host-scaffolding-components phase)
./configure --prefix=/scaffolding
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.22. gcc (host-scaffolding-components phase)
For an overview of gcc, see gcc.
- Dependencies: zlib
As with binutils, this is going to produce a native compiler that will be used on the target system to let us build the pieces of scaffolding that are tricky to cross-compile, and the first few components of the final CBL system. Those first few components will include the final system binutils and GCC; as soon as we have those built and installed, we won’t be using this compiler any more.
This is one of the checkpoint steps. If there’s any problem with the cross-toolchain or the way it’s set up, it will probably cause the target-native GCC build to fail. If you can get past this step, you can relax a little bit!
For this build, we need to ensure that some of the FLAGS environment variables are empty; otherwise, the build system can get confused and try to use target-system header files to compile programs to be run on the host system. That doesn’t work!
(You might be wondering why GCC builds programs for the host system when doing a target-native build. It’s because the build process constructs some programs that it then immediately runs; the output of those programs is more source code, which then gets compiled for the target environment. Compiler build systems are complicated.)
- Environment variable CFLAGS: (should not be set)
- Environment variable CXXFLAGS: (should not be set)
- Environment variable LDFLAGS: (should not be set)
In-tree compilation of all the various dependencies is problematic for this build: for some architectures, like ARM and Sparc32, when GCC does the in-tree configuration of GMP, it configures it with a host and target starting with none-. That apparently was a working configuration in old versions of GMP, but current versions break when this is done and — again, only on certain CPU architectures! — result in errors when trying to link ISL a few steps later on.
As of this writing, this is an open issue. There are several ways it could be fixed or worked around: for example, GCC could be patched (specifically, the configure-gmp targets in Makefile.def and the Makefile.in generated from it, where they specify the invalid host and target parameters — probably, there is a better way to tell GMP to disable asm optimizations). For CBL, the way we address this situation is simply to move the dependency library builds out-of-tree.
- Build Directory: ../build-gcc-4
${LB_SOURCE_DIR}/configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu --target=aarch64-cbl-linux-gnu \
--with-local-prefix=/home/lbl/work/sysroot/scaffolding --with-system-zlib \
--with-native-system-header-dir=/home/lbl/work/sysroot/scaffolding/include \
--enable-languages=c,c++ --enable-checking=release \
--disable-multilib --disable-nls --disable-libssp \
--with-gmp=/home/lbl/work/sysroot/scaffolding \
--with-mpfr=/home/lbl/work/sysroot/scaffolding \
--with-mpc=/home/lbl/work/sysroot/scaffolding \
--with-isl=/home/lbl/work/sysroot/scaffolding \
--enable-fix-cortex-a53-835769 --enable-fix-cortex-a53-843419 --with-cpu=cortex-a72.cortex-a53
make GCC_FOR_TARGET="aarch64-cbl-linux-gnu-gcc" \
CC_FOR_TARGET="aarch64-cbl-linux-gnu-gcc" CXX_FOR_TARGET="aarch64-cbl-linux-gnu-g++" \
AS_FOR_TARGET="aarch64-cbl-linux-gnu-as" LD_FOR_TARGET="aarch64-cbl-linux-gnu-ld"
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.23. gettext
Name | GNU Gettext
---|---
Version | 0.21
Project URL |
SCM URL | (unknown)
Download URL |
10.23.1. Overview
Gettext is a set of tools that can be used by other programs to provide internationalization and localization capabilities. This lets a single program provide user interfaces and user-facing messages in a variety of languages without requiring it to be rebuilt.
10.23.2. gettext (host-scaffolding-components phase)
One of the tests run by the configure script doesn’t work right when gettext is being cross-compiled. As with some other programs that use the GNU build system, we can short-circuit that test by pretending that the configure script was run previously and knows the result of that test.
If the emacs editor is available on the system, the gettext build will do something with it. I don’t really understand what it is, but it generates a couple of gigabytes worth of log output and makes the build process take orders of magnitude longer than it otherwise would. Luckily, it is easy to disable that with a configuration flag.
echo "gl_cv_func_wcwidth_works=yes" > config.cache
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--cache-file=config.cache --without-emacs
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.24. grep
Name | GNU grep
---|---
Version | 3.6
Project URL |
SCM URL | (unknown)
Download URL |
10.24.1. Overview
Grep searches input files for lines that match patterns called "regular expressions." It often prints those lines out as well.
We have heard that the program name "grep" was originally derived from the ed command "g/re/p", which does basically the same thing as the command-line grep program; Wikipedia makes this claim as well. Maybe it’s true!
10.24.2. grep (host-scaffolding-components phase)
The GNU libc library provides regular expression functions, but grep ignores them by default and uses its own bundled regex library. We use a configuration flag to override that behavior.
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--without-included-regex
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.25. gzip
Name | GNU zip compression utility
---|---
Version | 1.10
Project URL |
SCM URL | (unknown)
Download URL |
10.25.1. Overview
Gzip is a compression program that basically does the same thing as the zlib library, but with a command-line interface.
10.25.2. gzip (host-scaffolding-components phase)
The default -Werror setting for gzip causes a format-truncation message produced by GCC 8 to abort the build. I am not all that concerned about format-truncation problems when printing error messages, so I just disable that setting.
- Environment variable CFLAGS: $CFLAGS -Wno-error=format-truncation
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
The generated Makefiles for gzip include a -Wabi directive that GCC 8 complains about. We can just remove that directive.
find . -name Makefile | while read filename; \
do \
sed -i -e 's@-Wabi@@g' $filename; \
done
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.26. kbd
Name | Keyboard Utilities
---|---
Version | 2.2.0
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
10.26.1. Overview
The Kbd package contains key-table files and keyboard utilities.
The vlock program requires Pluggable Authentication Modules (Linux-PAM), which is not part of the basic CBL system, so we always disable it. Conversely, other programs that aren’t built by default might be helpful, so we enable those.
10.26.2. kbd (host-scaffolding-components phase)
We need some of the programs provided by kbd (notably openvt, which will let us run interactive shells on virtual terminals in the minimal target system) in the minimal userspace.
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--disable-vlock --enable-optional-progs
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.27. libffi
Name | Foreign Function Interface library
---|---
Version | 3.4.2
Project URL |
SCM URL | git://github.com/libffi/libffi
Download URL |
10.27.1. Overview
Libffi is a portable library that helps programs written in one language to invoke functions that were written in a different language.
10.27.2. libffi (host-scaffolding-components phase)
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.28. libressl
Name | LibreSSL
---|---
Version | 3.3.3
Project URL |
SCM URL | git://github.com/libressl-portable/
Download URL |
10.28.1. Overview
LibreSSL is a fork of OpenSSL with the goals of modernizing the codebase, improving the reliability and security of the code, and applying better development practices.
10.28.2. libressl (host-scaffolding-components phase)
LibreSSL should not be needed in the scaffolding, but the current version of rubygems has issues when used in a ruby installation without openssl support.
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.29. linux (host-scaffolding-components phase)
For an overview of linux, see linux.
Now we’re going to build a kernel that will let us use the minimal userspace we constructed in /scaffolding.
Although the Linux kernel per se is entirely self-contained and doesn’t use any functions defined in shared libraries, the kernel build process compiles a program, extract-cert, that requires a TLS library. Both LibreSSL and OpenSSL work fine for this. Presumably, the host system already provides this! If not, build and install LibreSSL first.
Kernel configuration is generally a manual business, and it is tricky to automate. The normal process I follow is to start with the configuration that produced the current kernel, then use make olddefconfig to define settings for any newly-added configuration keys, and then use one of the more friendly user interfaces (I usually use nconfig, a text-mode configuration program written by Nir Tzachar, because I am often working on a system that doesn’t have a GUI set up and have gotten used to the text interfaces) to set individual options that I know I care about. Then, in this blueprint, I use the config script to set those configuration keys non-interactively.
It’s always best to start with some kind of base configuration, regardless of what starting point makes the most sense for you, because there are thousands of configuration options in total. These fall into a few basic categories:
-
Device drivers, which provide support for all the various types of I/O peripherals that might be part of your system — basically, this is everything other than the CPU per se;
-
Linux kernel features, like support for various types of filesystem, cryptographic algorithms, I/O schedulers, and stuff like that; and
-
Other arbitrary settings that let you specify things like the default computer name or command-line or a string to append to the kernel version.
The exact number of configuration options varies by architecture; for MIPS, there are about 9600 configuration options, while for 64-bit x86 computers there are almost eleven thousand, as of the 5.1 kernel. Regardless of architecture, there’s enough complexity that starting from an empty configuration and enabling everything you need is not very much fun.
The most typical approach is to start with the configuration of the kernel that’s currently running, which is often found in /boot and, even when not, can often be found in the /proc pseudo-filesystem as /proc/config.gz.
In this stage of CBL, though, the kernel on the host system is totally irrelevant for the scaffolding kernel we’re building — it’s not even for the correct machine architecture! So we’re going to start with one of the default config files. You can look in arch/arm64/configs to see all the default kernel config files you can use — or, just as easily, you can run make ARCH=arm64 help to get a summary of the make targets that are available for the target architecture. In CBL, we use a litbuild configuration parameter to set the specific platform; the value that’s currently set for that is defconfig.
If there aren’t any default configurations that are close to what you
want for the target system, you can create a new one! To do this, just
configure the kernel manually (using make nconfig
or whatever you
prefer); when you’ve got it set up the way you want it, use make
savedefconfig
to produce a new defconfig
file for your configuration.
To use this new configuration in CBL, move defconfig
to
arch/arm64/configs/your_machine_defconfig
and generate a
patch that adds just that file to the kernel source tree. Now you can
add that patch to the Linux blueprint, so it gets applied automatically
from the litbuild-produced script, and use make your_machine_defconfig
to configure the kernel.
The standard CBL blueprint for Linux has an example of that whole process, in the linux (aws-ami-bootable phase) phase.
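Sketched out, the whole thing looks something like this (the machine name is hypothetical, and the git commands assume you’re working in a git checkout of the kernel sources); the real commands for this build continue below, starting from the stock defconfig:
make ARCH=arm64 nconfig
make ARCH=arm64 savedefconfig
mv defconfig arch/arm64/configs/your_machine_defconfig
git add arch/arm64/configs/your_machine_defconfig
git diff --cached > add-your-machine-defconfig.patch
make ARCH=arm64 your_machine_defconfig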
make mrproper
make ARCH=arm64 defconfig
Starting from the default configuration, we now set a bunch of options that we definitely want for our minimal scaffolding environment. Notice that at this point we’re just enabling options to be built directly into the kernel, rather than setting any features to be built as modules — once we get to the final kernel build, the stuff that we might or might not want to enable at runtime will be built as modules. In CBL, though, we’re always going to build everything necessary to get userspace started directly into the kernel, instead of building an entirely-modular kernel like most of the binary distributions do. That means we won’t have to set up an initramfs to provide an early userspace, which simplifies the boot process substantially.
This technique, of manually setting configure options that we want, is
not really as robust as I’d like it to be. In many cases, config options
are dependent on other config options — for example, you can’t select
POSIX access control lists for the JFFS2 filesystem unless you’ve
enabled the JFFS2 filesystem. Ideally, manually enabling an option like
JFFS2_FS_POSIX_ACL
would cause any other required options, like
JFFS2_FS
, to be enabled as well. But, in fact, what happens is that
any manually-configured options with unmet dependencies simply get
disabled again automatically.
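You can watch this happen for yourself; as a sketch, using the JFFS2 example from above:
./scripts/config --enable JFFS2_FS_POSIX_ACL
make ARCH=arm64 olddefconfig
grep JFFS2_FS_POSIX_ACL .config
# without JFFS2_FS enabled, the option has quietly reverted to unset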
So the process I follow to obtain the set of options that are enabled
here is: First, I run make ARCH=arm64 nconfig
and
manually set all the options that I care about. Then I compare the new
version of .config
to the one I started with; for all the modified
options, I set up the calls to scripts/config
below.
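That comparison step is nothing fancy; a sketch:
cp .config config.orig
make ARCH=arm64 nconfig
diff config.orig .config
# each changed CONFIG_* line becomes one of the scripts/config calls below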
CAUTION: The STACK_VALIDATION
option is worth special note. If it’s
set, the kernel build process tries to compile a program called
objtool
that is used to analyze generated object files during the
kernel build itself. However, it tries to use the native compiler with
target system include files. That can’t possibly work. The RETPOLINE
option automatically reselects STACK_VALIDATION
, so we have to turn
that off as well. In the final target kernel, both of those will be
enabled because they are potentially important for the security of the
system, but at this stage it really does not matter.
./scripts/config --enable HIGHMEM
./scripts/config --set-str LOCALVERSION cbl1
./scripts/config --enable DEVTMPFS
./scripts/config --disable DEVTMPFS_MOUNT
./scripts/config --enable EXT4_FS
./scripts/config --enable IKCONFIG
./scripts/config --enable IKCONFIG_PROC
./scripts/config --disable STACK_VALIDATION
./scripts/config --disable RETPOLINE
./scripts/config --disable UNWINDER_ORC
./scripts/config --enable UNWINDER_FRAME_POINTER
./scripts/config --disable STACKPROTECTOR
For AMD64 (aka X86_64) builds, we want to support the x32 ABI. This is an architecture-specific option; it will be ignored for non-AMD64 builds.
./scripts/config --enable X86_X32
The options we’re setting might cause other options to become available.
For example, when HIGHMEM
is enabled, DEBUG_HIGHMEM
becomes a valid
option. Since litbuild scripts are supposed to be entirely automated and
non-interactive, we need to do something that prevents the kernel
configuration machinery from asking questions about any such options.
Luckily, there’s a configuration target that starts with the existing
configuration and then uses the default settings for any newly-available
symbols. That’s exactly what we need.
make ARCH=arm64 olddefconfig
GCC 8 adds a bunch of compiler warnings about aliasing between functions
with possibly-incompatible types. In some architectures, such as mipsel,
this triggers warnings in a bunch of SYSCALL_DEFINE
macros. By
default, the kernel build aborts when it sees these warnings, but we can
avoid this by tweaking the kernel build process to disable that specific
compiler warning.
echo "KBUILD_CFLAGS += -Wno-error=attribute-alias" >> Makefile
For some reason, the STACKPROTECTOR setting from earlier can get lost at
some point during the configuration process. We don’t want to be asked
about that again when we run make all
, so we disable it again here.
./scripts/config --disable STACKPROTECTOR
Similarly, a couple of additional options can magically appear at this point in the build. I haven’t spent a lot of time trying to figure out why that happens; it’s easy enough simply to disable them here.
./scripts/config --disable EFI_STUB
./scripts/config --disable KCSAN
For ARM-architecture builds, there are a few additional settings that we’ll be prompted about during the build unless we specify them here. I really don’t understand why, but again, manually configuring these is not hard.
./scripts/config --enable ARM64_PTR_AUTH
./scripts/config --enable ARM64_BTI
./scripts/config --enable ARM64_BTI_KERNEL
./scripts/config --enable ARM64_E0PD
./scripts/config --enable ARM64_TLB_RANGE
./scripts/config --enable ARCH_RANDOM
./scripts/config --enable ARM64_MTE
Now we can build the kernel and modules. Until Linux 4.18, it was
possible to set CROSS_COMPILE
as a configuration setting in .config
,
but that doesn’t work any more so we have to put it on the make
command line or in the environment. (Grumble, grumble.)
make ARCH=arm64 CROSS_COMPILE=aarch64-cbl-linux-gnu- all
make ARCH=arm64 CROSS_COMPILE=aarch64-cbl-linux-gnu- \
Image.gz
(none)
The install
Makefile target for Linux is a little bit weird. For most
(but not all!) target architectures, it winds up running
boot/install.sh
from the relevant architecture directory. That
install.sh
script looks to see if there is anything executable called
installkernel
in the current user’s $HOME/bin
directory or in the
system /sbin
directory. If there is, it just exec’s into that
installkernel
script or program. Otherwise, it installs the kernel
image and System.map
file itself. (The kernel is really the only thing
you need. System.map
is a symbol table that specifies the address in
memory for every variable and function name contained within the kernel;
it’s useful when debugging kernel panics and "oopses.")
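As a quick illustration of how System.map gets used: if a panic message mentions an address inside some function, you can look up where that function landed. The symbol name here is just an example; exact names vary by architecture and kernel version:
grep sys_reboot System.map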
That installkernel
scheme doesn’t work very well for the scaffolding
kernel, because if the host system provides an /sbin/installkernel
script, it’s very likely to do something distribution-specific that
won’t work for the cross-compiled scaffolding kernel.
In CBL, we avoid all of this confusion and complexity by bypassing the normal installation target entirely; we just copy things where we want them.
mkdir -p /home/lbl/work/sysroot/scaffolding/boot
make ARCH=arm64 \
INSTALL_PATH=/home/lbl/work/sysroot/scaffolding/boot \
INSTALL_MOD_PATH=/home/lbl/work/sysroot/scaffolding modules_install
export KERNELPATH=$(find . -name Image.gz -a -type f)
cp -v $KERNELPATH /home/lbl/work/sysroot/scaffolding/boot/kernel
cp -v .config /home/lbl/work/sysroot/scaffolding/boot/config
cp -v System.map /home/lbl/work/sysroot/scaffolding/boot
unset KERNELPATH
10.30. make
| Name | GNU make |
|---|---|
| Version | 4.3 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
10.30.1. Overview
GNU make is a build automation program. To use make, you set up one or
more configuration files — the primary one being called, by convention,
Makefile
— that declare recipes for producing intermediate and final
program artifacts, and dependencies between artifacts, and the various
targets that can be produced. At runtime, make
looks to see what
artifacts exist already and which source files have a timestamp
indicating that they’ve changed after dependent artifacts were produced,
figures out based on that analysis exactly which artifacts need to be
rebuilt, and then executes the recipes that will produce those
artifacts.
That’s all pretty awesome!
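To make that concrete, here’s a tiny, hypothetical example you could run in an empty directory. (Recipe lines in a Makefile must begin with a literal tab character.)
cat > Makefile <<'EOF'
hello: hello.c
	cc -o hello hello.c
EOF
echo 'int main(void){return 0;}' > hello.c
make   # builds hello, because it doesn’t exist yet
make   # does nothing: hello is newer than hello.c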
The downside is that Makefiles are not particularly clear or readable, and for large projects they get pretty big and complicated. That’s why there is a thing called "litbuild"!
But the vast majority of system components use make
to automate their
build process, so you really have to have it available regardless.
10.30.2. make (host-scaffolding-components phase)
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.31. ncurses
| Name | GNU new curses library |
|---|---|
| Version | 6.2 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Patches | |
10.31.1. Overview
Ncurses ("new curses") is a library that provides terminal control features so programs can provide advanced text-based user interfaces. It’s a free-software version of a similar library (just called "curses") that was developed at Berkeley.
There are lots of programs, even just in the scaffolding we’re setting up, that use ncurses. bash and vim are among them.
Patches are available in a version subdirectory of the overall package
download location — that is, patches for ncurses 6.2 are in the "6.2"
subdirectory of the ncurses package directory. Patches need to be
applied in order; there are instructions in the README file there, but
the basic idea is that you find the latest patch bundle, called
something like ncurses-6.1-20181020-patch.sh
, and run that as a shell
script; then you apply all the patch files later than that in sequence.
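As a sketch, with hypothetical patch file names following that naming convention:
sh ncurses-6.2-20200212-patch.sh
for p in ncurses-6.2-20200215.patch.gz ncurses-6.2-20200222.patch.gz; \
do \
  gunzip -c $p | patch -p1; \
done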
I’ve compiled all of the patches for ncurses 6.2 up through the 20201219
patch into a branch-update patch, available in the freesa file
repository.
-
ncurses-6.2-branch-updates-20201219.patch
10.31.2. ncurses (host-scaffolding-components phase)
./configure --prefix=/scaffolding --with-shared \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu --enable-overwrite \
--without-debug --without-ada --with-build-cc=gcc --disable-stripping
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.32. patch
| Name | GNU patch |
|---|---|
| Version | 2.7.6 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
10.32.1. Overview
Remember diffutils? The most important program included in diffutils is
diff
, which finds differences between two files or directory trees.
Patch is the conceptual inverse of diff
— it takes a file with all
the differences found by diff
as input (which is by convention called
a "patch file"), and applies all of those differences as changes to
files in a directory tree.
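A sketch of the round trip, with made-up directory names:
diff -u -r original/ modified/ > changes.patch
cd another-copy-of-original && patch -p1 < ../changes.patch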
This is really handy. For example, the CBL repository contains a bunch of source tarfiles that were obtained directly from the project web sites; and it also contains patch files that can be applied to those source tarfiles to apply changes that we have found to be necessary or important when building a CBL system.
10.32.2. patch (host-scaffolding-components phase)
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.33. pth
| Name | GNU portable threads library |
|---|---|
| Version | 2.0.7 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Patches | |
10.33.1. Overview
Pth is a portable threads library; it provides cooperative priority-based scheduling for multiple threads within a program.
The build system for pth doesn’t work well when parallelized, so we explicitly disable parallel make jobs by specifying -j1.
On some servers, the version of config.guess distributed with pth is too old to recognize the target triplet. It’s easy enough to update it to the version distributed with GCC.
-
pth-2.0.7-update-config-guess-1.patch
In some cross-compiled scenarios, pth fails to compile because it considers versions of the Linux kernel later than 2.9 to be "braindead." I got a patch from https://bugzilla.redhat.com/attachment.cgi?id=591825 that corrects the problem.
-
pth-2.0.7-linux-kernel-fix-1.patch
10.33.2. pth (host-scaffolding-components phase)
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make -j1
(none)
make -j1 DESTDIR=/home/lbl/work/sysroot install
10.34. ruby
| Name | Ruby |
|---|---|
| Version | 3.0.2 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
10.34.1. Overview
Ruby is a programming language. There are several implementations of that language — rubinius and jruby are others — but the canonical one is the C implementation that was created by Yukihiro "Matz" Matsumoto, and that’s also the best one for our purposes in CBL because it has the fewest upstream dependencies.
CBL includes Ruby as a core component because CBL itself is designed to be used with the litbuild program — CBL consists entirely of litbuild blueprints — and litbuild is written in Ruby.
On 64-bit ARM systems, the Ruby build process exposes an issue with GCC (documented at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84521) when compiled with -fomit-frame-pointer and an optimization setting of -O1 or -O2. Since the default behavior for modern versions of GCC is to omit frame pointers, it’s important on such systems to specify -fno-omit-frame-pointer when building Ruby. It’s possible that other architectures have a similar issue; if your ruby build crashes with a "stack smashing detected" error message, try adding that option to your builds as well. (The default configuration for CBL specifies -fno-omit-frame-pointer in the target-system CFLAGS.)
10.34.2. ruby (host-scaffolding-components phase)
Once we boot into the target system, we’ll use litbuild to generate all the scripts that will build the final CBL components. That means we’ll need ruby available!
As with the file package, ruby can have issues in cross-compilation scenarios unless there is a native ruby installation of the same version. Problems don’t always arise, but the installation of a scaffolding ruby 2.7.0 once failed because the system ruby was version 2.6.5; if something similar goes wrong for you, a version mismatch is a good thing to check.
- Dependencies
The version of rubygems (about which see more below) distributed with
ruby 2.7 really wants to load the ruby openssl
library, which requires
OpenSSL (or a fork of it, like LibreSSL).
Ruby includes a module called fiddle
, which wraps the libffi
library
and allows ruby programs to call functions written in C or other
languages. In some rare cases — the only one I’ve come across is when
building ruby with the x32 ABI — the libffi
build fails when it tries
to assemble src/x86/win32.S with a cross-toolchain. That file probably
allows libffi to call functions in Windows DLLs, or something; I am
pretty sure it’s not something we’ll need to use. The sed commands here
remove it from the build process entirely.
sed -i -e 's@src/x86/win32.S@@' -e 's@src/x86/win32.lo@@' \
ext/fiddle/libffi-*/Makefile.am
sed -i -e 's@src/x86/win32.S@@' -e 's@src/x86/win32.lo@@' \
ext/fiddle/libffi-*/Makefile.in
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.35. sed
| Name | GNU Stream Editor |
|---|---|
| Version | 4.8 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
10.35.1. Overview
Sed is kind of like a text editor, but instead of allowing you to modify
text files interactively (like ed and vim do), sed acts as a filter: it
takes a text file as an input stream, makes changes to that input
stream, and produces the resulting modified text as an output stream.
(The name, sed
, means "Stream EDitor").
This is a fairly powerful and flexible thing to be able to do, but it’s
not something people often need to do — like gawk
, sed
is most
often handy in the context of a bash script or something like that,
where a more powerful general-purpose language like ruby or python or
perl isn’t convenient for whatever reason.
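A couple of sketches showing the kind of thing sed typically does in build scripts (file names are arbitrary):
# as a filter: replace every "old" with "new" on the way through
sed -e 's/old/new/g' input.txt > output.txt
# in place: delete every comment line from a file
sed -i -e '/^#/d' some-config-file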
On the other hand, the automated build processes for some of the CBL components use sed, so you have to have it around anyway.
10.35.2. sed (host-scaffolding-components phase)
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.36. sysvinit
| Name | System V-style init programs |
|---|---|
| Version | 2.99 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
10.36.1. Overview
We’ve said this before, but it’s really important so it bears repeating:
the job of the Linux kernel is to mount the root filesystem and then run
a single program, conventionally located at /sbin/init
, with process
ID (PID) 1. It’s the responsibility of the init
program to launch all
the other userspace programs.
The sysvinit package provides an init
program (along with other
related programs), in the style of the init
program that was used by
UNIX System V — hence the name, "SysVInit". It was used by the vast
majority of GNU/Linux systems for many years, although most modern
distributions have switched to other init programs.
In the minimal "scaffolding" userspace, we don’t need anything very
sophisticated for the init
program. So we use sysvinit to manage the
scaffolding userspace: it’s easy to cross-compile and set up, and it’s
relatively simple to understand how it works.
This is one of the few packages in CBL that is set up only in the scaffolding, and will not also be a part of the final system.
10.36.2. sysvinit (host-scaffolding-components phase)
There are a few hard-coded paths we should adjust: the location of the
init
program itself, the location of the master configuration file
(inittab
), a script that init
will use when running processes, and
the shell that will be used to run that script.
sed -i -e '/define INITTAB/s@/etc/inittab@/scaffolding/etc/inittab@' \
-e '/define INIT/s@/sbin/init@/scaffolding/sbin/init@' \
-e '/define SHELL/s@/bin/sh@/scaffolding/bin/bash@' \
-e '/define INITSCRIPT/s@/etc/initscript@/scaffolding/etc/initscript@' \
src/paths.h
make -C src clobber
The Makefile for this package is smart enough to add the linker flag
-lcrypt
if the C library it’s linking against is GNU libc… but the
test for this is whether there is a file at /usr/lib*/libcrypt.a
,
which is not the case for all systems, and is a meaningless test for
cross-compilation scenarios like this one. Since we are using GNU
libc, we can just force the test to return true.
sed -i -e '/wildcard.*libcrypt.a/s@.*@ifeq (yes,yes)@' src/Makefile
make -C src CC="$CC"
(none)
make -C src ROOT=/home/lbl/work/sysroot/scaffolding install
10.37. tar
| Name | GNU tar |
|---|---|
| Version | 1.34 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
10.37.1. Overview
Tar is a program for manipulating archives of files. The name derives from its original purpose, back in the dawn of time, which was to manipulate archives on magnetic tape — ergo, "tar," for Tape ARchive. Actually storing archives on magnetic tape is pretty rare these days, but people still use tar-format files for all kinds of things.
These files are known as "tarfiles," for obvious reasons, or as "tarballs" for less obvious reasons. A speculation I found on Wikipedia is that the term references the tendency of actual tarballs (blobs of petroleum floating in the ocean) to get all sorts of things stuck to them; apparently, someone thought that was reminiscent of how the tar program collects a bunch of files together.
All of the source code distribution packages for CBL packages are compressed tarfiles, so the tar program is really important for CBL!
The tar package also provides a program called "rmt" that lets you manipulate magnetic tape drives attached to other computers. This is an even more fringe thing to do in this day and age, but it only weighs in at about 50 KB, so it’s not hurting anything.
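The basic operations look like this (the file names are arbitrary):
tar -czf archive.tar.gz somedir/   # create a gzip-compressed tarfile
tar -tf archive.tar.gz             # list its contents
tar -xf archive.tar.gz             # extract (compression is auto-detected)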
10.37.2. tar (host-scaffolding-components phase)
Several of the tests run by the configure script don’t work right when tar is being cross-compiled. As with some other programs that use the GNU build system, we can short-circuit those tests by pretending that the configure script was run previously and knows the results from those tests.
echo "gl_cv_func_wcwidth_works=yes" > config.cache
echo "gl_cv_func_btowc_eof=yes" >> config.cache
echo "ac_cv_func_malloc_0_nonnull=yes" >> config.cache
echo "gl_cv_func_mbrtowc_incomplete_state=yes" >> config.cache
echo "gl_cv_func_mbrtowc_nul_retval=yes" >> config.cache
echo "gl_cv_func_mbrtowc_null_arg1=yes" >> config.cache
echo "gl_cv_func_mbrtowc_null_arg2=yes" >> config.cache
echo "gl_cv_func_mbrtowc_retval=yes" >> config.cache
echo "gl_cv_func_wcrtomb_retval=yes" >> config.cache
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--cache-file=config.cache
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
10.38. texinfo
| Name | GNU texinfo |
|---|---|
| Version | 6.8 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Patches | |
10.38.1. Overview
Texinfo is the official documentation system for the GNU project. It uses input files in a standard format to produce a variety of outputs, both printed and online.
Texinfo is used in the automated build process of many of the CBL component projects.
When compiled against glibc 2.34, texinfo 6.8 runs into a problem due to an obsolete version of gnulib. We can patch gnulib to work around this issue.
-
texinfo-6.8-update-gnulib-1.patch
10.38.2. texinfo (host-scaffolding-components phase)
When cross-compiling, the configure script makes some incorrect guesses
about the availability of functions provided by glibc. We can override
those guesses in a config.cache
file.
Even after doing that, the presence of the gnulib
version of
getopt.h
causes the build to break. That’s easy enough to work around
simply by removing it.
echo "gl_cv_func_getopt_gnu=yes" > config.cache
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--cache-file=config.cache
rm -f gnulib/lib/getopt.h
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
The build process creates two scripts that expect perl to be in the location where it’s found on the host system, rather than the place it will be available in the initial target system, so we fix them up after installation.
sed -i -e 's@/usr/bin/perl@/scaffolding/bin/perl@' \
/home/lbl/work/sysroot/scaffolding/bin/makeinfo
sed -i -e 's@/usr/bin/perl@/scaffolding/bin/perl@' \
/home/lbl/work/sysroot/scaffolding/bin/texi2any
10.39. vim
| Name | Vim |
|---|---|
| Version | 8.2.3354 |
| Project URL | |
| SCM URL | |
| Download URL | |
| Dependencies | |
10.39.1. Overview
Back in the early days of Unix, someone wrote a text editor called "ed" (which stood for, get this, "EDitor"). Ed is a line editor, which means that generally it lets you modify a line at a time. Ed is still available, and in fact still actively developed — GNU ed 1.14.1 was released in January of 2017 — but most people want a more interactive editor, and so one of the old proprietary Unixes eventually included an editor called "vi" ("VIsual editor") to fill this need.
Since vi is proprietary, various people implemented similar programs, which they released under open source licenses of various sorts. One of the people who decided to do this is Bram Moolenaar, who created Vim; its name stands for "Vi IMproved," because it’s massively superior to vi.
Vim is distributed under a different software license than most of the packages that make up CBL: the Vim License. This license is GPL-compatible, but is described as "charityware" — Vim’s author urges users to make a donation for needy children in Uganda.
As far as that goes, you probably don’t need to install vim at all. There are plenty of other text editors that some people prefer. The default editor in some distributions is nano, and some people like their text editor to be their primary tool for interacting with the computer and use emacs. Whatever your preferences, though, you absolutely need a text editor of some sort, and vim is our favorite, so that’s the one that winds up in the basic CBL blueprints.
10.39.2. vim (host-scaffolding-components phase)
As with many scaffolding pieces, the configure script for vim doesn’t play nicely with cross-compilation, so we pre-fill a config.cache with a bunch of values that it won’t be able to discover on its own.
An interesting difference between vim and most other packages we’re
building is that it automatically picks up entries from a config.cache
file located in the src/auto
directory, and apparently doesn’t support
providing a reference to config.cache in the top-level configure run.
mkdir -p src/auto
echo "vim_cv_getcwd_broken=no" > src/auto/config.cache
echo "vim_cv_memmove_handles_overlap=yes" >> src/auto/config.cache
echo "vim_cv_stat_ignores_slash=no" >> src/auto/config.cache
echo "vim_cv_terminfo=yes" >> src/auto/config.cache
echo "vim_cv_toupper_broken=yes" >> src/auto/config.cache
echo "vim_cv_tty_group=world" >> src/auto/config.cache
echo "vim_cv_tgetent=zero" >> src/auto/config.cache
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu \
--enable-gui=no --disable-gtktest --disable-xim --disable-gpm \
--without-x --disable-netbeans --with-tlib=ncurses
make
(none)
There are a lot of programs that expect a text editor called "vi" to exist, so we’ll create a symlink after installing vim.
make DESTDIR=/home/lbl/work/sysroot install
ln -sv vim /home/lbl/work/sysroot/scaffolding/bin/vi
10.40. xz
| Name | XZ Utils LZMA compression program |
|---|---|
| Version | 5.2.5 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | (unknown) |
10.40.1. Overview
The XZ Utils package provides a compression/decompression program with an interface similar to gzip, but using the LZMA algorithm. This is very slow when compressing, very fast when decompressing, and compresses more effectively than most of the other common compression programs, so it’s used fairly often when providing archive files for download.
The compression algorithm used by the XZ Utils is the same as that used by lzip, but the file format is more complex and the compressed output it produces is usually slightly larger than that produced by lzip.
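Basic usage is a lot like gzip (file names are arbitrary):
xz -9 sources.tar       # compress, producing sources.tar.xz
xz -l sources.tar.xz    # show information about the compressed file
xz -d sources.tar.xz    # decompress, restoring sources.tar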
10.40.2. xz (host-scaffolding-components phase)
./configure --prefix=/scaffolding \
--build=x86_64-unknown-linux-gnu --host=aarch64-cbl-linux-gnu
make
(none)
make DESTDIR=/home/lbl/work/sysroot install
Having constructed all the scaffolding we can build on the host system, it’s time to set up for the second (target) stage of the build. The target-side build will be run automatically when the target system is launched.
11. Finish preparing the scaffolding for launch of the target system
Now that everything’s built, we need to adjust some files that were incorrectly written.
11.1. Fix Scaffolding Libtool Wrapper Files
Libtool is a part of the GNU Build System, and is therefore used in the
build machinery of many CBL components. One thing libtool does is create
"wrapper libraries" with the .la
extension; later invocations of
libtool use those files to find the real location of library files.
The full host sysroot path winds up in many of the .la
files in the
scaffolding lib directory. This can cause problems when building the
final CBL system (after booting into the target system), since those
paths don’t exist any more at that point. So it’s a good idea to fix
those paths after the scaffolding is otherwise complete.
cd /home/lbl/work/sysroot/scaffolding/lib
grep -l '/[a-zA-Z/]*sysroot/scaffolding/' *.la | while read FILE; \
do \
sed -i -e 's@/[a-zA-Z/]*sysroot/scaffolding/@/scaffolding/@g' $FILE; \
done
Other libtool files wind up with cross-toolchain paths embedded in them. That won’t work any better than the sysroot path does.
grep -l '/home/lbl/work/crosstools/aarch64-cbl-linux-gnu' *.la | while read FILE; \
do \
sed -i -e 's@/home/lbl/work/crosstools/aarch64-cbl-linux-gnu@/scaffolding@g' $FILE; \
done
And still other libtool files wind up with some path under the build
directory. That’s even worse, since in some cases the libraries won’t
even exist in the target system! But if they do exist, they’ll be in
/scaffolding/lib
.
grep -l -- '-L/home/lbl/work/build' *.la | while read FILE; \
do \
sed -i -e 's@-L/home/lbl/work/build/[^ ]* @-L/scaffolding/lib @g' $FILE; \
done
We also need to copy all the source code and patches into the scaffolding so that we’ll have it in the target system, and set up a litbuild configuration file there.
11.2. Final Preparation of the Scaffolding
All of the programs and libraries needed to boot into the minimal target
system are built and installed! But there are a few more things we’ll
need in the /scaffolding
directory so we can finish the build.
For one thing, we’re going to need the CBL blueprints! We could pre-create all the litbuild scripts that will be used in the target side of the CBL process, but it seems tidier and more elegant to use litbuild from within the target system. Also, while developing and debugging the target-side CBL process, it’s handy to be able to modify the blueprints and re-generate scripts easily within the target environment.
cp -a $LB_BLUEPRINT_DIR /home/lbl/work/sysroot/scaffolding/cbl
We also need the source code and patches that will be used during the rest of the build process. We need to have that stuff available locally, rather than pulling it in from a network location, because the minimal scaffolding userspace has no way to access a network.
We might not need all the sources from the repository that CBL used for the host-system part of the build — for example, we don’t really need QEMU for the remainder of the build process, and we don’t use the System V init program at all — but picking and choosing just the pieces we need would add complexity without really adding any value: a few hundred megabytes of storage is no big deal in this day and age.
mkdir -p /home/lbl/work/sysroot/scaffolding/materials
find /home/lbl/materials/ -type f -exec cp -n {} \
/home/lbl/work/sysroot/scaffolding/materials \;
Since litbuild will be installed before anything else, we won’t have lzip available at the time. The simplest thing to do is simply uncompress it now.
lzip -d /home/lbl/work/sysroot/scaffolding/materials/litbuild*tar.lz
And we need to configure the init program for the target system, and write the scripts that it will run at boot time.
11.3. Build and Install Entropy Adder
11.3.1. Overview
As mentioned elsewhere, you
can add data to the entropy pool without any difficulty from any
userspace process: all you have to do is write data to /dev/urandom
.
However, that doesn’t help to initialize the entropy pool, because it
doesn’t increase the kernel’s estimate of how much entropy is actually
available. To do that, we have to execute a system call that is only
available to processes running as the superuser.
The system call in question is ioctl
, which is kind of a catch-all
that exposes arbitrary functionality in device drivers. ioctl
itself
is short for "Input-Output Control." The idea is: since there is a huge
variety of input/output devices that might be available to your system,
and there’s no way for Linux to provide specific system calls to
exercise every distinct function of every one of those devices, any
device driver can define a set of ioctl
operations. Userspace
applications can then use the ioctl
system call to invoke any of those
operations, using a file descriptor that references one of the device’s
special files under /dev
.
The device driver for the /dev/random
and /dev/urandom
special files
happens to provide an ioctl
that lets you add data to the entropy
pool. We’re going to write a tiny C program that feeds some data to the
kernel using that ioctl
, so that the entropy pool will be fully
initialized immediately.
Note that it is an extremely bad idea to use this program unless you are confident that the input data really is unpredictable, or unless (as in the case of the target-side CBL build) you are equally confident that there is no reason to be concerned about the quality of random data available to userspace processes.
In many cases, this program isn’t needed in a full working system — the
rngd
daemon from the rng-tools package feeds the kernel entropy from
hardware random number generators, like the one built into modern
x86-architecture CPUs. But CBL supports multiple computer architectures,
and some of those don’t have anything suitable for rngd, so addentropy
can be helpful in those cases.
The addentropy
program is derived from ec2seed
, at
https://github.com/akkornel/ec2seed, which is available under the terms
of the GNU GPL Version 3. Accordingly, there’s no issue with
distributing the addentropy source as part of CBL; all the code within
CBL is similarly licensed under the GNU GPL Version 3.
As with almost all C programs, we start by including header files.
#include <errno.h>
#include <fcntl.h>
#include <linux/random.h>
#include <linux/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
We define a constant for the number of bytes we’re going to feed to the entropy pool. That’s really not all that important, but it makes some later parts of the program slightly clearer.
const int RANDOM_BYTE_COUNT = 1024;
Before and after adding data to the entropy pool, we can obtain the
kernel’s approximation of how much entropy it already contains, using
another ioctl
provided by the /dev/random
device driver. Let’s
define a function for that.
int entropy_available (const int fd) {
int num_bytes;
if (ioctl(fd, RNDGETENTCNT, &num_bytes) == -1) {
fprintf(stderr, "Unable to determine amount of entropy in the pool\n");
}
return num_bytes;
}
Now we can write the main routine for addentropy. As usual in C programs, we start by defining all of the variables we’ll use.
int main (int argc, char *argv[])
{
unsigned char *ebuf;
int remaining, before, after, bytes_added, urandom_fd;
ssize_t bytes_read;
struct rand_pool_info *entropy;
We need to open a file descriptor on the /dev/urandom
special file so
that we can invoke ioctl
operations on it. And we need to allocate a
buffer that we can read the data in to.
urandom_fd = open("/dev/urandom", O_WRONLY);
if (urandom_fd == -1) {
fprintf(stderr, "Error opening random file\n");
return 1;
}
entropy = malloc(sizeof(struct rand_pool_info) + RANDOM_BYTE_COUNT);
if (entropy == NULL) {
fprintf(stderr, "Unable to allocate memory for the entropy buffer\n");
return 1;
}
According to random.c
, which is the source code for the random-number
device driver, the entropy_count
field of rand_pool_info
structures
is denominated in units that are an eighth of a bit each. I have no idea
why. So maybe we should multiply RANDOM_BYTE_COUNT
by 64 rather than
by 8 here? But this appears to work perfectly well, so I’m not messing
with it.
entropy->entropy_count = RANDOM_BYTE_COUNT * 8;
entropy->buf_size = RANDOM_BYTE_COUNT;
ebuf = (unsigned char *) entropy->buf;
Now we need to obtain some data (which we will presume to be random).
The way we’re going to do that is simply read it from the standard input
stream. Since we’re using some low-level file manipulation functions, we
have to deal with the possibility that any given call to read
actually
produces less data than we request; therefore, we set up a loop and
read
repeatedly until we have all the data we want.
remaining = RANDOM_BYTE_COUNT;
do {
bytes_read = read(STDIN_FILENO, ebuf, remaining);
if (bytes_read < 0) {
fprintf(stderr,
"Error occurred while reading stdin: %s\n",
strerror(errno));
return 1;
} else if (bytes_read > 0) {
ebuf += bytes_read;
remaining -= bytes_read;
}
} while (bytes_read > 0 && remaining > 0);
Now we can add the data to the entropy pool, using the RNDADDENTROPY
ioctl. We’ll also print out a description of what happened.
before = entropy_available(urandom_fd);
if (ioctl(urandom_fd, RNDADDENTROPY, entropy) == -1) {
fprintf(stderr, "Error adding entropy to kernel\n");
return 1;
}
after = entropy_available(urandom_fd);
bytes_added = RANDOM_BYTE_COUNT - remaining;
printf("Added %i bytes to the entropy pool.\n", bytes_added);
printf("Entropy available: %i -> %i\n", before, after);
There’s nothing left to do other than clean up the resources we’ve allocated and terminate the program. Technically we don’t have to clean up the resources — Linux will do that automatically as the process is reaped — but it’s good practice to do it.
free(entropy);
close(urandom_fd);
return 0;
}
That’s it!
The command we use to compile the program, and the location where it will wind up, depends on whether we’re building this in the scaffolding or for the final target system.
11.3.2. Build and Install Entropy Adder (scaffolding phase)
For the scaffolding, we need to use the cross-compiler to compile
addentropy
, and it needs to wind up in the scaffolding bin directory.
aarch64-cbl-linux-gnu-gcc -std=c89 -Wall -Wextra -Wpedantic -Werror \
-Wno-unused-parameter -g -O2 \
-o /home/lbl/work/sysroot/scaffolding/bin/addentropy /tmp/addentropy.c
11.3.3. Complete text of files
11.3.3.1. /tmp/addentropy.c
#include <errno.h>
#include <fcntl.h>
#include <linux/random.h>
#include <linux/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
const int RANDOM_BYTE_COUNT = 1024;
int entropy_available (const int fd) {
int num_bytes;
if (ioctl(fd, RNDGETENTCNT, &num_bytes) == -1) {
fprintf(stderr, "Unable to determine amount of entropy in the pool\n");
}
return num_bytes;
}
int main (int argc, char *argv[])
{
unsigned char *ebuf;
int remaining, before, after, bytes_added, urandom_fd;
ssize_t bytes_read;
struct rand_pool_info *entropy;
urandom_fd = open("/dev/urandom", O_WRONLY);
if (urandom_fd == -1) {
fprintf(stderr, "Error opening random file\n");
return 1;
}
entropy = malloc(sizeof(struct rand_pool_info) + RANDOM_BYTE_COUNT);
if (entropy == NULL) {
fprintf(stderr, "Unable to allocate memory for the entropy buffer\n");
return 1;
}
entropy->entropy_count = RANDOM_BYTE_COUNT * 8;
entropy->buf_size = RANDOM_BYTE_COUNT;
ebuf = (unsigned char *) entropy->buf;
remaining = RANDOM_BYTE_COUNT;
do {
bytes_read = read(STDIN_FILENO, ebuf, remaining);
if (bytes_read < 0) {
fprintf(stderr,
"Error occurred while reading stdin: %s\n",
strerror(errno));
return 1;
} else if (bytes_read > 0) {
ebuf += bytes_read;
remaining -= bytes_read;
}
} while (bytes_read > 0 && remaining > 0);
before = entropy_available(urandom_fd);
if (ioctl(urandom_fd, RNDADDENTROPY, entropy) == -1) {
fprintf(stderr, "Error adding entropy to kernel\n");
return 1;
}
after = entropy_available(urandom_fd);
bytes_added = RANDOM_BYTE_COUNT - remaining;
printf("Added %i bytes to the entropy pool.\n", bytes_added);
printf("Entropy available: %i -> %i\n", before, after);
free(entropy);
close(urandom_fd);
return 0;
}
11.4. Write the Scaffolding Init Scripts
As mentioned elsewhere (and often), the job of the Linux kernel is to
initialize hardware, mount the root filesystem, and then run an init
program (with PID 1) to start userspace. The init
program does all the
rest of the system startup process — mounting filesystems, spawning
long-running daemon processes, perhaps starting a GUI environment.
There are a couple of other things to keep in mind about the process running with PID 1:
-
First, it’s not allowed to terminate. If it does, the Linux kernel will immediately panic and crash. So it’s important for the init program to be extremely resilient.
-
Second… this requires a bit of additional explanation. Whenever a process terminates, it becomes what is called a "zombie" process; the process that launched it, its "parent" process, is supposed to "reap"[6] it by collecting its exit code, whereupon it is removed from the process table. If that doesn’t happen, the zombie process stays around indefinitely, cluttering up the process table and wandering around looking for brains to eat.[7] If the parent process terminates before the child process, the child becomes an "orphaned" process — it has no parent — and when it terminates, there’s nothing to collect its exit status. That’s where init comes in: any time a process is orphaned, the kernel sets its parent to PID 1. In addition to launching userspace processes, it’s the responsibility of PID 1 to reap all these adopted processes when they terminate.
In CBL, we want the init
program in the minimal target userspace to
execute the target-side build process, which will result in a complete
and functional (albeit bare-bones) GNU/Linux system. That means that it
doesn’t have to launch any interactive programs or processes. In
principle, that means it doesn’t even have to be a traditional init
program. The initial approach we took in CBL took advantage of that to
use a bash
shell script as the init
"program." As you can see, that
plan was sheer elegance in its simplicity! You can read a shell script
and see exactly what it’s doing, step by step; if you were to type the
commands from the script in an interactive shell session, it would do
exactly the same thing that the script does.
Unfortunately, though, that extremely bare-bones approach makes it
difficult to figure out what’s going on when things go wrong. For
example, the test suite for GNU libc sometimes spawns processes that
never terminate, and until we figured out what was going on and found a
workaround, this caused the target-side build to hang indefinitely when
the build process completed, rather than terminating gracefully. Having
a full init
process around lets us do things like spawn additional
terminal processes so there are shells on multiple virtual terminals,
which can be really helpful when trying to figure out what’s going on in
situations like that.
So we use a simple init program, sysvinit, as the init
program for
the initial target system: it’s not a great init
, but it’s reliable
enough that it was the common choice for GNU/Linux systems for decades,
and it’s really easy to cross-compile. The latter aspect is the dominant
concern for our purposes; the init system used by the final CBL system
is based on s6, which is ideal in many respects but is not
straightforward to cross-compile.
11.4.1. The inittab file
The sysvinit program runs commands as directed by a file called
inittab
. It’s documented in a man page provided with the program (man
5 inittab
to read it), so I’m not going to go into a lot of detail
here; but I’ll explain a bit about what’s going on so you don’t have to
look elsewhere. In this section, lines from inittab
will be
interspersed with the scripts that they execute, which will hopefully
result in a clear narrative flow.
Sysvinit allows different sets of programs to be run, in case (for example) you sometimes want networking to be enabled and other times don’t; these are called "runlevels." For CBL we just use runlevel 2, which performs the full target side build. (Runlevels 0, 1, and 6 are reserved for shutdown, single-user mode, and reboot respectively.)
Each line of inittab
has several fields separated by colon characters.
In most cases, an inittab
line defines a command that init
will run
under some circumstance. The first field is always a two-character
identifier; the second field is usually a set of runlevels in which the
command will be run; the third field specifies the way init
should run
the command; and the fourth field (when present) specifies the command
that should be run.
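To make the field layout concrete, here is a hypothetical line (not part of the CBL inittab) that would respawn a getty program on the first virtual terminal in runlevel 2:
c1:2:respawn:/scaffolding/sbin/agetty tty1 115200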
The initdefault
line doesn’t define a command, it just specifies the
default runlevel for the system. In our case, as mentioned previously,
this is 2.
id:2:initdefault:
11.4.2. The initscript script
A handy feature provided by sysvinit is the capability to use an
"initscript" to run commands. This is a shell script that is used by
init
to run the commands it finds in inittab, instead of simply
executing them. By setting some environment variables there, we’ll have
them set automatically for all the scripts and interactive shells that
init
runs.
#!/scaffolding/bin/bash
Since the processes spawned by init
will be executed with a minimal
environment, we need to start by setting environment variables like
PATH
— without that, the full path would need to be specified for
every command run in the script or shell. We also set LD_LIBRARY_PATH
,
so all the shared libraries will be found by the program loader.
export PATH=/scaffolding/bin:/scaffolding/sbin
export LD_LIBRARY_PATH=/scaffolding/lib
GNU/Linux systems have extensive support for internationalization and localization. Before we have the full system set up, it’s a good idea to specify a simple default setting for those features.
export LC_ALL=POSIX
As the final system programs and libraries are built, they should be
used in preference to the ones in /scaffolding
, so we’ll put the
directories where they’ll live in front of the /scaffolding
directories.
export PATH=/bin:/usr/bin:/sbin:/usr/sbin:$PATH
export LD_LIBRARY_PATH=/lib:/usr/lib:$LD_LIBRARY_PATH
The TARGET_SYSTEM_MAKEFLAGS
parameter specifies the MAKEFLAGS
that
should be set for all target-side processes. (If a specific package has
issues with parallel builds, it can be overridden to -j1
or unset
entirely for those packages.)
export MAKEFLAGS="-j8"
The bash script automatically sets the HOME
environment variable to be
the current user’s home directory, and there are some programs that
expect HOME
to be set to a reasonable value. We can go ahead and set
it here in case any of the programs we’re using is among those.
export HOME=/root
We also need to set environment variables for all the litbuild
configuration parameters that might not use default parameter values.
It’s not hard to define these properly because from this point onward we
don’t need to worry about HOST
and TARGET
and so on: everything is
just going to be a native build. We do need to pass a number of
parameters from the host-side build along through to the target-side
build, though!
export BOOT_DEVICE=''
export BOOTLOADER='manual'
export HOST_NAME='cbl'
export DOMAIN_NAME='lblinux.org'
export KERNEL_TARGET='Image.gz'
export LOGIN_FULL_NAME='A Little Blue User'
export LOGIN='lbl'
export TARGET_GCC_CONFIG='--enable-fix-cortex-a53-835769 --enable-fix-cortex-a53-843419 --with-cpu=cortex-a72.cortex-a53'
export TARGET_SYSTEM_CFLAGS='-O2 -fno-omit-frame-pointer -march=native'
export TARGET_SYSTEM_MAKEFLAGS='-j8'
export TARGET_SERIAL_DEV='ttyAMA0'
Many of the parameters used in the litbuild blueprints will have static values defined by convention within CBL, so we don’t even need to pass the existing parameters defined for the host-side build along.
export DOCUMENT_DIR=/tmp/build/doc
export LOGFILE_DIR=/tmp/build/logs
export PATCH_DIR=/scaffolding/materials
export SCRIPT_DIR=/tmp/build/scripts
export TARFILE_DIR=/scaffolding/materials
export WORK_SITE=/scaffolding/build
And of course we have to run the program that init
is telling this
script to run. The command line used to run the initscript is given four
arguments, representing the first, second, third, and fourth inittab
fields, respectively. The only one we need to use is the fourth one,
which contains the command line specified in inittab
.
eval exec "$4"
11.4.3. The startup-target-system script
If a bootwait
line is present in inittab
, the process on that line
will be executed once during system boot, and init
will wait to run
any other processes until it completes. We can take advantage of that
feature to run a script that gets the system to a baseline functional
state — kernel filesystems are mounted, the root filesystem is
writable, and litbuild is available to generate scripts from blueprints.
bw::bootwait:/scaffolding/startup-target-system.sh
The script that will be run by the bootwait
line is designed to be
idempotent: that is to say, it can be run any number of times and will
ensure that the system is in the expected state when it is complete, but
won’t make unnecessary changes. This is helpful in cases where the
target system doesn’t complete the build properly for some reason — if
it’s running in a QEMU virtual machine, and QEMU crashes, for example,
it will need to be restarted.
Most of the commands executed by startup-target-system
are persistent
across reboots, so we want to skip those commands if we restart the
build. Accordingly, most of the commands executed by the script are
enclosed in guard clauses that check whether it’s necessary to do
anything; commands that don’t need to be run are skipped.
The script starts, as normal, with a "shebang" line that the kernel will
use to determine what program will actually run it. In our case, that’s
the scaffolding bash
program.
#!/scaffolding/bin/bash
Now we can start the build. The first thing to do is remount the root filesystem so we can write to it.
mount -o remount,rw /
Linux systems have several magical filesystems. Two of these, /sys
and
/proc
, don’t actually contain real files and directories, they just
contain things that look like files and directories, but actually
provide insight into the state of the kernel and system — and in many
cases the ability to manipulate that state — using a file interface.
For example, /proc/cmdline
shows the command line that was passed to
the kernel by the boot loader, and /proc/filesystems
shows the set of
filesystems that are supported by the kernel.
/dev
is different — it’s the canonical location on the system for
nodes, aka "special" files. These represent I/O devices and allow those
devices to be accessed as though they were files. Linux has a feature
called "devtmpfs" that automatically populates a RAM-based filesystem
with nodes for all of the I/O devices it knows about.
Additional directories that should have RAM-based filesystems mounted on
them are /run
and /tmp
. /run
is a canonical location for files
containing volatile system state information — things like the runtime
s6 scan directory — and /tmp
is the conventional location for
temporary files that should not persist across reboots. It’s convenient
to use a tmpfs
for that purpose. This does limit the amount of data
that can be written to /tmp
: by default, a tmpfs
filesystem is
limited to half the physical RAM on the system. For small target
systems, this can be a problematic constraint; if you wind up having any
issues as a result, you can replace the mount
command here with
something like rm -rf /tmp; mkdir -m 1777 /tmp
to ensure that
everything from the previous boot (if any) has been cleared up but
without using a (size-limited) tmpfs
for it.
if [ ! -d /dev ]
then
mkdir -m 0755 -p /dev /proc /run /sys /tmp
fi
mount -n -o nosuid,noexec,nodev -t sysfs sys /sys
mount -n -o nosuid,noexec,nodev -t proc proc /proc
mount -n -o mode=0755,nosuid -t devtmpfs dev /dev
mount -n -o mode=0755,nosuid -t tmpfs none /run
mount -n -o mode=1777,nosuid -t tmpfs none /tmp
Now that /dev
is mounted, we can reference paths within it
meaningfully. The first device we need to use is the console device.
There are a couple of ways to reference terminals in Linux systems:
virtual terminals are associated with numbered tty
character special
files ("tty" stands for "teletype": a historical artifact), like
/dev/tty1
or /dev/tty2
. The first of these, /dev/tty0
, is special
and always refers to the current virtual terminal. Serial ports are
conventionally referred to using similar syntax with an extra "S" after
"tty": /dev/ttyS0
or /dev/ttyS1
. /dev/tty
is always a reference to
whatever terminal opened the process that is looking at it… with the
proviso that for some reason — possibly because /dev
is not mounted
at the time init
runs? — the scripts started by init
don’t seem to
have an associated tty at all.
The system console, which is where the kernel’s boot messages are
displayed, is always available at /dev/console
. By default this points
at /dev/tty0
, but can be overridden by the boot loader using a
console
command-line directive, e.g. console=ttyS0
. The console
can also be multiplexed (written to and read from by multiple devices)
by specifying multiple console
directives.
Since we want to see this script’s output, and the script (for whatever
reason) has no associated tty device, we need to redirect the script’s
output to /dev/console
. We also want to redirect input from
/dev/console
to the script, so that once the build process terminates
(successfully or not) and leaves us at a shell prompt we will be able to
type commands interactively.
A handy trick to do this is the exec
shell builtin: when run without a
command but with redirections, exec
simply modifies the shell’s file descriptors as instructed.
exec >/dev/console 2>&1 </dev/console
Before that exec
command runs, output from the startup script would
cause the startup script to block until it was able to write the output
somewhere — which would never happen, since init
is patiently waiting
for the bootwait
script to complete. Now that it’s run, we can start
logging what we’re doing.
echo "Console activated for startup-target-system.sh"
11.4.4. Trifling with the entropy pool
There’s a little kludge we’re going to use to avoid long delays before the target-side build starts, but to explain what we’re doing I’m going to have to digress a few steps — you can skip this whole part if you’re not interested in some abstruse technical details.
Still with me? Brave soul. Okay, the Linux kernel has a built-in feature that lets you obtain cryptographically-strong random numbers. This is not something that computers are naturally good at, because computers are highly deterministic — there are plenty of pseudo-random number generation (PRNG) algorithms, but they all have to be seeded with some initial value and if you don’t have a source for that initial seed value that’s really random, you wind up with very predictable pseudo-random numbers. If you’re doing something involving cryptography, that’s no good! When a cryptographic algorithm calls for random bytes — in this context, sometimes this is called "entropy," although I remember hearing at some point that the terms aren’t strictly synonymous — the values of those bytes really need to be unpredictable, or else the strength of the algorithm collapses entirely.
Linux makes up for this by collecting some bits of entropy from
unpredictable events. The most easily-observed of these events are human
inputs, like keystrokes and mouse movements; by measuring the time
between keystrokes and taking the least significant few bits from each
interval, or measuring the interval between receiving network packets or
some other kinds of hardware interrupts or other things like that, Linux
can gather a fair amount of really random data. It mixes these random
bits into an "entropy pool" it maintains. (You can read all about this
in drivers/char/random.c
in the Linux source tree.)
There are two easy ways that userspace processes can request random data
from the entropy pool: they can read it out of the character device
/dev/random
, which only provides as much random data as Linux
estimates to be available and blocks after that’s exhausted, and
/dev/urandom
, which is basically a PRNG that is periodically
re-initialized with a really-random seed value — good enough for most
purposes, and it never blocks. That latter source, /dev/urandom
, is
what the SecureRandom
class in the Ruby standard library uses to
initialize its pseudo-random number generator; it’s also used by lots
of other programs that need random data.
Before Linux 4.16, that was the end of the story: there was /dev/random, which you would use if you wanted completely random bytes for things like cryptographic key generation and one-time pads; and /dev/urandom, which was perfect for getting more-or-less random data without any delays. In the 4.16 release series, though, this behavior changed slightly in order to address a security vulnerability: now, if any process tries to read from /dev/urandom before the kernel’s PRNG has been initialized with a really-random seed value, the read blocks until that initialization is complete.
This is still fine in most circumstances! But the CBL target-side build is unusual: it’s intended to be completely automated, and it’s not on a network because that would make things more complex than they need to be. So there aren’t any keystrokes or network packets to measure timings from! That means that initializing the kernel’s PRNG can take fifteen minutes or longer after booting the target system. It also turns out that some of the programs used in the target build — for example, the litbuild program used to generate the scripts for that build — wind up reading from one of the random devices, which causes the target-side build to hang — sometimes for quite a while — shortly after it starts.
You can feed entropy to the pool just by writing data to /dev/urandom (or, I think, /dev/random), but the kernel doesn’t trust that any data you write that way is really random, so this doesn’t help unblock the build. There’s a system call available to privileged processes (that is, processes running as root) that adds random data to the pool and assures the kernel that it really is random, though, so we can use a tiny C program to invoke that system call.
Of course, at this point we don’t have any reliable source of random data to use with that program — that’s the whole problem! But, luckily, we don’t actually need one. We’re not doing anything in the CBL build where the lack of cryptographically-strong random data is a problem. So we’re just going to lie to the kernel: we’ll write some non-random data to the entropy pool and assert that it is random.
Specifically, we’re going to pretend that the program code for the
addentropy
program is random, even though it’s really not at all!
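The addentropy program itself is tiny. Here is a minimal sketch of a program like it — illustrating the RNDADDENTROPY ioctl interface from linux/random.h, not reproducing the actual CBL source:
/* Illustrative sketch: read data from standard input and credit it to
 * the kernel entropy pool via the RNDADDENTROPY ioctl. Must run as
 * root, since the ioctl requires privilege. */
#include <fcntl.h>
#include <linux/random.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    char buf[512];
    ssize_t n;
    int fd = open("/dev/urandom", O_WRONLY);
    if (fd < 0) { perror("open"); return 1; }
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
        /* rand_pool_info ends in a flexible array member, so allocate
         * the header and payload as one block. */
        struct rand_pool_info *info = malloc(sizeof *info + n);
        if (info == NULL) { perror("malloc"); return 1; }
        info->entropy_count = n * 8; /* the lie: every bit is "random" */
        info->buf_size = n;
        memcpy(info->buf, buf, n);
        if (ioctl(fd, RNDADDENTROPY, info) != 0) {
            perror("ioctl");
            return 1;
        }
        free(info);
    }
    return 0;
}
With a program like that available, the startup script can feed it any data at all: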
echo "Adding fake randomness to the entropy pool"
addentropy < /scaffolding/bin/addentropy
echo "Fake randomness added"
11.4.5. Basic system directories
We need to create the rest of the basic system directory structure. This doesn’t have to happen right now, but this is as good a time as any.
If the build process was interrupted and restarted, the directory
structure and symbolic links will already exist, so we can skip this
stuff. The same kind of logic applies in many later parts of this
script, so there will be additional guard if
statements around those
blocks.
if [ ! -d /bin ]
then
echo "Creating standard system directories"
pushd /
chmod 0775 .
mkdir -m 0755 -p bin boot etc home lib libexec media mnt opt
mkdir -m 0755 -p sbin usr var
mkdir -m 0750 -p root
pushd usr
mkdir -m 0755 -p bin include lib libexec local sbin share src
pushd share
mkdir -m 0755 -p doc info locale man misc terminfo zoneinfo
pushd man
mkdir -m 0755 -p man1 man2 man3 man4 man5 man6 man7 man8
popd # /usr/share/man
popd # /usr/share
pushd local
mkdir -m 0755 -p bin include lib libexec sbin share src
pushd share
mkdir -m 0755 -p doc info locale man misc terminfo zoneinfo
pushd man
mkdir -m 0755 -p man1 man2 man3 man4 man5 man6 man7 man8
popd # /usr/local/share/man
popd # /usr/local/share
popd # /usr/local
popd # /usr
pushd var
mkdir -m 0755 -p cache lib local lock log mail opt spool
mkdir -m 1777 -p tmp
ln -s /run /var/run
popd # /var
popd # /
else
echo "Standard system directories are already present"
fi
Also, all the multilib directories that the GNU toolchain components sometimes insist on using should be symbolic links to lib. This is still a kludge, but it’s the only way to avoid needing to specify some arbitrary set of lib directories in ld.so.conf.
If you’re building for an architecture that uses different multilib directory names, you might need to create additional symbolic links here. If you do that, you’ll probably also need to make a corresponding tweak in Create Symbolic Links For Scaffolding Lib Directories; also, look at the specs file when it’s being adjusted to see whether the multilib paths are present in the linking specs. If they are, you may need to make changes there as well.
if [ ! -L /usr/libx32 ]
then
echo "Creating multilib symbolic links"
for DIR in / /usr
do
pushd $DIR
for MULTILIBDIR in lib32 lib64 libx32
do
ln -sv lib ${MULTILIBDIR}
done
popd
done
else
echo "Multilib symbolic links are already present"
fi
Conventionally, information about filesystems that are currently mounted is available in the file /etc/mtab, and some packages therefore expect /etc/mtab to contain this information. The historical convention is that the mount and umount programs, which attach and detach filesystems from the filesystem hierarchy, would also add and remove corresponding lines from the mtab file.
The mtab file isn’t actually needed any more, though, because the /proc filesystem always provides a file with the kernel’s view of what filesystems are mounted. We can create a symbolic link to the conventional location, for the use of any packages that look for it there.
if [ ! -L /etc/mtab ]
then
echo "Creating mtab symbolic link"
ln -sv /proc/self/mounts /etc/mtab
else
echo "mtab symbolic link is already present"
fi
11.4.6. User and Group database files
Users are defined in the file /etc/passwd. This file can also contain hashed versions of users' passwords, as well as other user account metadata — hence its name — but usually it does not! This is because passwd has to be readable by everyone (it’s not really important why this is the case, so I won’t get into that here) and shortly after UNIX systems became popular it became clear that it was a bad idea to allow any user to see even the hashed version of passwords.[8]
So in most UNIX operating systems, the /etc/passwd file (confusingly) does not contain even the hashed version of passwords; those are found in a file called /etc/shadow, which can only be accessed by root. The extraction of passwords into /etc/shadow, and maintenance of them there, is done by the shadow package, which we’ll set up early in the target-side build.
These files have colon-delimited fields and newline-delimited records, so they’re pretty easy to read; man 5 passwd describes the file format. The second field is for the hashed password itself. If the field is blank, that account requires no password to log in; if it contains a single "x," that indicates that the password is stored in /etc/shadow as described a moment ago.
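For example, in the root entry we’re about to create — root::0:0:root:/root:/bin/bash — the fields are, in order: the account name, an empty password field (so no password is needed to log in), the numeric user ID, the numeric group ID, the comment (or "GECOS") field, the home directory, and the login shell.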
Users can belong to groups. Group definitions are similar to user definitions, but are in the file /etc/group — and, similarly to the passwd and shadow files, the passwords for groups that require them are usually found in /etc/gshadow.[9]
Initially we’re just going to create the root
user and group, and a
few other system users and groups that are expected to exist by various
programs.
if [ ! -f /etc/passwd ]
then
echo "Creating passwd file"
cat > /etc/passwd <<-EOF
root::0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/bin/false
daemon:x:2:6:daemon:/sbin:/bin/false
syslog:x:18:18:syslogd user:/var/log/syslogd:/bin/false
klog:x:19:19:klogd user:/var/log/klogd:/bin/false
nobody:x:65534:65533:Unprivileged User:/dev/null:/bin/false
EOF
else
echo "passwd file is already present"
fi
if [ ! -f /etc/group ]
then
echo "Creating group file"
cat > /etc/group <<EOF
root:x:0:
bin:x:1:
sys:x:2:
kmem:x:3:
tape:x:4:
tty:x:5:
daemon:x:6:
floppy:x:7:
disk:x:8:
lp:x:9:
dialout:x:10:
audio:x:11:
video:x:12:
utmp:x:13:
usb:x:14:
cdrom:x:15:
adm:x:16:
input:x:17:
syslog:x:18:
klog:x:19:
mail:x:30:
wheel:x:39:
nogroup:x:65533:
EOF
else
echo "group file is already present"
fi
11.4.7. Process accounting files
There are some programs that write log data — like process resource usage data — to specific files, but only if those files already exist. Let’s create them.
if [ ! -f /var/log/btmp ]
then
echo "Creating process accounting files"
touch /var/log/{btmp,{last,fail}log,wtmp}
chgrp 13 /var/log/{last,fail}log
chmod 0664 /var/log/{last,fail}log
chmod 0600 /var/log/btmp
else
echo "Process accounting files are already present"
fi
11.4.8. Shell startup files
The root user should have shell startup files just as other users do, and there’s no more convenient place to set those up. Generally, for the root user, I avoid any significant shell startup activity and only set environment variables; I also like having the same set of environment variables regardless of whether I’m using a login shell (for which the .bash_profile script is sourced) or a non-login shell (for which .bashrc is sourced), so I link those files together.
More typically, .bash_profile can be used for any commands that should only be run at login time — like starting an ssh-agent process, for example — and .bashrc can be used for commands that should be run any time a new subshell is executed. (It’s common for .bash_profile to source in .bashrc as well, but this is by no means necessary.)
if [ ! -f /root/.bashrc ]
then
echo "Creating root bash startup scripts"
echo 'export LITBUILDDBDIR=/var/litbuilddb' > /root/.bash_profile
echo "export PS1='# '" >> /root/.bash_profile
echo 'export PATH=/usr/bin:/bin:/usr/sbin:/sbin' >> /root/.bash_profile
In addition to typical environment variables like PATH, we again want root’s environment to include all of the configuration parameters that will be used when running litbuild on the final system. Some of these, like PATCH_DIR and TARFILE_DIR, will be useful after the build is complete, so those will be written to .bash_profile. Others won’t, so we’ll write those to a .cblrc file that gets sourced in from .bashrc — once the build is complete, we can remove the command that sources that file from .bash_profile.
echo 'source /root/.cblrc' >> /root/.bash_profile
The code that sets these environment variables digs a little bit deeper into the bash bag of tricks than usual; the idea is, for each of the environment variables we want to set, we’ll echo both the name of the variable and the result of expanding the name of the variable as an environment variable into the .bash_profile script — an indirect reference. To do this we need to escape the first $, because otherwise bash expands $$ into its own process ID, which is not helpful.
I didn’t actually figure out how to do indirect references in bash myself; I did a web search and found everything I needed in chapter 28 of the "Advanced Bash-Scripting Guide," which is part of the Linux Documentation Project.
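Stripped of the loop, the trick looks like this — a toy example, not part of the build scripts:
KERNEL_TARGET='Image.gz'
var=KERNEL_TARGET
echo "export $var='$(eval echo \$$var)'"
# prints: export KERNEL_TARGET='Image.gz'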
for var in KERNEL_TARGET PATCH_DIR TARFILE_DIR
do
echo "export $var='$(eval echo \$$var)'" >> /root/.bash_profile
done
for var in BOOT_DEVICE BOOTLOADER DOMAIN_NAME HOST_NAME \
LOGIN_FULL_NAME LOGIN TARGET_GCC_CONFIG TARGET_SYSTEM_CFLAGS \
TARGET_SYSTEM_MAKEFLAGS DOCUMENT_DIR LOGFILE_DIR SCRIPT_DIR \
WORK_SITE
do
echo "export $var='$(eval echo \$$var)'" >> /root/.cblrc
done
chmod 700 /root/.bash_profile
ln -s .bash_profile /root/.bashrc
else
echo "Root bash startup scripts are already present"
fi
11.4.9. Virtual memory for the initial target system
The Linux kernel can write pages of RAM to storage devices like disk partitions or files in the filesystem, which frees up those pages to be allocated to other programs; this is called "swapping", and the blocks of storage allocated to this purpose are referred to as "virtual memory" because they work like (very slow) RAM, or "swap space" because it’s space allocated to swapping. The terms are pretty much interchangeable.
It’s always a good idea to have some swap space available. That way, allocated memory that hasn’t been used in a long time can be written out to the swap area. If any process needs to access those pages, the kernel can re-read them from swap, so the only drawback is that access to those pages of memory takes longer than if they were already in RAM.
A configuration parameter, TARGET_SWAP_DEVICE, can be used to specify a block device to be used as virtual memory. If it exists and isn’t already in use, the following code will set it up. If the device exists but is something other than swap space, it will be ignored — just in case the parameter has an incorrect value and refers to an important filesystem or something.
if [ -b /dev/vdb ]
then
echo "Checking for swap device /dev/vdb..."
blkid /dev/vdb
if [ $? -ne 0 ]
then
echo "Initializing swap device /dev/vdb"
mkswap /dev/vdb
fi
if blkid /dev/vdb | grep -q swap
then
echo "Activating swap device /dev/vdb"
swapon /dev/vdb
fi
fi
CBL systems use s6 and s6-rc to manage the system state. We can
set up the very first parts of that here — that is, the source
directory that will contain service definitions, and the run-level
bundle directories inside it. As other parts of the system are built,
directories for other services will be created and, in many cases, added
to the rl-default
bundle.
We also set up a network-services bundle that will be part of the rl-default bundle if networking and network services should be part of the standard system run state.
This is all explained in much more detail in later sections: s6-rc, Configure the system initialization framework, and Construct the s6-rc service database.
if [ ! -d /etc/s6-rc/source ]
then
echo "Setting up initial s6-rc structures"
mkdir -m 0755 -p /etc/s6-rc
mkdir -m 0755 -p /etc/s6-rc/source
mkdir -m 0755 -p /etc/s6-rc/source/rl-default
echo "bundle" > /etc/s6-rc/source/rl-default/type
touch /etc/s6-rc/source/rl-default/contents
mkdir -m 0755 -p /etc/s6-rc/source/network-services
echo "bundle" > /etc/s6-rc/source/network-services/type
touch /etc/s6-rc/source/network-services/contents
fi
11.4.10. Installing litbuild
The litbuild program needs to be installed so that we can use it to create scripts to build the rest of the system. We could simply install the rubygem-packaged version of litbuild, but it’s more consistent with the rest of CBL to install it from a source tarfile.
Of course, if the litbuild gem is already installed, we don’t need to build and install it.
Building the gem is a little bit fussy! When rubygems is used to build a gem package with gem build, it tries to compute checksums for the files as it adds them. This requires the ruby OpenSSL extension, which isn’t available in the scaffolding ruby; at least in Ruby 2.5.1, this causes the gem command to raise an exception and crash the build. It’s a bit of a kludge, but we can simply avoid raising the exception and everything will be fine.
echo "Looking for an installed litbuild..."
type lb
if [ $? -eq 1 ]
then
echo "Building and installing the litbuild gem"
# kludge to avoid unnecessary `gem` crash
find /scaffolding -name tar_writer.rb | while read file
do
sed -i -e '/raise .* unless signature_digest/s@^@#@' $file
done
pushd /tmp
tar -x -f /scaffolding/materials/litbuild*tar
cd litbuild*
gem build litbuild.gemspec
gem install -l litbuild*gem
popd
rm -rf /tmp/litbuild*
else
echo "Litbuild gem is already installed"
fi
To use the skip-upon-restart feature of litbuild throughout the target-side build, we can set up a litbuild database directory.
export LITBUILDDBDIR=/scaffolding/litbuilddb
if [ ! -d $LITBUILDDBDIR ]
then
echo "Creating litbuild database directory"
mkdir $LITBUILDDBDIR
fi
echo "Execution of startup-target-system.sh complete"
11.4.11. The target-side-build script
Now let’s proceed with the target-side build per se. We tell sysvinit to run this script when entering runlevel 2, but not restart it after it terminates: when this script terminates, the CBL build will (hopefully) be complete!
tb:2:once:/scaffolding/target-side-build.sh
Once again, we need to do the same console-redirection trick we did earlier.
#!/scaffolding/bin/bash
exec >/dev/console 2>&1 </dev/console
echo "Console activated for target-side-build.sh"
By default, bash keeps track of the location of programs it runs, which lets it skip looking through the PATH when it needs to find those programs again. As we construct the final system programs, we want those to be used in preference to the scaffolding versions, so we disable this caching behavior.
set +h
If anything goes wrong during the remainder of the build, we can bail
out to an interactive shell. To do this, we’ll set the -e
bash option
(which causes any failing command to terminate the script) but set traps
so that the termination — normal or unexpected — will cause the
process to execute an interactive bash shell.
trap 'exec /scaffolding/bin/bash' EXIT
trap 'exec /scaffolding/bin/bash' ERR
set -e
Now we can start the target-side build itself! This consists of running litbuild to produce scripts, then executing those scripts.
The first litbuild target builds the remainder of the scaffolding components — all the stuff we need for the real target-side build but which is difficult or impossible to cross-compile on the host system — and sets up the package-users framework.
cd /scaffolding/cbl
echo "Beginning target-side-initial build"
LOGFILE_DIR=/root/cbl-logs/0-target-side-initial lb target-side-initial
/tmp/build/scripts/00-target-side-initial.sh
rm -rf /tmp/build
Second, we build the final system components — these will automatically be generated as package-user-style build scripts, because the package-users framework is installed. This is a good time to switch to a final system litbuild database directory, since this is the point where we’ll start to build out the final system!
export LITBUILDDBDIR=/var/litbuilddb
if [ ! -d $LITBUILDDBDIR ]
then
echo "Creating litbuild database directory"
mkdir $LITBUILDDBDIR
fi
echo "Beginning target-side-final build"
LOGFILE_DIR=/root/cbl-logs/1-target-side-final lb target-side-final
/tmp/build/scripts/00-target-side-final.sh
rm -rf /tmp/build
At this point, we have the bash program at the canonical system location, /bin/bash, so we can change the traps we set earlier to exec into that shell if the init script terminates.
trap 'exec /bin/bash' EXIT
trap 'exec /bin/bash' ERR
And, finally, after building all of the programs and libraries needed on
the system, we can finish up the system: tidy things up, remove any
remaining references to the /scaffolding
directory, configure the init
system, and perhaps install the boot loader.
echo "Beginning complete-the-system build"
LOGFILE_DIR=/root/cbl-logs/2-complete-the-system lb complete-the-system
/tmp/build/scripts/00-complete-the-system.sh
rm -rf /tmp/build
There are a couple more things we can do to tidy up before we declare the build complete.
Before shutting down the system, we should remount the root filesystem read-only. This is tricky because there may be some processes still lingering from the build process — I often see a few processes running as glibc, left over from its test suite, but there might be others as well.
Unfortunately, these processes are holding on to open file descriptors on the root filesystem, so we have to terminate them in order to remount that filesystem read-only. And unfortunately, that’s also tricky: killing those processes can, for some reason I find completely baffling, cause the target-side-build script to hang or crash.
If this happens, you won’t see the COMPLETE SUCCESS message, but it’s still safe to power-cycle the target system: we’re using a journaling filesystem, so we won’t need to do an extensive filesystem check on reboot or anything. It’s always a good idea to force all pending I/O operations to complete before proceeding, though.
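If you’re curious about which processes are holding files open on the root filesystem, the fuser program from the psmisc package can show you; it isn’t part of the scaffolding, so this is just for context on a complete system:
# Show processes with open files on the filesystem mounted at /,
# with the owning user and type of access for each:
fuser -vm /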
echo "Finding lingering build processes..."
cd /proc
while ls -l | grep -v total | grep -q -v root
do
pid=$(ls -l | grep -v total | grep -v root | \
head -n 1 | awk '{print $9}')
owner=$(ls -l | grep "$pid\$" | awk '{print $3}')
echo -n "Found PID $pid owned by $owner, terminating..."
kill -9 $pid && echo "killed" || echo "already gone"
sleep 1
done
If that worked, we should be able to make the root filesystem read-only and declare victory. But sometimes — honestly I have no idea what the deal is — I still find that Linux thinks the root filesystem has files open for writing. So we’ll put some explanations in the console messaging, just in case.
echo "Synchronizing disks"
sync
echo "Attempting to remount the root filesystem read-only."
echo "(If that fails, it does not indicate a dire problem. The"
echo "filesystem has a journal, and we have just sync'ed it.)"
echo ""
set +e
mount -o remount,ro /
sync
The target system build is complete. Nothing remains but to turn off the computer!
Under normal circumstances, the System V init program we’re using can be told to shut down using the shutdown or telinit programs, which are still present in the scaffolding directory. That won’t work at this point, though, because s6-linux-init has installed its own versions of shutdown, telinit, and other similar programs. We can’t use those as we typically would, either, because they’re expected to be run as part of the complete low-level userspace configuration set up by s6-linux-init-maker, which of course is not the case for the minimal target system.
On a typical running system, the init framework starts a long-running s6-linux-init-shutdownd process, which listens on a fifo for instructions from other programs like s6-linux-init-shutdown and performs clean shutdown operations when told to do so. The very last thing it does is run s6-linux-init-hpr with an -f (for "force") option, which does a final filesystem sync operation and then uses the reboot system call to perform the actual shutdown (or halt, or reboot) operation. We can skip all the other complexity and go straight to the shutdown operation.
echo "COMPLETE SUCCESS"
sync
echo ""
echo "Shutting down the system now."
sleep 5
s6-linux-init-hpr -f -p -W
If anything goes wrong with the s6-linux-init-hpr command, the trap we set earlier will hopefully cause the script to drop into an interactive shell when it terminates.
11.4.12. Interactive shell processes
If something goes wrong — or if we just want to examine the progress of the build as it proceeds — it’s helpful to have some interactive shells on other virtual terminals. This only works for target systems that support virtual terminals, obviously! Many emulated systems do not, but it doesn’t cause any problem to run some shell processes even on those systems; it just wastes a little bit of memory.
To do this, we’ll use the openvt command to run interactive shells on specific virtual consoles. The build scripts are typically running on the first virtual console, so we’ll run interactive shells on the second through sixth. (That’s a totally arbitrary decision; you can run as many as you like.)
Unlike the other init-run commands, these are set to be restarted if they terminate, by using the keyword respawn rather than once in the inittab directive. (That doesn’t actually work, though, for some reason; maybe the virtual terminals need to be closed before they can be re-opened.)
b2:2:respawn:/scaffolding/bin/openvt -e --console=2 /scaffolding/bin/bash
b3:2:respawn:/scaffolding/bin/openvt -e --console=3 /scaffolding/bin/bash
b4:2:respawn:/scaffolding/bin/openvt -e --console=4 /scaffolding/bin/bash
b5:2:respawn:/scaffolding/bin/openvt -e --console=5 /scaffolding/bin/bash
b6:2:respawn:/scaffolding/bin/openvt -e --console=6 /scaffolding/bin/bash
11.4.13. Finishing touches
Of course, the scripts that are supposed to be executed by init need to be executable!
chmod a+x /home/lbl/work/sysroot/scaffolding/*.sh
11.4.14. Complete text of files
11.4.14.1. /home/lbl/work/sysroot/scaffolding/etc/initscript
#!/scaffolding/bin/bash
export PATH=/scaffolding/bin:/scaffolding/sbin
export LD_LIBRARY_PATH=/scaffolding/lib
export LC_ALL=POSIX
export PATH=/bin:/usr/bin:/sbin:/usr/sbin:$PATH
export LD_LIBRARY_PATH=/lib:/usr/lib:$LD_LIBRARY_PATH
export MAKEFLAGS="-j8"
export HOME=/root
export BOOT_DEVICE=''
export BOOTLOADER='manual'
export HOST_NAME='cbl'
export DOMAIN_NAME='lblinux.org'
export KERNEL_TARGET='Image.gz'
export LOGIN_FULL_NAME='A Little Blue User'
export LOGIN='lbl'
export TARGET_GCC_CONFIG='--enable-fix-cortex-a53-835769 --enable-fix-cortex-a53-843419 --with-cpu=cortex-a72.cortex-a53'
export TARGET_SYSTEM_CFLAGS='-O2 -fno-omit-frame-pointer -march=native'
export TARGET_SYSTEM_MAKEFLAGS='-j8'
export TARGET_SERIAL_DEV='ttyAMA0'
export DOCUMENT_DIR=/tmp/build/doc
export LOGFILE_DIR=/tmp/build/logs
export PATCH_DIR=/scaffolding/materials
export SCRIPT_DIR=/tmp/build/scripts
export TARFILE_DIR=/scaffolding/materials
export WORK_SITE=/scaffolding/build
eval exec "$4"
11.4.14.2. /home/lbl/work/sysroot/scaffolding/etc/inittab
id:2:initdefault:
bw::bootwait:/scaffolding/startup-target-system.sh
tb:2:once:/scaffolding/target-side-build.sh
b2:2:respawn:/scaffolding/bin/openvt -e --console=2 /scaffolding/bin/bash
b3:2:respawn:/scaffolding/bin/openvt -e --console=3 /scaffolding/bin/bash
b4:2:respawn:/scaffolding/bin/openvt -e --console=4 /scaffolding/bin/bash
b5:2:respawn:/scaffolding/bin/openvt -e --console=5 /scaffolding/bin/bash
b6:2:respawn:/scaffolding/bin/openvt -e --console=6 /scaffolding/bin/bash
11.4.14.3. /home/lbl/work/sysroot/scaffolding/startup-target-system.sh
#!/scaffolding/bin/bash
mount -o remount,rw /
if [ ! -d /dev ]
then
mkdir -m 0755 -p /dev /proc /run /sys /tmp
fi
mount -n -o nosuid,noexec,nodev -t sysfs sys /sys
mount -n -o nosuid,noexec,nodev -t proc proc /proc
mount -n -o mode=0755,nosuid -t devtmpfs dev /dev
mount -n -o mode=0755,nosuid -t tmpfs none /run
mount -n -o mode=1777,nosuid -t tmpfs none /tmp
exec >/dev/console 2>&1 </dev/console
echo "Console activated for startup-target-system.sh"
echo "Adding fake randomness to the entropy pool"
addentropy < /scaffolding/bin/addentropy
echo "Fake randomness added"
if [ ! -d /bin ]
then
echo "Creating standard system directories"
pushd /
chmod 0775 .
mkdir -m 0755 -p bin boot etc home lib libexec media mnt opt
mkdir -m 0755 -p sbin usr var
mkdir -m 0750 -p root
pushd usr
mkdir -m 0755 -p bin include lib libexec local sbin share src
pushd share
mkdir -m 0755 -p doc info locale man misc terminfo zoneinfo
pushd man
mkdir -m 0755 -p man1 man2 man3 man4 man5 man6 man7 man8
popd # /usr/share/man
popd # /usr/share
pushd local
mkdir -m 0755 -p bin include lib libexec sbin share src
pushd share
mkdir -m 0755 -p doc info locale man misc terminfo zoneinfo
pushd man
mkdir -m 0755 -p man1 man2 man3 man4 man5 man6 man7 man8
popd # /usr/local/share/man
popd # /usr/local/share
popd # /usr/local
popd # /usr
pushd var
mkdir -m 0755 -p cache lib local lock log mail opt spool
mkdir -m 1777 -p tmp
ln -s /run /var/run
popd # /var
popd # /
else
echo "Standard system directories are already present"
fi
if [ ! -L /usr/libx32 ]
then
echo "Creating multilib symbolic links"
for DIR in / /usr
do
pushd $DIR
for MULTILIBDIR in lib32 lib64 libx32
do
ln -sv lib ${MULTILIBDIR}
done
popd
done
else
echo "Multilib symbolic links are already present"
fi
if [ ! -L /etc/mtab ]
then
echo "Creating mtab symbolic link"
ln -sv /proc/self/mounts /etc/mtab
else
echo "mtab symbolic link is already present"
fi
if [ ! -f /etc/passwd ]
then
echo "Creating passwd file"
cat > /etc/passwd <<-EOF
root::0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/bin/false
daemon:x:2:6:daemon:/sbin:/bin/false
syslog:x:18:18:syslogd user:/var/log/syslogd:/bin/false
klog:x:19:19:klogd user:/var/log/klogd:/bin/false
nobody:x:65534:65533:Unprivileged User:/dev/null:/bin/false
EOF
else
echo "passwd file is already present"
fi
if [ ! -f /etc/group ]
then
echo "Creating group file"
cat > /etc/group <<EOF
root:x:0:
bin:x:1:
sys:x:2:
kmem:x:3:
tape:x:4:
tty:x:5:
daemon:x:6:
floppy:x:7:
disk:x:8:
lp:x:9:
dialout:x:10:
audio:x:11:
video:x:12:
utmp:x:13:
usb:x:14:
cdrom:x:15:
adm:x:16:
input:x:17:
syslog:x:18:
klog:x:19:
mail:x:30:
wheel:x:39:
nogroup:x:65533:
EOF
else
echo "group file is already present"
fi
if [ ! -f /var/log/btmp ]
then
echo "Creating process accounting files"
touch /var/log/{btmp,{last,fail}log,wtmp}
chgrp 13 /var/log/{last,fail}log
chmod 0664 /var/log/{last,fail}log
chmod 0600 /var/log/btmp
else
echo "Process accounting files are already present"
fi
if [ ! -f /root/.bashrc ]
then
echo "Creating root bash startup scripts"
echo 'export LITBUILDDBDIR=/var/litbuilddb' > /root/.bash_profile
echo "export PS1='# '" >> /root/.bash_profile
echo 'export PATH=/usr/bin:/bin:/usr/sbin:/sbin' >> /root/.bash_profile
echo 'source /root/.cblrc' >> /root/.bash_profile
for var in KERNEL_TARGET PATCH_DIR TARFILE_DIR
do
echo "export $var='$(eval echo \$$var)'" >> /root/.bash_profile
done
for var in BOOT_DEVICE BOOTLOADER DOMAIN_NAME HOST_NAME \
LOGIN_FULL_NAME LOGIN TARGET_GCC_CONFIG TARGET_SYSTEM_CFLAGS \
TARGET_SYSTEM_MAKEFLAGS DOCUMENT_DIR LOGFILE_DIR SCRIPT_DIR \
WORK_SITE
do
echo "export $var='$(eval echo \$$var)'" >> /root/.cblrc
done
chmod 700 /root/.bash_profile
ln -s .bash_profile /root/.bashrc
else
echo "Root bash startup scripts are already present"
fi
if [ -b /dev/vdb ]
then
echo "Checking for swap device /dev/vdb..."
blkid /dev/vdb
if [ $? -ne 0 ]
then
echo "Initializing swap device /dev/vdb"
mkswap /dev/vdb
fi
if blkid /dev/vdb | grep -q swap
then
echo "Activating swap device /dev/vdb"
swapon /dev/vdb
fi
fi
if [ ! -d /etc/s6-rc/source ]
then
echo "Setting up initial s6-rc structures"
mkdir -m 0755 -p /etc/s6-rc
mkdir -m 0755 -p /etc/s6-rc/source
mkdir -m 0755 -p /etc/s6-rc/source/rl-default
echo "bundle" > /etc/s6-rc/source/rl-default/type
touch /etc/s6-rc/source/rl-default/contents
mkdir -m 0755 -p /etc/s6-rc/source/network-services
echo "bundle" > /etc/s6-rc/source/network-services/type
touch /etc/s6-rc/source/network-services/contents
fi
echo "Looking for an installed litbuild..."
type lb
if [ $? -eq 1 ]
then
echo "Building and installing the litbuild gem"
# kludge to avoid unnecessary `gem` crash
find /scaffolding -name tar_writer.rb | while read file
do
sed -i -e '/raise .* unless signature_digest/s@^@#@' $file
done
pushd /tmp
tar -x -f /scaffolding/materials/litbuild*tar
cd litbuild*
gem build litbuild.gemspec
gem install -l litbuild*gem
popd
rm -rf /tmp/litbuild*
else
echo "Litbuild gem is already installed"
fi
export LITBUILDDBDIR=/scaffolding/litbuilddb
if [ ! -d $LITBUILDDBDIR ]
then
echo "Creating litbuild database directory"
mkdir $LITBUILDDBDIR
fi
echo "Execution of startup-target-system.sh complete"
11.4.14.4. /home/lbl/work/sysroot/scaffolding/target-side-build.sh
#!/scaffolding/bin/bash
exec >/dev/console 2>&1 </dev/console
echo "Console activated for target-side-build.sh"
set +h
trap 'exec /scaffolding/bin/bash' EXIT
trap 'exec /scaffolding/bin/bash' ERR
set -e
cd /scaffolding/cbl
echo "Beginning target-side-initial build"
LOGFILE_DIR=/root/cbl-logs/0-target-side-initial lb target-side-initial
/tmp/build/scripts/00-target-side-initial.sh
rm -rf /tmp/build
export LITBUILDDBDIR=/var/litbuilddb
if [ ! -d $LITBUILDDBDIR ]
then
echo "Creating litbuild database directory"
mkdir $LITBUILDDBDIR
fi
echo "Beginning target-side-final build"
LOGFILE_DIR=/root/cbl-logs/1-target-side-final lb target-side-final
/tmp/build/scripts/00-target-side-final.sh
rm -rf /tmp/build
trap 'exec /bin/bash' EXIT
trap 'exec /bin/bash' ERR
echo "Beginning complete-the-system build"
LOGFILE_DIR=/root/cbl-logs/2-complete-the-system lb complete-the-system
/tmp/build/scripts/00-complete-the-system.sh
rm -rf /tmp/build
echo "Finding lingering build processes..."
cd /proc
while ls -l | grep -v total | grep -q -v root
do
pid=$(ls -l | grep -v total | grep -v root | \
head -n 1 | awk '{print $9}')
owner=$(ls -l | grep "$pid\$" | awk '{print $3}')
echo -n "Found PID $pid owned by $owner, terminating..."
kill -9 $pid && echo "killed" || echo "already gone"
sleep 1
done
echo "Synchronizing disks"
sync
echo "Attempting to remount the root filesystem read-only."
echo "(If that fails, it does not indicate a dire problem. The"
echo "filesystem has a journal, and we have just sync'ed it.)"
echo ""
set +e
mount -o remount,ro /
sync
echo "COMPLETE SUCCESS"
sync
echo ""
echo "Shutting down the system now."
sleep 5
s6-linux-init-hpr -f -p -W
Before we can start the target-system part of the process, we need to convey the scaffolding to the target system and launch the target-side build. I call that process the "bridge," because it takes us from the host-system side of CBL to the target-system side.
The exact steps you follow to bridge from the host to the target depend
on whether the host, or target, or both, are physical or emulated. The
different possibilities are described in different blueprints, which can
be selected using the TARGET_BRIDGE
configuration parameter. The
default is the QEMU bridge, but another alternative is the "manual"
bridge.
12. Manual Host-To-Target Bridge
You always have the option of manually conveying the target scaffolding to the target system and setting up an appropriate boot loader there. This may be the best option if your target system is a real, physical computer.
Some ways you might get this going are:
- you can copy the sysroot directory contents (which should just be the /scaffolding directory) to a USB flash device, install a boot loader on it, and boot that device on the target system; or
- you can set up the host system as an NFS and TFTP server and boot the target system over the network; or
- if your target system already has a usable Linux distribution on it, you can move the sysroot directory there and then use QEMU to run the target system as a virtual machine — presumably without emulation — so you can continue to use the computer while the CBL build completes.
Since this is a manual process, there’s nothing for litbuild to do with this blueprint when generating scripts — you’re on your own.
Once you’ve got something working, you might want to consider whether it’s the sort of thing that could be automated. If you can write a target-bridge blueprint for the process you followed, you’ll be able to run the entire CBL process without manual intervention in the future.
The Target-Side Build
The second half of the CBL process, which is executed by the scripts created in Write the Scaffolding Init Scripts, runs on the target system. It uses the scaffolding prepared using the cross-toolchain to build a complete minimal GNU/Linux system.
As with the host-side build, there are a few distinct stages to the
target-side build: first, we’re going to get the target system to the
point that the package users framework is installed; then we’re going to
install all the real target system components using that framework.
Finally, once all the software that comprises the target system is built
and installed, we’ll install a boot loader so that the computer will
actually load and run the Linux kernel when it’s turned on, and set up
an init
framework that will get the system to a working and useful
state.
13. Target Side Of The CBL Process, Through Package-Users
As with other portions of the CBL process, we set up a database directory that will be used by litbuild-generated scripts to figure out if they need to be run and bail out if not.
- Environment variable: LITBUILDDBDIR = /scaffolding/litbuilddb
The first thing to do is adjust the scaffolding GCC so it knows how to link programs, and set up some symbolic links that will be used for the first several packages.
13.1. Ensure That Files Are Owned By Root
Depending on how the target root filesystem — which at this point only
includes the /scaffolding
directory — was built, some or all of it
may be owned by a user other than root. This leads to a couple of
problems.
First, as soon as the lbl
user is created, it’s likely that
it will suddenly own most of the scaffolding and the top-level directory
in the filesystem, which is weird and ugly.
Second, and more important, the configuration file repository setup process won’t work if anything being put into the repository is owned by a user that does not exist, and this will often be the case for the top-level directory in the filesystem.
So, before we do anything else, we’re going to make sure everything is owned by root. We can ensure the top-level directory has the correct permissions, as well.
chown 0:0 /
chmod 755 /
chown -R 0:0 /scaffolding
13.2. Adjusting the GCC specs (scaffolding-gcc phase)
For an overview of specs-adjustment, see Adjusting the GCC specs.
Just like the specs file had to be adjusted for the cross-gcc so that it would use the correct dynamic loader, the target-native gcc we built as part of the scaffolding has to be adjusted — otherwise, the programs it builds won’t be able to find the dynamic loader. Also, we have to override GCC’s notion of the header file location.
gcc -dumpspecs | \
sed -e 's@/lib/ld@/scaffolding/lib/ld@g' \
-e 's@/lib32/ld@/scaffolding/lib/ld@g' \
-e 's@/lib64/ld@/scaffolding/lib/ld@g' \
-e 's@/libx32/ld@/scaffolding/lib/ld@g' \
-e '/^\*cpp:$/ { n; s/$/ -isystem \/scaffolding\/include/ }' > \
$(dirname $(gcc --print-libgcc-file-name))/specs
13.3. Create Symbolic Links For Bash
Some programs and libraries have build processes that assume that there is a shell program at /bin/bash or /bin/sh. That assumption is wrong during the first part of the target-side build, because we haven’t built the final system bash yet. So for now, we can set up symbolic links that point to the scaffolding bash.
We’ll remove these symbolic links right before we install the final system bash.
ln -s /scaffolding/bin/bash /bin/bash
ln -s /scaffolding/bin/bash /bin/sh
We can’t start building the final system programs and libraries yet, because we need some more components that are difficult or impossible to cross-compile. These are still part of the scaffolding, and are installed in /scaffolding just like the other components. Since they’re built from within the target system as native programs, CBL refers to these as "target scaffolding" components.
13.4. lzip
| Name | Lzip compression utility |
|---|---|
| Version | 1.22 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
13.4.1. Overview
Lzip is a data compression program that uses the LZMA algorithm, just like XZ Utils does. Because it doesn’t have the same convoluted project history as XZ Utils, though, it seems to have a slightly tidier codebase and a simpler format for compressed files. (The lzip homepage observes, "The lzip manual provides the code of a simple decompressor along with a detailed explanation of how it works, so that with the only help of the lzip manual it would be possible for a digital archaeologist to extract the data from a lzip file long after quantum computers eventually render LZMA obsolete" — which seems like a good idea if you’re going to use a program for long-term archival storage.)
Also, it tends to compress files just a little bit better than XZ Utils does, and substantially better than other common compression utilities.
All of the CBL source materials (except lzip) are compressed with lzip because it provides better space savings than any of the other compression programs.
13.4.2. lzip (pre-target-scaffolding phase)
Lzip doesn’t make any allowances for being cross-compiled: if host, target, and so on are specified, they are simply ignored. That means it really can’t be built until we are booted into the target system. We need to build it before we build anything else on the target system, though, because the other source code archives are all lzip-compressed!
bash ./configure --prefix=/scaffolding
make
(none)
make install
13.5. Construction of scaffolding as native programs
We’re finally in the target system and able to start building things
there! But we can’t start right in on the final system components. There
is one more set of programs we need to build into the /scaffolding
area first: programs and libraries that are problematic to
cross-compile.
Since these packages need to be compiled natively, we needed to wait
until we got the target system working to address them. But these are
not going to be a part of the final system any more than the other
pieces of scaffolding are, and that means that we’re going to install
them into /scaffolding
and avoid touching the rest of the filesystem.
(There’s one exception: the shadow package is going to be set up to
store its configuration files in the /etc
directory, which will of
course be part of the final system. But those are just configuration
files — there are no binaries there.)
Since we’re finally in the target system, we don’t have to worry about DESTDIR any more — we can just set these packages to install to /scaffolding, since it’s now in the correct location as a top-level directory.
On the other hand, we do need to worry about bash (and possibly other programs) being installed in a non-standard location: the configure scripts provided with these packages have an interpreter directive (a.k.a. "shebang" line) that tells them to run under /bin/sh, which doesn’t exist yet; all we have is /scaffolding/bin/bash. We work around this for the native scaffolding builds, as well as the rest of the initial target-system setup up to the point where we build the final system’s bash shell, by setting up symbolic links from /bin/bash and /bin/sh to the scaffolding bash. That’s a little bit kludgy, but the other options are even worse.
For other programs needed in standard system locations by specific program builds (for example, the perl build needs to find /bin/pwd), we’ll also create symbolic links at the standard filesystem locations, but since those aren’t as pervasively used as /bin/sh or /bin/bash, we’re going to create the links just before configuring the package, and we’ll remove them after the package is installed.
Notice that, here, we’re building and installing all this stuff as root! That’s not the practice we use when setting up the final system components, but for this handful of programs there’s really no point in worrying about any potential problems from building and installing as root. We’ll be able to check, trivially, that they haven’t done anything improper to the rest of the system — because we don’t really have the rest of the system yet, anything outside of /scaffolding is a sign of an issue! And once the final system is installed we’re going to delete /scaffolding entirely; so if anything installed here clobbers other scaffolding files or anything like that, it won’t cause any long-term problems.
13.5.1. tcl
| Name | Tcl (Tool Command Language) |
|---|---|
| Version | 8.6.11 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
13.5.1.1. Overview
Tcl is an interpreted programming language. I don’t use it for anything, myself, so I can’t say whether or not it’s a good language to learn. For CBL, the driving aspect is that we want to run the automated test suites for the toolchain components, and some of those test suites are implemented in the DejaGnu framework. DejaGnu depends on the Expect tool, which in turn is an extension to Tcl; ergo, to run the toolchain tests, we need Tcl.
13.5.1.2. tcl (target-scaffolding phase)
Tcl has support for cross-compilation built into its configure script, but it doesn’t seem to work very well so it’s built on the target side instead.
- Build Directory: unix
This package is one of the irritating ones that necessitate a symbolic link at /bin/sh during the first part of the target-side build: there are a lot of subdirectories under pkgs that have configure scripts with a shebang path of /bin/sh. The build process for tcl runs these configure scripts without specifying an interpreter, even if CONFIG_SHELL is specified.
There are other alternatives we could use rather than creating the bash symlinks: we could, for example, modify Makefile.in so that instead of running $i/configure it runs bash $i/configure. But the symbolic links approach isn’t really bad and is a lot less work.
bash ./configure --prefix=/scaffolding
The tcl package includes a copy of the sqlite database engine, to make it easy to use from tcl. That’s probably a good idea in most circumstances, but the main sqlite source code file is over seven megabytes in size, and some of the target CBL systems are quite resource-constrained (for example, QEMU-emulated mipsel virtual machines have a maximum of two gigabytes of RAM, and the GnuBee personal cloud devices have a paltry half-gigabyte) and can have issues compiling it. And we don’t need that package for this scaffolding version of tcl!
rm -rf ../pkgs/sqlite3*
make
(none)
make install
Normally, Tcl doesn’t install some of its header files because they define internal-only functions and data structures — since they are intended only to be used inside Tcl, they shouldn’t be necessary when building other packages. However, the Expect package is an extension to Tcl and, as such, makes use of some Tcl internals.
make install-private-headers
13.5.2. expect
| Name | Expect |
|---|---|
| Version | 5.45.4 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Patches | |
| Dependencies | |
13.5.2.1. Overview
Expect is a tool for automating interactive applications, particularly command-line applications like ftp and passwd. It is written as an extension to Tcl.
On some servers, the version of config.guess distributed with expect is too old to recognize the target triplet. It’s easy enough to update it to the version distributed with GCC.
- expect-5.45.4-update-config-guess-1.patch
13.5.2.2. expect (target-scaffolding phase)
As with Tcl, Expect is built as part of the CBL scaffolding because it’s needed to run the automated tests for some toolchain components.
bash ./configure --prefix=/scaffolding --with-tcl=/scaffolding/lib \
--with-tclinclude=/scaffolding/include
make
(none)
make install
Expect installs one of its libraries into a subdirectory of the normal lib directory, and sets an RPATH in the expect binary so it can find the library. Since we’re going to reset the RPATH in all programs and libraries shortly, this is a bad idea; we can just move the library to the normal location to avoid problems.
mv /scaffolding/lib/expect*/* /scaffolding/lib
13.5.3. dejagnu
| Name | DejaGnu |
|---|---|
| Version | 1.6.3 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
13.5.3.1. Overview
DejaGnu is a framework for testing programs — essentially, a library of Tcl procedures that can be used to construct a test harness, which then provides a single front-end for a suite of automated tests. Some of the toolchain components have automated test suites that use the DejaGnu framework.
13.5.3.2. dejagnu (target-scaffolding phase)
As with Tcl and Expect, DejaGnu is installed as part of the CBL scaffolding so that the automated test suites can all be run.
bash ./configure --prefix=/scaffolding
(none)
(none)
make install
13.5.4. perl
| Name | Perl 5 |
|---|---|
| Version | 5.34.0 |
| Project URL | |
| SCM URL | git://perl5.git.perl.org/perl.git |
| Download URL | |
13.5.4.1. Overview
Perl is a general-purpose programming language. It is the oldest of what have historically been thought of as "scripting languages": high-level programming languages, typically interpreted rather than compiled, that make it easy to implement a lot of functionality without a lot of code. It incorporates features from a bunch of other programs (like bash, AWK, and sed).
Other languages, like Python and Ruby, also fit in the "scripting language" niche, but are relative newcomers: Perl has existed since 1987 and has been under active development that whole time.
13.5.4.2. perl (target-scaffolding phase)
Perl is needed by many of the automated build systems for CBL packages, including toolchain components. The Perl build system has some limited support for cross-compilation, but historically it has not been reliable and is not very complete. There are third-party projects, like perl-cross, to try to work around that lack, but using anything like that would be a pretty heavy-weight addition to the CBL build requirements. So, like the other programs in this section, the scaffolding Perl is built on the target system per se after booting the minimal scaffolding userspace.
One of the Perl source files has a hardcoded directory location that causes problems, so we need to adjust it prior to building.
sed -i 's@/usr/include@/scaffolding/include@g' ext/Errno/Errno_pm.PL
The Perl build-configuration scheme is a script called Configure that is generally run interactively and allows many options to be selected manually. Of course, in CBL we want to avoid any interactivity during the build process. Luckily, Perl provides a facility for doing that as well: a script called configure.gnu. It takes options like autoconf-produced configuration scripts and translates them into an invocation of the Configure script. The configure command used here is the one generated by running configure.gnu --prefix=/scaffolding -Dcc="gcc".
The Perl configuration and build automation makes a number of
assumptions, not just about what is available on the host system but
also where it can be found. So this is a case where we need to create
another symbolic link outside of the /scaffolding
area during the
build. If the build crashed previously — or if the system crashed
during the build, as qemu sometimes does — the link will already exist
so we should not bail out if that command fails.
set +e
ln -s /scaffolding/bin/pwd /bin/pwd
set -e
./Configure -ds -e -Dprefix=/scaffolding -Dcc=gcc
It appears that when perl 5.30.1 is built with GCC 10.1 using the default settings, the initial "miniperl" program does not work right; I’ve gotten "Attempt to free unreferenced scalar" messages and segmentation faults. Reducing the optimization level and removing a stack-protector setting appears to work around the issue.
sed -i -e 's@-O2@-O0@' Makefile
sed -i -e 's@-fstack-protector-strong@@' Makefile
The build process for perl sometimes crashes on resource-constrained systems. It’s a good idea to retry a couple of times, if this happens.
make || make || make || make
There’s no point in running the automated tests for this perl, since it’s only going to exist for a little while.
(none)
make install
rm -f /bin/pwd
13.5.5. git
| Name | git version control system |
|---|---|
| Version | 2.32.0 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
13.5.5.1. Overview
Git is the best version control system around. It was originally written by Linus Torvalds, who had previously been using a program called Bitkeeper to manage the Linux kernel source code. Bitkeeper was a proprietary program, but the kernel developers had permission to use it for free; when that permission was revoked, Linus decided to write an entirely new program to replace it.
In CBL, we use git to manage configuration files, as described in the package-users documentation. (I use git for all kinds of other things as well.)
13.5.5.2. git (target-scaffolding phase)
This is a very basic version of git, with many options disabled; it is only used to set up the repository that is used in CBL to manage configuration files.
./configure --prefix=/scaffolding --without-openssl
Git expects perl to live at /usr/bin/perl, so we’ll just make that true for a minute.
set +e
ln -s /scaffolding/bin/perl /usr/bin
set -e
make
We skip the automated tests at this point, as is typical for the scaffolding components.
(none)
make install
rm -f /usr/bin/perl
13.5.6. patchelf
| Name | PatchELF |
|---|---|
| Version | 0.13 |
| Project URL | |
| SCM URL | git://github.com/nixos/patchelf |
| Download URL | |
PatchELF is a little utility for modifying ELF executables and libraries. ELF — which stands for "Executable and Linkable Format" — is a standard format for programs, libraries, object code, and things like that. It’s the standard binary format used on GNU/Linux systems.
Among other things, ELF files can have segments that specify runtime behavior. The interpreter, if there is one, specifies the program that will be used as a dynamic linker when running the program, to load in all the shared libraries that it needs; the RPATH specifies a set of directories that the dynamic linker should look at to find those libraries, in addition to the standard lib directories; DT_NEEDED entries can be used to specify what shared libraries the program or library depends on… there are a bunch of other segment types as well. PatchELF lets you modify those segments to change the runtime behavior of a program or library without rebuilding it from scratch.
CBL includes several components that use libtool, a part of the GNU build system, to produce programs and libraries. Because of the way that the cross-toolchain and scaffolding is set up, a lot of the scaffolding libraries have an RPATH that includes host system directories. It might be possible to adjust the build process so that doesn’t happen, but it’s much easier just to remove the RPATH entirely. That’s something that PatchELF can do for us.
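As a sketch of what that looks like in practice — the binary and loader paths here are plausible examples, not commands taken from the CBL blueprints — patchelf can inspect and rewrite these segments directly:
# Show the RPATH and interpreter a binary was built with:
patchelf --print-rpath /scaffolding/bin/expect
patchelf --print-interpreter /scaffolding/bin/expect
# Remove the RPATH entirely:
patchelf --remove-rpath /scaffolding/bin/expect
# Point the binary at a different dynamic loader (the loader name
# varies by architecture):
patchelf --set-interpreter /scaffolding/lib/ld-linux-aarch64.so.1 \
  /scaffolding/bin/expect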
PatchELF is not part of the basic CBL system by default, but it’s trivial to install it once the system is complete if you want it for something.
PatchELF is also useful to fix the dynamic loader path in programs that insist on looking in multilib directory locations for it (like glibc and gcc). Since one of the programs we’ll eventually want to adjust this way is the dynamic loader itself, we need the scaffolding patchelf to be statically linked.
CXXFLAGS="-static" ./configure --prefix=/scaffolding
make
The tests for PatchELF don’t all pass on all systems: at least on some systems, for example, the no-rpath-kfreebsd-i386 test fails. Since this native scaffolding component is only going to be used a couple of times, to reset the RPATH and dynamic loader path during the first stages of the target-side CBL build, we don’t need to worry about the automated tests. We can just verify that it did the right thing after we use it.
(none)
make install
13.5.7. popt
| Name | popt |
|---|---|
| Version | 1.16 |
| Project URL | |
| SCM URL | (none) |
| Download URL | |
| Patches | |
13.5.7.1. Overview
Popt is a library that facilitates command line option parsing. It’s similar to the getopt functions provided in the C standard library, but has some differences that some open source project teams find worthwhile.
13.5.7.2. popt (target-scaffolding phase)
./configure --prefix=/scaffolding
make
(none)
make install
13.5.8. python
| Name | Python |
|---|---|
| Version | 3.9.6 |
| Project URL | |
| SCM URL | |
| Download URL | |
13.5.8.1. Overview
Python is a scripting language, like perl and ruby.
There are two incompatible versions of python currently available: python 3, which is the recommended and modern version, and python 2, which has been deprecated for some time and was officially retired on 1 January, 2020. Some packages that were written for python 2 have not yet been modified to work with the python 3 interpreter, so it may be desirable to have both installed on your system.
In CBL, we always prefer the latest stable version of everything, so this section — for python 3 — installs to the /usr directory where most system packages are installed. There’s a separate blueprint for python 2, but that version really only ought to be used if you need a package that hasn’t yet been ported to python 3; it installs to the /opt/python2 directory instead. For any package that requires python 2, you can put the /opt/python2/bin directory in the PATH while building and installing it.
(Python has some configuration logic that’s supposed to make it easy to install multiple major versions of the language in the same directory prefix, like /usr, without having them conflict with each other, but it doesn’t actually work right; that’s why the python 2 blueprint installs to a location under /opt.)
Note that the python source archive is distributed as Python-x.y.z.tar
(with a capital P), and unpacks to a directory also with a capital P.
The version available from the CBL file repository has been converted to
use the standard naming convention, but if you obtain the source archive
from the upstream site you’ll have to do that yourself.
13.5.8.2. python (target-scaffolding phase)
Python is a build-time dependency of glibc (as of glibc 2.29), so we need a scaffolding version of python. Like perl, it is problematic to cross-compile python, so this happens in the target scaffolding build.
Python’s dependencies are built earlier, in the host-scaffolding section, so they don’t have to be addressed here.
./configure --prefix=/scaffolding --disable-ipv6
make
(none)
make install
This installation should be available using the program name python as well as python3. The same applies to some other programs.
ln -sf python3 /scaffolding/bin/python
ln -sf pip3 /scaffolding/bin/pip
ln -sf idle3 /scaffolding/bin/idle
13.5.9. rsync
| Name | rsync |
|---|---|
| Version | 3.2.3 |
| Project URL | |
| SCM URL | |
| Download URL | |
| Dependencies | |
13.5.9.1. Overview
rsync is a program that lets you synchronize files or directory structures incrementally and efficiently from place to place — from one location on a computer to a different location, or among multiple computers. It’s really handy.
There are two source distribution files for rsync: the first contains the rsync package sources per se, and the second is a collection of patches that can optionally be applied to the main source directory to provide additional functionality. Although we are not applying any of these patches here, we merge the two distribution files together in the source package in the FreeSA source repository. The patch tarfile expands into rsync-3.2.3/patches, so it’s easy to tell what files come from which upstream distribution file, and it’s convenient to have them handy in case you want to use any of them. Some, like the detect-renamed and omit-dir-changes patches, seem like they might be helpful.
rsync needs the zlib compression library and a library called "popt" that provides functions for parsing command line options. It includes copies of both of those libraries, but in CBL we always prefer to use the latest stable version of all libraries directly from their own distributions.
- Dependencies: popt
13.5.9.2. rsync (target-scaffolding phase)
The scaffolding is missing some packages that rsync expects, so we need to disable some features that rely on them.
./configure --prefix=/scaffolding --with-included-popt=no \
--with-included-zlib=no --disable-xxhash --disable-zstd \
--disable-lz4
make
(none)
make install
13.5.10. shadow
Name | Shadow utilities
---|---
Version | 4.9
Project URL |
SCM URL | git://github.com/shadow-maint/shadow
Download URL |
Patches |
13.5.10.1. Overview
"Shadow" is a suite of programs that relate to passwords and logins and
things. The passwd
command that lets you change your password, for
example, is part of the shadow suite.
Long, long ago, in the mysterious days of the 1970s, the hashed version of user and group passwords appeared directly in /etc/passwd and /etc/group. It turns out that’s a bad idea, because those files have to be readable by everyone, and that makes it possible to run dictionary-style attacks against all the passwords used on a server — that is, running every word in the dictionary through the same hashing algorithm and checking to see whether any of the hashed password entries in /etc/passwd matches. One of the ways that issue was addressed was by moving the hashed passwords into /etc/shadow and /etc/gshadow. That meant that changes had to be made to all the programs that use or manipulate passwords. All of the programs that were affected by this change got bundled together into a single package, and (probably because there’s no compelling reason to do things any differently) have stayed bundled together ever since. That’s the shadow package.
The hashing algorithm used by default on most systems — including Little Blue Linux — is SHA-512. This is a good hashing algorithm that is considered cryptographically strong (as of August 2021, anyway), but can be executed fast enough that it’s feasible to mount a brute-force attack on a hashed password value — that is, the specified salt value can be added to each possible password, and the resulting candidate passwords can be run through SHA-512 looking for a match to the stored value in /etc/shadow, in a matter of minutes on dedicated hardware.
The computational difficulty of cracking passwords like this can be addressed in a variety of ways, but one of the simplest is to run the hashing algorithm more than one time, taking the output of the algorithm in each round and using it as the input of the hashing algorithm for another round. By default, the SHA-512 algorithm is run 5000 times, which is fast enough that passwords can be validated quickly but increases the time taken to brute-force attack a password by a factor of 5000.
In these days of dedicated ASICs used to mine bitcoin, this is not nearly enough to protect passwords from brute-force attacks, so you may want to increase this by setting SHA_CRYPT_MIN_ROUNDS and SHA_CRYPT_MAX_ROUNDS in /etc/login.defs.
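As a sketch, the relevant lines in /etc/login.defs might look like this (the round count here is arbitrary; pick a value that balances login latency against your own threat model):
SHA_CRYPT_MIN_ROUNDS 500000
SHA_CRYPT_MAX_ROUNDS 500000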
- shadow-4.9-fix-sha-rounds-1.patch
A bug was introduced in version 4.9 of the package that causes the hashing algorithm to be applied the maximum number of times if SHA_CRYPT_MIN_ROUNDS and SHA_CRYPT_MAX_ROUNDS are not set. Hence, rather than 5000 rounds of SHA-512, passwords were run through 999,999,999 rounds — causing password validation to take at least several minutes rather than a fraction of a second. This patch pulls in the fix from upstream.
13.5.10.2. shadow (target-scaffolding phase)
CBL uses the package users scheme (about which you can read more elsewhere) to track which packages own files and directories and to prevent packages from stepping on files owned by other packages. That means we need to have programs like adduser, addgroup, and su available throughout the final system build. Those programs are part of shadow.
This is the only piece of scaffolding that will be touching the filesystem outside the /scaffolding directory: since we’re going to be using these programs to create users and groups and things that will be part of the final CBL system, we’re setting the system configuration file directory to /etc rather than the default /scaffolding/etc. If you look at the files that are created there by the installation process, you’ll see that none of them are programs or libraries; in fact, none of them are binary files. So it’s not really a problem to install them there.
The shadow package allows user names to be up to 32 characters, but limits group names by default to 16 characters. Since we generally create a group for every user, with the same name as the user, it doesn’t make sense to restrict group names to be shorter than user names can be.
./configure --prefix=/scaffolding --sysconfdir=/etc \
--with-group-name-max-length=32
make
A couple of the default values used by useradd — which you can see by running useradd -D — are wrong for CBL. The useradd program’s defaults can be overridden by specifying different values in /etc/default/useradd, so we write the new settings there.
make check
make install
mkdir /etc/default
echo "GROUP=" > /etc/default/useradd
echo "CREATE_MAIL_SPOOL=no" >> /etc/default/useradd
Some of the scaffolding programs and libraries are built to look for libraries in directories that existed on the host system, rather than the ones that exist on the target system. Luckily, we can fix that.
13.6. Fix RPATH For Scaffolding Programs
For some reason — possibly because of the way that libtool is used? — some of the scaffolding binaries have RPATH segments that specify locations that were present on the host system but are not present on the target system — a directory relative to the location of the cross-toolchain, or the full host-system sysroot path, for example. (If you don’t know what an RPATH is, you can review the A Word About The Dynamic Linker section.)
All we need to do here is to remove the RPATH from binaries that have a host system directory in it. That way, LD_LIBRARY_PATH will be used to find the dynamic object files the binary wants, which is fine. The PatchELF program can be used to do this.
Sometimes, on some machine architectures, we’ve seen bugs in PatchELF that cause it to corrupt binaries rather than simply modifying their runtime paths. That’s less likely to happen when removing the RPATH entirely, but it’s still a good idea to make a backup of programs and libraries before running PatchELF on them; that’s what we do here.
To save time, we only look for program and library binaries in the few directories that should contain them. It’s possible that we’re missing some, but this appears to be good enough to prevent any problems from happening later in the build.
cd /scaffolding
find bin lib libexec sbin usr/bin -type f | while read FILE; \
do \
if file $FILE | grep -q ' ELF '; \
then \
if readelf -a $FILE | grep -q 'Library r.*path:.*sysroot'; \
then \
echo "Removing RPATH in $FILE"; \
cp -a ${FILE} ${FILE}-orig; \
patchelf --remove-rpath $FILE; \
fi; \
fi; \
done
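If you want to spot-check the result for a particular binary (the file name here is just an example), PatchELF can print whatever runtime path remains, which should now be empty for any binary the loop above modified:
patchelf --print-rpath /scaffolding/bin/sed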
The first package we install on the final system is the "Package Users" package. This sets up a framework that we can use so that every file in the final system clearly shows what package it belongs to.
13.7. package-users
Name | Package-Users Support Files and Scripts
---|---
Version | 0.7.8
Project URL |
SCM URL |
Download URL |
Dependencies |
One of the main things that distinguishes different families of GNU/Linux distributions is the approach they take toward managing software packages. Redhat systems typically use the yum and rpm tools to download and install, upgrade, or remove rpm files, which are pre-compiled binary packages structured in a particular way. Debian systems, and derivatives like Ubuntu, similarly use the apt and dpkg programs to manipulate deb files, which are pre-compiled binary packages structured in a different way.
Gentoo systems, and derivatives like Funtoo, don’t use binary packages at all; they use a program called emerge that uses instruction files in a specific format to download, compile, and install packages from source code. emerge maintains a database of all the packages that have been installed this way, along with lists of all the files that were installed as part of those packages. When a package is removed, emerge consults that database and systematically removes the files it originally installed as part of that package.
CBL takes a lighter-weight approach to package management (although it is a lot more similar to Gentoo than to the systems that use pre-built binary package files): a separate operating-system user is created for each package that’s installed, and the package is configured, compiled, and installed as that user. As a result, it’s obvious which files and directories were installed as part of a particular package: if you run ls -l, the owner and/or group for each file will indicate the package that owns it.
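For example, once the rsync package has been installed by its package user, a listing like the following (the output here is invented for illustration, not copied from a real build) shows at a glance which package owns the program:
ls -l /usr/bin/rsync
-rwxr-xr-x 1 rsync rsync 964536 Aug 10 12:34 /usr/bin/rsync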
Matthias S. Benkmann, who thought up this approach and published it as a "hint" for Linux From Scratch, named it "package users" and developed a number of utility scripts and configuration files to streamline the use of package users in a GNU/Linux system. CBL includes those utilities and related files, substantially customized, extended, and adapted so they fit in well as a part of the Little Blue Linux system, as the package-users package. That’s what we’re going to set up now, and our goal is for the final result of this installation to look as though package-users was installed using the package-users scheme itself. If you want to know more about the package-users approach to system configuration, you can read the document package-users-manual.txt found in the package-users package; it’s installed in /usr/share/doc/package-users on CBL systems. You can also, if you’re interested, find the original LFS "hint" version of the document at:
http://www.linuxfromscratch.org/hints/downloads/files/more_control_and_pkg_man.txt
Nothing in the package-users package per se needs to be built; all we’re doing here is installing it.
(none)
(none)
(none)
The package users scheme relies on an install group, which all package users will belong to (including the package-users user that will own this stuff). The convention used in CBL is that the install group has GID 9999, and package users and groups have UIDs and GIDs starting with 10000. That leaves user and group IDs from 1000 to 9998 for normal system users.
echo "install:x:9999:package-users" >> /etc/group
echo "package-users:x:10000:" >> /etc/group
echo "package-users:x:10000:10000:Package users \
framework:/usr/src/package-users:/bin/bash" >> /etc/passwd
Now we can install all the helper scripts and the template for package user home directories and stuff like that. Most of that can just be copied in from the tarfiles contents.
chown -R 10000:10000 etc usr
cp -a etc usr /
install -o 10000 -g 10000 -m 755 -d /etc/pkgusr/skel-package/src
install -o 10000 -g 10000 -m 755 -d /etc/pkgusr/skel-package/patches
install -o 10000 -g 10000 -m 755 -d /etc/pkgusr/skel-package/logs
ln -s /etc/pkgusr/bash_profile /etc/pkgusr/skel-package/.bash_profile
ln -s /etc/pkgusr/bashrc /etc/pkgusr/skel-package/.bashrc
chown -R 10000:10000 /etc/pkgusr/skel-package
mkdir -p /usr/share/doc/package-users
cp -a doc/* /usr/share/doc/package-users
chown -R 10000:10000 /usr/share/doc/package-users
The package-users startup scripts provide a convenient location to add compilation flags that should be used for all packages — this could be used, for example, to enable some of GCC’s optimizations (like -O2 or -Os) globally for all packages that don’t have a specific CFLAGS or CXXFLAGS environment variable specified in their blueprint.
CBL assumes that you want to use the same set of optimization flags for C and C++ programs. If that’s not the case, this is where you should make an adjustment!
echo "export CFLAGS='-O2 -fno-omit-frame-pointer -march=native'" >> \
/etc/pkgusr/bash_profile
echo "export CXXFLAGS='-O2 -fno-omit-frame-pointer -march=native'" >> \
/etc/pkgusr/bash_profile
Similarly, MAKEFLAGS should be set based on the corresponding parameter.
echo "export MAKEFLAGS='-j8'" >> \
/etc/pkgusr/bash_profile
That’s it, the package users files are all installed! Pretty simple, right? But there are a few more things left to do to make it look as though the package users files themselves were installed as a package user.
cp -a /etc/pkgusr/skel-package /usr/src/package-users
chown -R 10000:10000 /usr/src/package-users
echo $(basename $(pwd) | sed 's@users-@users @') > \
/usr/src/package-users/.project
chown -R 10000:10000 /usr/src/package-users/.project
cp -a /home/lbl/materials/package-users*tar* /usr/src/package-users/src
When we copied everything to /, it changed the ownership of some standard system directories, so let’s change them back. This is also a good time to set all the install directories to have the correct group and mode, and put a list of local filesystems (used by some of the package-user scripts to scan for files owned by a particular package) into the /etc/pkgusr directory.
chown 0:0 /etc /usr{/bin,/lib,/sbin,}
set_install_dirs
setup_scan_filesystems
Typically, package users are manipulated from the root account. It’s convenient for root to have bash completion set up to use user accounts for some commands. (By doing this, you can do things like type pinky g and then hit tab a couple of times, and the shell will show you all the users that start with a g.)
echo 'complete -o default -o nospace -A user su pinky sudo' \
>> /root/.bash_profile
In systems that follow the CBL process, configuration files are managed in a version-control system. This allows system configuration changes to be tracked and audited, and makes it very easy to revert problematic changes. To distinguish configuration repository changes that are made automatically by litbuild-generated scripts from manual changes, these are always committed as Little Blue Linux <default@localhost>.
cfgrepo-init 'Little Blue Linux' default@localhost
The way the configuration file repository is set up by default, both git commit and git as-default will use the same authorship information (name and email address) for commits. All of the configuration repository commits from litbuild-generated blueprints will be added using as-default; to make it easy to distinguish those from manual modifications, we can modify the author information used by git commit, and add another as-lbl alias for convenience.
cfggit config --global user.name 'A Little Blue User'
cfggit config --global user.email 'lbl@lblinux.org'
cfggit config --global alias.as-lbl \
"commit --author='A Little Blue User \
<lbl@lblinux.org>'"
The configuration repository can be populated either while individual packages are set up and installed, or after everything is built — really, it doesn’t matter, as long as files are added to the repository before they are manually modified.
This is done using the cfggit add and cfggit commit commands (optionally using one of the as-$user aliases rather than commit). Typically, blueprints should include configuration-files directives so this will be done automatically.
We don’t want the file listing for package users to include anything under the build directory, but since we chowned it earlier, it will. We can chown it back so that doesn’t happen.
chown -R 0:0 .
list_package package-users >> /usr/src/package-users/.project
And as a finishing touch, we can create the file that litbuild will use in the future to determine whether the package is already installed.
echo $(basename $(pwd) | sed 's@package-users-@@') > \
/usr/src/package-users/.default
The rest of the build will be done in a separate section, using the package users framework we just installed.
14. Target Side Of The CBL Process, With Package-Users
Once the package-users framework is installed, we can start building the components that make up the final system.
15. A Word About Tests
Since the programs and libraries we are building here will comprise the final CBL system, it is highly desirable to run all of the automated tests that are provided as a part of the package distributions. Unfortunately, the CBL system is not complete enough, at this stage, to run all the test suites reliably. Some test suites fail; for these, the standard practice in CBL is to log the fact that tests failed, but still continue the build process, by modifying the test command to:
make -k check || echo "Exit code $?: continuing anyway"
It’s a good idea to inspect log files for these packages after the build is complete — when the CBL system is booted into the full userspace — and see whether anything looks problematic. If any errors look worrisome, you can re-do the build for those packages and see whether the issue is resolved.
In a few cases, the test suite is even more problematic than simply failing with an error — for example, it might cause the build to hang for hours. In those cases, my standard practice is to skip the test suite entirely during the CBL build process. For these packages, CBL provides a blueprint called rebuild-untested-packages that can be run to rebuild them — and run their tests — once the complete userspace is available.
16. A Word About Package Names
In CBL, our standard practice is to use the latest stable released version of everything, and the convention we use for blueprint names is simply to omit any indication of the version number of packages — the kernel blueprint is simply linux.txt, rather than linux-4.13.txt.
Sometimes that doesn’t work, though: for example, there are significant compatibility issues between Python 2 and 3, and some Python programs have not yet been updated to work with Python 3; that means it’s desirable to have both Python 2 and 3 installed. CBL blueprints for these old-version packages have a version number suffix in their name: the blueprint for Python 2 is called python2.txt.
17. Building the System
With that out of the way, we can start building the target system per se.
Of course, we start with a toolchain: the CBL system toolchain.
17.1. Construction of the final system C library
Finally we are in a position to build the programs and libraries that will make up the actual CBL system!
We start with the toolchain, because it’s the foundation of the system. In particular, we start with the kernel headers and C library: everything (except the kernel) gets linked against the C library, so we need it before anything else, and (as you may recall from when we built the cross-toolchain) the C library needs the kernel headers so that it knows how to invoke system calls.
From this point on, as we build out the final system components, less and less of the scaffolding will actually be needed or used!
We don’t really need to set up the timezone database at this point, but we might as well. (This is done here because the timezone database has historically been distributed as a part of glibc.)
17.1.1. tzdb
Name | IANA Time zone database files
---|---
Version | 2021a
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Time zones are really complicated, and the rules for how they behave change more often than you might expect — in 2016, for example, there were ten revisions to the time zone rules. The Internet Assigned Numbers Authority, IANA, releases new timezone database files whenever the rules change.
(none)
(none)
(none)
make cc=gcc install
If you want the system’s timezone to be your local time, you can use tzselect to identify the correct timezone file and then copy it to /etc/localtime. For CBL, the assumption is that the system will be set to GMT, and individual users can override that by setting the TZ environment variable in their .bashrc or .bash_profile scripts.
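As a sketch (the timezone name is just an example; tzselect will show you the right one for your location), such an override might look like this in ~/.bash_profile:
export TZ='America/Chicago'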
17.1.2. linux (final-system-glibc phase)
For an overview of linux, see linux.
Just as in the initial cross-toolchain build, we need to install the kernel header files so that the C library will know how to make system calls properly.
This time, it’s a little bit simpler because we don’t need to specify a target architecture: the target architecture is the native architecture. But other than that, this is just the same as the cross-toolchain kernel headers installation.
make mrproper
make headers_check
(none)
make INSTALL_HDR_PATH=_dest headers_install
cp -rv _dest/include/* /usr/include
rm -rf _dest
17.1.3. glibc (final-system-glibc phase)
For an overview of glibc, see glibc.
Build Directory | ../build-glibc-2
---|---
Dependencies |

This will be the C library for the final system. It’s kind of a big deal.
As with many toolchain components, glibc should always be built from outside the source tree.
The rest of the configuration is pretty typical for a system glibc. As usual, for the default CBL process we use options as close to the defaults as possible. An adjustment you might want to make is to specify --enable-obsolete-rpc in the configuration — this will install obsolete header files related to remote procedure calls, which might still be used by very old packages. If you don’t know that you will ever use a program that might need those headers, it’s safe to skip it.
Since we are going to install the timezone tools from the tzdb package, we don’t need the ones that are provided with glibc; these can be skipped with --disable-timezone-tools.
CFLAGS="-g -O2" ${LB_SOURCE_DIR}/configure --prefix=/usr \
--disable-timezone-tools
make
It is really a good idea to run the test suite for glibc — as previously described, it is the most foundational package in the system. Unfortunately, this is sometimes very problematic in the limited scaffolding environment: with release 2.31, and with some other earlier versions, the test suite for glibc leaves some processes still running. The result is that the build hangs after running the glibc tests.
It’s disappointing to skip the tests at this point, but seems like the most pragmatic approach.
Even if we ran the tests at this point, we’d expect to have a couple of hundred test failures — because the CBL system isn’t on a network while we’re doing this build; the final system GCC isn’t installed yet and the libgcc_s.so library isn’t always found in the /scaffolding directory structure; and, generally, the glibc tests are just fragile.
It’s a good idea to rebuild glibc and run its full test suite after the CBL system is complete. If any test failures that occur then seem problematic, it’s a very good idea to follow up on those!
(none)
Glibc has a post-installation sanity check script, test-installation.pl, that compiles a simple program, runs ldd against it, and compares the output to what it expects. Sometimes this script works, but in at least some cases it decides that it’s a problem for the dynamic linker to find libgcc_s.so.1 under the /scaffolding directory. Since that library is a part of GCC, and the final system GCC isn’t installed yet at this point, there’s no other libgcc_s for it to find. There are other issues, as well — for example, it sometimes tries to link the test program against libraries that aren’t even being built.
Since we’re going to be doing our own sanity check for the final system toolchain, we can simply take the easy work-around of skipping the one glibc provides.
pushd ${LB_SOURCE_DIR}
sed '/test-installation/s@$(PERL)@echo skipping@' -i Makefile
popd
The rest of the installation is pretty typical.
The dynamic linker is configured using a file ld.so.conf, which is not created by the glibc installation; we just create an empty one. Also, the "locales," which are used by glibc for internationalization and localization support, don’t get installed by default. Strictly speaking, those aren’t required, but it’s a good idea to have at least your own locale configured, and having them all doesn’t take up very much extra time or space.
touch /etc/ld.so.conf
make install
make localedata/install-locales
The glibc package includes a daemon program called nscd, the "name service cache daemon," which — as the name implies — caches the results of certain types of database lookups. It is not necessary, and not particularly useful unless there are a lot of users or groups, or you’re using a distributed authentication database like LDAP, or… maybe there are other circumstances where it is useful. For CBL, we simply skip it entirely.
17.1.4. Adjusting the GCC specs (final-system-glibc phase)
For an overview of specs-adjustment, see Adjusting the GCC specs.
- Dependencies
After the final system libc is installed, we can remove the adjusted specs file that causes gcc to use the program interpreter from the scaffolding C library. That will cause its behavior to revert to normal, which is what we want from this point forward.
rm -f $(dirname $(gcc --print-libgcc-file-name))/specs
17.1.5. Set up configuration files for glibc
The GNU C library uses a "Name Service Switch" configuration file to control where to look for name-service information. This blueprint configures it in a way that works fine most of the time.
passwd: files
group: files
shadow: files
hosts: files dns
networks: files
protocols: files
services: files
ethers: files
rpc: files
The dynamic linker from the GNU C library finds shared libraries in a set of pre-configured locations, plus whatever directories are named in the ld.so.conf configuration file. Conventionally, the /usr/local directory tree has a lib directory; in CBL, where all packages can be intentionally installed as a part of the system, the whole /usr/local structure is of questionable value, but there’s no reason not to set it up.
/usr/local/lib
After modifying ld.so.conf, it’s a good idea to run ldconfig so that the dynamic linker’s cache contains information about all the libraries installed on the system.
ldconfig
17.1.5.1. Complete text of files
/etc/ld.so.conf
/usr/local/lib
/etc/nsswitch.conf
passwd: files
group: files
shadow: files
hosts: files dns
networks: files
protocols: files
services: files
ethers: files
rpc: files
17.1.6. Verify that a toolchain works properly (final-system-glibc phase)
For an overview of verify-toolchain, see Verify that a toolchain works properly.
This is exactly identical to the previous toolchain test. It’s copied into this separate phase so that it will be re-run even when a restart database is being used.
#include <stdio.h>
int main(void)
{
printf("Hello, Real Live CBL System World!\n");
return 0;
}
As before, compile it:
gcc /home/lbl/work/build/hello.c -o /home/lbl/work/build/hello
Make sure it isn’t trying to find a dynamic loader in the /scaffolding
directory:
readelf -a /home/lbl/work/build/hello | tee /home/lbl/work/build/program_info
grep 'interpreter: /lib' /home/lbl/work/build/program_info
And run it!
/home/lbl/work/build/hello | grep 'Hello, Real Live CBL System World'
17.1.6.1. Complete text of files
/home/lbl/work/build/hello.c
#include <stdio.h>
int main(void)
{
printf("Hello, Real Live CBL System World!\n");
return 0;
}
17.2. Construction of the final system toolchain
Once the final system glibc is installed and configured, we can build the rest of the toolchain for the final system. This consists of binutils and GCC — which, of course, have some dependencies of their own.
17.2.1. m4 (final-system-toolchain phase)
For an overview of m4, see m4.
This is a standard GNU-build-system package.
./configure --prefix=/usr
make
make check
make install
17.2.2. bison (final-system-toolchain phase)
For an overview of bison, see bison.
Environment | MAKEFLAGS=-j1
---|---
Sometimes this build is fragile when used with a lot of parallel make processes. To avoid any issues, we can just disable make parallelism.
The flex package depends on bison to build, but bison depends on flex to run its tests. Circular dependencies are always frustrating! This one can be avoided just by skipping the bison tests.
If it’s really important to you to run the tests, rebuild bison after installing flex.
./configure --prefix=/usr
make
(none)
make install
17.2.3. flex
Name | The Fast Lexical Analyser
---|---
Version | 2.6.4
Project URL |
SCM URL | (unknown)
Download URL |
Dependencies |
Flex is the "Fast Lexical Analyser." It is a tool for generating "scanners", which are programs that recognize lexical patterns in text. This is mostly useful when writing compilers: the part of a compiler that scans source code and turns it into tokens is usually generated using a lexical analyzer like flex.
./configure --prefix=/usr
make
make -k check || echo "Exit code $?: continuing anyway"
In at least some cases (observed with the x86 x32 ABI), the flex test suite fails with a segmentation fault in the cxx_restart test. That might be because of a flaw in the scaffolding C++ compiler, or something related to the ABI, but it’s not important enough to crash the build.
make install
The original lexical analyzer used on UNIX systems was called lex, and some programs still try to use it. Flex has a lex emulation mode; CBL therefore sets up a wrapper script that invokes flex in lex emulation mode when programs try to run lex.
echo '#!/bin/bash' > /usr/bin/lex
echo 'exec /usr/bin/flex -l "$@"' >> /usr/bin/lex
chmod -v 755 /usr/bin/lex
17.2.4. binutils (final-system-toolchain phase)
For an overview of binutils, see binutils.
Build Directory |
---|---
On some target systems — I’ve experienced this with QEMU-emulated MIPS virtual machines — the binutils build can crash, or cause QEMU to abort with a segmentation fault, when building the gold linker. If this happens to you, you can add --enable-gold=no to the configure command here.
${LB_SOURCE_DIR}/configure --prefix=/usr --enable-shared \
--enable-gold=yes --enable-64-bit-bfd --enable-plugins \
--enable-threads --disable-multilib
As with some of the other binutils builds (but unlike the host-scaffolding build, oddly enough!), some of the warning messages present in GCC 8 and later can present problems when building the binutils. The same makefile tweak we used earlier can be used to ensure that those warnings are not converted to errors.
make configure-host
sed -i -e '/^WARN_CFLAGS/s@$@ -Wno-error=stringop-truncation@' bfd/Makefile
sed -i -e '/^WARN_CFLAGS/s@$@ -Wno-error=stringop-truncation@' gas/Makefile
sed -i -e '/^WARN_CFLAGS/s@$@ -Wno-error=format-overflow@' binutils/Makefile
make tooldir=/usr
By default, binutils installs programs into a multiarch location that includes a target-triplet directory. Since CBL doesn’t use multiarch or multilib, this is not necessary and we override the normal behavior.
make -k check || echo "Exit code $?: continuing anyway"
It would be great if all the binutils tests passed, but I always get a few.
make tooldir=/usr install
17.2.5. gmp (final-system-toolchain phase)
For an overview of gmp, see gmp.
Build Directory | ../build-gmp
---|---
Dependencies | m4
This GMP will be installed in the location where it will live permanently on the CBL system, but because we are enabling C++ support (which we need to do to get all the various dependencies and GCC built) it will actually be set up to link against the standard C++ library in the /scaffolding area (via the libtool .la file and an RPATH ELF segment). We’ll fix that after installing the final system GCC.
${LB_SOURCE_DIR}/configure --prefix=/usr --enable-cxx
make
make html
make check
make install
17.2.6. mpfr (final-system-toolchain phase)
For an overview of mpfr, see mpfr.
Even though gmp is installed in the conventional location for system libraries (/usr), the scaffolding programs wind up finding the one in /scaffolding/lib during this build instead, unless told explicitly where to look.
./configure --prefix=/usr --with-gmp=/usr
make
Another issue is that the automated test programs are built with an RPATH that puts the /scaffolding/lib directory before the directory that contains the freshly-built libmpfr.so. There are several ways to fix that, but the easiest one is to use LD_PRELOAD to get the dynamic linker to do what we want. (If this paragraph confused you, you might want to review the section A Word About The Dynamic Linker.)
LD_PRELOAD=${LB_SOURCE_DIR}/src/.libs/libmpfr.so make check
make install
17.2.7. mpc (final-system-toolchain phase)
For an overview of mpc, see mpc.
./configure --prefix=/usr --with-gmp=/usr --with-mpfr=/usr
make
make html
make check
make install
make install-html
17.2.8. isl (final-system-toolchain phase)
For an overview of isl, see isl.
./configure --prefix=/usr --with-gmp-prefix=/usr
make
One of the tests fails on some of my builds.
make -k check || echo "Exit code $?: continuing anyway"
make install
GDB is the GNU debugger. ISL includes a Python script that provides pretty-printers for most of the structures it defines when using GDB; if you ever wind up debugging a program that uses ISL, it’s handy to have those pretty-printers available.
For some reason, the build process for ISL doesn’t install it in a location where GDB will find it, but it’s easy to move it there ourselves.
mkdir -pv /usr/share/gdb/auto-load/usr/lib
mv -v /usr/lib/libisl*gdb.py /usr/share/gdb/auto-load/usr/lib || \
echo nevermind
17.2.9. zlib (final-system-toolchain phase)
For an overview of zlib, see zlib.
./configure --prefix=/usr
make
make check
make install
mkdir -pv /usr/share/doc/zlib-1.2.11
cp -rv doc/* /usr/share/doc/zlib-1.2.11
17.2.10. gcc (final-system-toolchain phase)
For an overview of gcc, see gcc.
Build Directory |
---|---
It’s hard to believe, but this is the final time you’ll need to build GCC! (At least, for this CBL system — unless you want to upgrade it at some point, or add support for additional languages.)
When configuring GCC this time, you may want to enable additional languages; only C and C++ are necessary, but if you want to have compilers handy for Objective C, Go, Fortran, or one of the other languages for which GCC has "front ends," you can add them to the --enable-languages list.
Also, if you want to save some time on this build, you can add --disable-bootstrap.
${LB_SOURCE_DIR}/configure --prefix=/usr --libexecdir=/usr/lib \
--enable-languages=c,c++ --disable-multilib --with-system-zlib \
--enable-install-libiberty --enable-fix-cortex-a53-835769 \
--enable-fix-cortex-a53-843419 --with-cpu=cortex-a72.cortex-a53
make
The default stack size is insufficient for running the GCC tests, so bump it up a bit before running them. As with binutils and glibc, some GCC tests will fail; at some point, you should review the test results and see if any of the failures seem problematic. The test_summary script can generate a summary of the test results.
ulimit -s 32768
make -k check || echo "Exit code $?: continuing anyway"
${LB_SOURCE_DIR}/contrib/test_summary
make install
Several packages expect the C compiler to be available as cc, so we can create a symbolic link to help them find it.
ln -sv gcc /usr/bin/cc
And, like isl, GCC provides a pretty-printer that can be used with GDB, so we should put that where GDB will be able to find it.
mkdir -p /usr/share/gdb/auto-load/usr/lib
mv -v /usr/lib/libstdc++*gdb.py /usr/share/gdb/auto-load/usr/lib || \
echo nevermind
Once we have the final system toolchain in place, we can use it to build the rest of the programs and libraries that make up the CBL system.
17.3. Construction of the final system components
The foundation of the final CBL system has been laid. Now we can use that foundation, and a gradually-decreasing set of scaffolding programs and libraries, to build the rest of the target system.
Most of the things being built in this section are more-or-less necessary parts of a GNU/Linux system. Others are really not vital — for example, lsof can be nice to have around, but many people never use it at all. If you’d like to wind up with a more minimal system than base Little Blue Linux, you can remove blueprints from this section.
17.3.1. Construction of the skarnet.org suite of programs
This installs a suite of related programs and utilities written by Laurent Bercot and published on skarnet.org. The most important of these are s6, which provides the PID 1 init program used by Little Blue Linux, and s6-rc, which provides service management functionality using s6 as its basis.
s6 and the various other packages related to it provide all the functionality needed to initialize and manage the system state — but without making any policy decisions about how the system ought to be set up, and without usurping functionality that is provided by (and properly belongs in) other programs. In my opinion, it provides all the benefits of other modern init systems (like systemd or upstart) without the drawbacks they bring.
Although each package within the skarnet.org suite of software is independent from the others and provides distinct functionality, in CBL we often refer to the entire set of packages as "s6" just as a convenient shorthand, instead of using more precise terminology like "s6, s6-rc, and s6-linux-init," when talking about the mechanics of the init process. (The name "s6" is itself a kind of shorthand; it stands for "skarnet.org’s small and secure supervision software suite".)
Not all of the skarnet.org packages are absolutely necessary for CBL, but none of them is particularly large, they are easy to build, and they can be very handy! So here we build all the "s6" packages, plus the execline and skalibs packages that they depend on, without worrying too much about whether some components are unnecessary.
The skarnet.org software is all extraordinarily simple and stable code; for CBL, our practice is to apply all the commits on the master branch as a branch-update to the most recent release.
17.3.1.1. skalibs
Name | Skarnet Libraries
---|---
Version | 2.9.3.0
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Patches |
Overview
Skalibs is a collection of common routines used by all the skarnet.org software.
skalibs (skarnet-org phase)
./configure --prefix=/usr --datadir=/etc
make
The skarnet.org projects don’t have automated test suites.
(none)
make install
mkdir /usr/share/doc/skalibs
cp -r doc/* /usr/share/doc/skalibs
17.3.1.2. execline
Name | Execline scripting language
---|---
Version | 2.8.0.1
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Patches |
Dependencies |
Overview
Execline is a non-interactive scripting language designed to be used for all the service and process management scripts that will be run as part of the skaware-based (s6, s6-rc, s6-linux-init) system management programs. It serves essentially the same purpose that bash does in the sysvinit-style init scheme, but without providing all of bash’s rich feature-set.
Providing strictly limited functionality is a huge benefit! Any complexity in the startup process provides room for things to go wrong or be misunderstood.
The easiest way to explain how execline works is to start by describing the way other shell programs work, and then talk about how execline is different. You already have a lot of experience with bash, since that’s the basic command-line shell that is invariably part of all GNU/Linux systems.
When running a shell program like bash, the shell displays a prompt to indicate that it’s waiting for you to type a command. When you do, the shell parses the command line, then forks a subprocess — that is, it uses the fork system call to create a new child process that is a copy of itself. The child process then uses the exec system call, which replaces the program being run by the current process (the child process) with a different program specified by the command line.
After forking the child process, the parent process — the shell program process itself — sits around and waits for that child process to terminate. When it does, it displays another prompt.
All this happens every time you run a command! When you run the command ls -l to show the contents of a directory, bash:
- forks, to create a child process that is a copy of itself, and waits for that child process to end;
- meanwhile, the child process execs into the /bin/ls program with the command-line argument -l;
- the /bin/ls program uses functions defined in the ls source code, as well as functions provided by the C standard library, to discover the contents of the current directory and format them into printable strings; it displays those strings, and then terminates with an exit code of 0; and
- the original bash process collects that error code, puts it into the $? environment variable, and displays a new command prompt.
Scripts work the same way as interactive input. When you run a bash script, the program interpreter doesn’t need to prompt for commands; it runs each command in the script as though you had entered it on the command line. But the way it runs those commands is just the same: for each command, it forks a copy of itself, and then the copy execs into the command line from the script. (You can type the full text of a shell script at a shell prompt, if you want, and bash will behave just as though you had run that script.)
That brings us to execline. Execline is similar to bash, but it doesn’t fork. It just execs.
This is different enough from how other scripting languages work that it can be hard to wrap your mind around! (I had to bounce around on the documentation pages for execline and the various s6 sub-projects for a couple of days before I got it.)
The execline program per se reads the entire script, parses it into one long command line, and then execs into it. Hence the name "execline": it execs a command line.
(For historical reasons, the execline program actually gets installed with the name execlineb; in CBL, this gets symlinked to execline so you can use either command name, and the execline scripts set up by CBL generally use the command name execline.)
The vast majority of the execline "language" is outside of the execline program: the language is made up of dozens of tiny programs that are intended to be used as components in a single long command line. In most cases, these programs do something with one or more of their arguments, and then exec into the rest of the command line — a technique sometimes called "chain loading." This is the single most important thing to understand about execline and the various s6 packages! Almost everything in those packages is a tiny program intended to be used this way, as part of a single long command line. You can think of most of the s6 packages as being extensions to the execline "language."
Unlike bash, execline doesn’t have any built-in commands! Take, for example, this simple execline script:
#!/bin/execline
cd /usr/local/share/doc
ls -l
The first line indicates that the script should be run by the program /bin/execline. Execline ignores the comment line and parses the rest of the script into a command line:
cd /usr/local/share/doc ls -l
Then it execs into that command line:
- it runs the program cd, which is one of the programs that make up the execline language, giving it the arguments /usr/local/share/doc ls -l.
- The cd program changes the current working directory of the process to the directory named in its first argument, in this case /usr/local/share/doc, and then execs into the rest of its command line. In this case that means it runs the program ls with the argument -l.
- The ls program parses its own argument, -l, and prints out a directory listing in the "long" format.
It’s common for an execline script to consist of a bunch of invocations of programs that are part of the execline language (or an extension to it), with one external program at the end. There will be more about this later on!
One of the programs in s6-portable-utils, s6-echo, works a lot like the echo shell command; it simply writes all its arguments to its standard output stream. If you have an execline script that is not behaving the way you’d like it to, a handy trick I’ve found is to just add s6-echo in front of the problem area; when s6-echo is encountered, it prints out the rest of the parsed script and then terminates, so you can see whether the rest of the script looks like you expect it to. If you want to see what environment variables are set at that point, you can additionally add an invocation of s6-env before s6-echo.
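To illustrate with the example script from earlier (purely a debugging sketch): inserting s6-echo before the final command makes the script print ls -l and exit, rather than actually running ls.
#!/bin/execline
cd /usr/local/share/doc
s6-echo
ls -l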
execline (skarnet-org phase)
./configure --prefix=/usr --exec-prefix=/ --enable-shared \
--disable-allstatic
make
(none)
make install
mkdir /usr/share/doc/execline
cp -r doc/* /usr/share/doc/execline
The main execline program is installed, as mentioned above, as execlineb. This is for historical reasons — the first released version of execline had no support for brace-delimited blocks; when the need for blocks became obvious, the author added that support in a new version of execline he called execlineb, for "execline with blocks." Eventually, the original execline program was deprecated and removed, so execlineb is the only version still around in current versions of the package. I like just using execline in the shebang line of my scripts, so I create a symbolic link for that purpose.
if [ ! -e /bin/execline ]; \
then \
ln -s execlineb /bin/execline; \
fi
17.3.1.3. s6-dns
Name | s6 DNS client programs and libraries
---|---
Version | 2.3.5.1
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Patches |
Dependencies |
Overview
This is a DNS client library and collection of related programs. These programs allow you to make DNS queries simply and efficiently.
s6-dns (skarnet-org phase)
./configure --prefix=/usr --exec-prefix=/ --enable-shared \
--disable-allstatic
make
(none)
make install
mkdir /usr/share/doc/s6-dns
cp -r doc/* /usr/share/doc/s6-dns
17.3.1.4. s6
Name | skarnet.org small and secure supervision software suite
---|---
Version | 2.9.2.0
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Patches |
Dependencies |
Overview
s6 is the process supervision component of the skarnet.org suite of packages. The purpose of s6 is to make sure that processes that are supposed to be running are actually running; if a supervised process ends, an s6-supervise process will restart it. All of the s6-supervise processes are spawned and managed by a top-level s6-svscan process.
The way this works is: each s6-svscan process monitors a scan directory, which contains any number of service directories. The s6-svscan process will create an s6-supervise process for each of the service directories in the scan directory.
Service directories have various optional and mandatory files and subdirectories in them; the most significant is an executable named run, which is typically an execline script that runs the process that is supposed to be managed. The primary action of the s6-supervise process is to spawn a subprocess that executes the run script in its service directory. After that, it just hangs out and waits. If the process ends, s6-supervise will optionally take clean-up actions in a finish script if one exists, and then spawns another subprocess to re-execute the run script.
If you want to send a signal to the supervised process — for example, many daemon programs will reload their configuration files if they receive a HUP signal — you can use the s6-svc program, which can tell an s6-supervise process to send signals to the process it’s supervising. Similarly, if you want to control the state of the s6-svscan process, you can use the s6-svscanctl program.
Aside from those few primary programs — s6-svscan, s6-supervise, s6-svscanctl, and s6-svc — s6 consists of a bunch of programs that extend the execline language and use the same "chain loading" paradigm. The execline extensions in the s6 package are useful for managing processes; for example, s6 contains a program s6-setuidgid that simply sets the effective user ID and group ID of the process (and then execs into the command line formed by the rest of its arguments).
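Putting those pieces together, a run script might look something like this minimal sketch (the daemon name and user are made up; this is not a script from the CBL blueprints):
#!/bin/execline
s6-setuidgid mydaemon
/usr/bin/mydaemon --foreground
With that in place (and assuming the scan directory is /run/service), running s6-svc -h /run/service/mydaemon would ask s6-supervise to send the daemon a HUP signal so it reloads its configuration.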
s6 (skarnet-org phase)
./configure --prefix=/usr --exec-prefix=/ --enable-shared \
--disable-allstatic
make
(none)
make install
mkdir /usr/share/doc/s6
cp -r doc/* /usr/share/doc/s6
17.3.1.5. s6-linux-init
Name | s6 linux init tools
---|---
Version | 1.0.6.3
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Patches |
Dependencies |
Overview
s6-linux-init is the last component in the s6 init system: it provides a program (s6-linux-init) that is intended to be run as the initial PID 1. It does a few basic system initialization tasks and then sets up s6 as a process supervisor and s6-rc as a service manager.
This package also provides a handful of other programs that provide a command-line user interface similar to the legacy sysvinit package: for example, it includes an s6-linux-init-telinit program (for which a wrapper telinit is provided), which allows the selection of any of a number of system states called "runlevels," and other programs that facilitate shutting down or rebooting the system using the same commands that have worked for years with sysvinit.
Since the CBL process builds a system from the ground up using no legacy policy decisions related to the init system, the sysvinit compatibility layer is basically irrelevant here, but as with other skarnet components, having it around adds only a tiny bit of clutter to the system. If even that seems objectionable, it’s not difficult to remove the unnecessary components! That’s left as an exercise for the interested system administrator.
We’re not actually going to set up the init system here; that will be done later, at Configure the system initialization framework.
s6-linux-init (skarnet-org phase)
./configure --enable-shared --disable-allstatic
make
(none)
make install
mkdir /usr/share/doc/s6-linux-init
cp -r doc/* /usr/share/doc/s6-linux-init
17.3.1.6. s6-linux-utils
Name | s6 linux utilities
---|---
Version | 2.5.1.5
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Patches |
Dependencies |
Overview
This is a collection of small utilities that work on Linux systems. Like the s6-portable-utils programs, many of them are smaller or simpler versions of utilities available in other packages, and you can certainly use those instead. Some of these programs can be very helpful in combination with other s6-related programs, though, like s6-logwatch, which is more effective than tail -f for log directories managed by s6-log.
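For example, you could follow a log directory with a command like this (the path here is hypothetical; use whatever directory your s6-log process actually writes to):
s6-logwatch /var/log/s6/mydaemon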
The s6-linux-utils are required for CBL because s6-linux-init depends on them.
All of the programs in this package are prefixed with s6-, so they don’t collide with programs installed by different packages.
s6-linux-utils (skarnet-org phase)
./configure --prefix=/usr --exec-prefix=/ --enable-shared \
--disable-allstatic
make
(none)
make install
mkdir /usr/share/doc/s6-linux-utils
cp -r doc/* /usr/share/doc/s6-linux-utils
17.3.1.7. libressl (final-system-components phase)
For an overview of libressl, see libressl.
CBL includes multiple TLS libraries. LibreSSL, as one of the primary ones that is commonly used, can be installed directly under /usr, so its headers and shared libraries will be found easily by other packages. If a package really needs to be linked against OpenSSL instead, that can be done by specifying appropriate directives when configuring them.
./configure --prefix=/usr --enable-nc \
--with-openssldir=/etc/ssl --enable-extratests
make
Some of the tests fail on the partial system.
make -k check || echo "Exit code $?: continuing anyway"
make install
17.3.1.8. s6-networking
Name | s6 networking utilities
---|---
Version | 2.4.1.1
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Patches |
Dependencies |
Overview
s6-networking is a collection of small networking utilities for Unix systems. Among other things, this package includes command-line client and server management, clock synchronization, and similar programs.
s6-networking (skarnet-org phase)
./configure --prefix=/usr --exec-prefix=/ --enable-shared \
--disable-allstatic --enable-ssl=libressl
make
(none)
make install
mkdir /usr/share/doc/s6-networking
cp -r doc/* /usr/share/doc/s6-networking
17.3.1.9. s6-portable-utils
Name | s6 portable utilities
---|---
Version | 2.2.3.2
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Patches |
Dependencies |
Overview
This is a collection of small utilities that work on most Unix systems; they are not GNU- or Linux-specific. Many of them are smaller or simpler versions of utilities available in other packages, like cat or chmod or chown; for these, you will probably wind up using the version from GNU coreutils or whatever other package provides them. But some of the portable utilities, like s6-update-symlinks, provide functionality that is not commonly available elsewhere.
The s6-portable-utils are required for CBL because s6-linux-init depends on them.
All of the programs in this package are prefixed with s6-, so they don’t collide with programs installed by different packages.
s6-portable-utils (skarnet-org phase)
./configure --prefix=/usr --exec-prefix=/ --enable-shared \
--disable-allstatic
make
(none)
make install
mkdir /usr/share/doc/s6-portable-utils
cp -r doc/* /usr/share/doc/s6-portable-utils
17.3.1.10. s6-rc
Name | s6-rc service manager
---|---
Version | 0.5.2.2
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Patches |
Dependencies |
Overview
s6-rc is a service manager, intended to serve the same fundamental purpose as other init systems: it is a suite of programs that can start and stop long-running daemon processes and can run one-time initialization scripts, in the proper order according to a dependency hierarchy. s6-rc ensures that all long-running daemon processes are correctly supervised by s6, and it runs one-time initialization scripts in a controlled (and therefore predictable) environment.
You can read more about how services can be defined in the Construct the s6-rc service database section (or in the s6-rc documentation), and you can read about how it fits into the CBL process in the Configure the system initialization framework section.
s6-rc (skarnet-org phase)
./configure --prefix=/usr --exec-prefix=/ --enable-shared \
--disable-allstatic
make
(none)
make install
mkdir /usr/share/doc/s6-rc
cp -r doc/* /usr/share/doc/s6-rc
17.3.2. attr (final-system-components phase)
For an overview of attr, see attr.
./configure --prefix=/usr --sysconfdir=/etc
make
make -k check || echo "exit code $?, proceeding anyway"
make install
17.3.3. acl
Name | POSIX Access Control Lists utilities
---|---
Version | 2.3.1
Project URL |
SCM URL | (unknown)
Download URL |
Dependencies |
Linux — and POSIX systems in general — have always supported a basic access-control mechanism that governs who may access which files and directories, and what type of access is permitted; that’s the mode, which can be modified with chmod, and the owner and group for the file, which can be modified with chown and chgrp.
Sometimes you might want to have more fine-grained access controls, when the simple owner/group/world permission model isn’t enough to do what you want. This is supported using "access control lists," which are implemented through extended attributes in the filesystem. The ACL package allows access control lists to be viewed and administered.
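To make that concrete (the user and file names here are invented for illustration), the setfacl and getfacl programs from this package can grant one extra user read access to a file without touching its group or world permissions:
setfacl -m u:alice:r /srv/reports/summary.txt
getfacl /srv/reports/summary.txt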
The acl test suite can only be run after the GNU coreutils package has been built and linked with the libraries provided by this package. If you want to run the tests, you can do that with make tests.
./configure --prefix=/usr
make
(none)
make install
17.3.4. libffi (final-system-components phase)
For an overview of libffi, see libffi.
In the minimal CBL system, the tests all fail.
./configure --prefix=/usr
make
(none)
Even though the configure script for libffi says that the default location for header files is PREFIX/include, the installation targets and pkg-config file always install header files to ${libdir}/libffi-${version}/include. Some packages, like LLVM, don’t use pkg-config to find header files, so we can create symbolic links in the canonical system location.
make install
find /usr/lib/libffi*/include -type f | while read filename; \
do \
ln -f -s $filename /usr/include; \
done
17.3.5. libyaml
Name | LibYAML
---|---
Version | 0.2.5
Project URL |
SCM URL |
Download URL |
YAML — the name stands for "YAML Ain’t Markup Language" — is a human-readable data serialization format. You can read all about it on https://yaml.org/. (Perhaps worthy of note, directives in litbuild blueprints are written in a very YAML-ish syntax.)
LibYAML is a library, originally written as part of the PyYAML project, for parsing YAML into data structures and emitting YAML from data structures. It’s a convenient serialization format for cases where someone might need to look at the serialized data and understand what it is without any arcane hackery or tools.
The Ruby standard library includes a YAML implementation that relies on LibYAML.
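Once ruby is installed (a few sections from now), you can watch LibYAML do its work without writing any C; this one-liner parses a small YAML document into a Ruby hash and prints it:

ruby -ryaml -e 'p YAML.load("name: CBL\nversion: 1")'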
./configure --prefix=/usr
make
make check
make install
17.3.6. ncurses (final-system-components phase)
For an overview of ncurses, see ncurses.
The configuration here is much the same as for the scaffolding version of the library. We don’t need to specify --with-build-cc, since the system compiler is the one that the build process will find; we enable wide character support; and we tell the build process to generate and install .pc files that are used by pkg-config.
./configure --prefix=/usr --with-shared \
--enable-overwrite --without-debug --without-ada --enable-widec \
--with-pkg-config --enable-pc-files --enable-mixed-case \
--with-cxx-shared
make
There are automated tests for ncurses, but they only work against a fully installed ncurses package. If you wish to run the tests, look at the file test/README in the source distribution.
(none)
make install
When wide character (UTF-8) support is enabled in ncurses, its libraries are installed with filenames that end in "w" — like libncursesw.so, rather than libncurses.so. The expectation is that, if you have programs that don’t work right with wide character support, you’ll install a separate set of ncurses libraries with wide characters disabled.
In most cases, this isn’t necessary, because programs that aren’t written specifically to use wide-character support generally work fine when linked against the wide-character libraries instead. This is the case for the base CBL programs, and we assume it’s the case with other programs that will be installed into the CBL system. (If you run across a problem, you can simply build an additional version of ncurses with wide-character support disabled.)
Accordingly, we set up linker scripts and symbolic links so that the wide-character libraries and config program will be found by programs that are looking for the old-style ones.
cd /usr/lib
for LIB in menu form ncurses ncurses++ panel; \
do \
echo "INPUT(-l${LIB}w)" > lib${LIB}.so; \
ln -sfv lib${LIB}w.a lib${LIB}.a; \
done
ln -svf ncursesw6-config /usr/bin/ncurses6-config
17.3.7. readline
Name | GNU Readline Library
---|---
Version | 8.1
Project URL |
SCM URL |
Download URL |
Dependencies |
GNU Readline is a library that applications can use if they wish to edit command lines using the same keystroke commands that work in the common text editors Emacs or vi. This is handy for command shells like bash, and for the read-evaluate-print-loop interpreters that are generally provided by scripting languages like ruby.
Like bash, patches for readline are found in a separate directory and need to be applied separately. Also like bash, the patches have already been applied to the source archive on repo.freesa.org.
./configure --prefix=/usr
make SHLIB_LIBS=-lncurses
(none)
make SHLIB_LIBS=-lncurses install
17.3.8. ruby (final-system-components phase)
For an overview of ruby, see ruby.
./configure --prefix=/usr --sysconfdir=/etc --enable-shared \
--enable-debug-env
make
Ruby has a lot of tests that rely on the network being available. These all time out after a couple of minutes (each) but there are enough of them that it takes an absurd amount of time for the test target to run… and with enough failures that it would make sense to re-run the tests once the system is complete.
(none)
17.3.8.1. About Ruby Packages
Ruby programs are almost universally packaged in a format called "ruby gems," which can be created and managed using the gem program provided as a part of standard Ruby installations. Gem files are not especially complicated — they are tar files containing a compressed data tarfile with the gem contents, a compressed metadata.yaml file with a bunch of metadata about the gem, and a compressed checksums file with SHA256 and SHA512 checksums of the data and metadata.yaml files.
The gem program allows you to download and install a gem, along with any other gems it depends on, from an external repository — by default, it uses the repository https://rubygems.org/, but you can change that if you wish.
There are several different ways you can set up ruby gems on a LB Linux system.
The approach taken by the basic CBL process is to treat them just like other software packages: you start with a tar file containing the complete project source code, build it — which generally just means packaging it as a gem, with the gem build command, specifying a gemspec file as a command-line argument — and install it, again typically using the gem install command.
This is a fair amount of work, since you have to have a blueprint for every gem package, find and download the source tarfiles for them, and so on. An alternative approach that is substantially less work is to use the gem program to download and install gems and their dependencies. If you do this, you won’t have each gem installed as a separate package user, of course. You could have a single ruby-gems package user to own all gem files; or you could create a package user for each gem that you actually care about, and have that package user also coincidentally own all the other gem dependencies that are needed by it. Or you could do something else entirely! It’s your system.
make install
17.3.9. asciidoctor
Name | Asciidoctor
---|---
Version | 2.0.16
Project URL |
SCM URL |
Download URL |
Dependencies |
AsciiDoc is a documentation format based on normal ASCII text files with a simple set of formatting conventions; files written in the AsciiDoc format can be transformed into a variety of document formats.
It is also the format used for the narrative sections of litbuild blueprints; when litbuild produces a human-readable document to tell the story of how to construct a package, it produces an AsciiDoc output file that can then be further transformed into whatever final output form is desired.
The documentation for some packages that are part of CBL — notably, the git version control system — is also written in AsciiDoc.
There are at least two programs that can be used to process AsciiDoc files: the original one is an eponymous python-language package, available at https://asciidoc.org/; there is also a more recent ruby-language implementation called Asciidoctor. For CBL we prefer Asciidoctor, primarily because it’s still under active development: the most recent release of AsciiDoc, as of this writing, was made in November of 2013, while the current version of Asciidoctor was released in April 2019.
Since Asciidoctor is distributed as a ruby gem, it doesn’t really have to be built from a source package — you can simply gem install asciidoctor and Bob’s your uncle. Here, though, we’re still going to start with a source tarfile, because that’s how we roll. (Eventually, we’ll also run the automated test suite as part of the build process, because that’s also how we roll; but that requires a large number of additional gems to be installed to fulfill dependencies, and I don’t want to write blueprints for them all right now.)
(none)
Rubygems — which is provided as part of the ruby package — provides a facility to construct a gem from a source package, using a gemspec file that tells it what should be packaged that way.
gem build asciidoctor.gemspec
As mentioned earlier, there are automated tests for Asciidoctor; unfortunately, they can only be run if a bunch of additional ruby packages are installed, and I don’t have energy or enthusiasm enough to write blueprints for them at the moment.
echo 'Skipping tests (would be rake test:all)'
gem install -l asciidoctor*gem
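Once it’s installed, converting an AsciiDoc document to HTML is a one-liner; the input and output file names here are hypothetical:

asciidoctor -b html5 -o /tmp/out.html README.adoc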
17.3.10. autoconf
Name | GNU Autoconf
---|---
Version | 2.71
Project URL |
SCM URL | (unknown)
Download URL |
Autoconf is part of the GNU build system, which strives to make it easy to write programs that will run on a wide variety of computer systems by auto-detecting the value of things that vary from system to system (like the number of bits in an int variable, or pointer, for example).

If you’d like to know more about the GNU build system, you can watch a video tutorial at https://www.dwheeler.com/autotools/ or read the free autotools ebook you can find at http://freesoftwaremagazine.com/books/.
Some of the automated tests for autoconf require automake to be installed; you might want to re-run the tests after installing automake.
At least one test — the test for autoscan — will fail at this point. As with other packages, review the test results after the build to see if anything seems worrisome.
./configure --prefix=/usr
make
make -k check || echo "Exit code $?: continuing anyway"
make install
17.3.11. automake
Name | GNU Automake
---|---
Version | 1.16.4
Project URL |
SCM URL | (unknown)
Download URL |
Automake is part of the GNU build system, like autoconf.
The Automake test suite is expansive and surprisingly unreliable — a lot of the lex tests fail, for example. As with other packages, review the test results after the build to see if anything seems especially problematic.
./configure --prefix=/usr
make
make -k check || echo "Exit code $?: continuing anyway"
make install
17.3.12. berkeley-db
Name | Oracle Berkeley DB
---|---
Version | 18.1.40
Project URL | http://www.oracle.com/technology/products/berkeley-db/index.html
SCM URL | (unknown)
Download URL | (unknown)
Dependencies |
Build Directory | build_unix
Berkeley DB is a library that provides lightweight database functionality. It is used by a wide variety of programs that need database functionality — reliable and fast transactions, for example — but don’t want to use an external RDBMS like PostgreSQL or MySQL.
Downloading the Berkeley DB source release from Oracle currently requires you to sign up for an Oracle account. It’s a little frustrating. The tarball also has the non-conventional name db- and must be repackaged for use in CBL. You can fetch the tarfile from repo.freesa.org instead, if you wish.
This package supports TLS connections if OpenSSL or LibreSSL are available.
Berkeley DB has gone through several versions. Programs that are designed to use version 1.85 of Berkeley DB can’t normally use modern versions; there’s a configure flag to enable support for that database file version. You can enable it if you think you’ll need any program that needs it.
../dist/configure --prefix=/usr --disable-compat185 --enable-cxx
make
To run the automated test suite for Berkeley DB, you need to have TCL installed, and you need to configure with --enable-tcl and --enable-test. After building the database, run tclsh to run the TCL shell; then run source ../test/test.tcl, run_parallel 5 run_std, and exit from the TCL shell. The tests run for several hours.
The documentation suggests that, after running the tests, it’s a good idea to rebuild the package without the --enable-test switch.
Given the arduous nature of the test suite, CBL skips it.
(none)
Two of the documentation files that the Makefile tries to install don’t seem to be built. If you care about Berkeley DB enough to look into what’s going on here, and figure out the right thing to do about it, please let us know. All we’re doing here is telling the Makefile not to do anything with the files that don’t exist.
sed -i -e 's@bdb-sql@@' Makefile
sed -i -e 's@gsg_db_server@@' Makefile
make docdir=/usr/share/doc/berkeley-db install
17.3.13. gperf
Name | GNU perfect hash function generator
---|---
Version | 3.1
Project URL |
SCM URL | (unknown)
Download URL |
GNU gperf is a hash function generator. For any set of keywords, it can produce C or C++ source code for a hashing function that will recognize any of the input keywords with a single operation against a lookup table.
This isn’t directly important for most people! As with many components of CBL, gperf is a part of the system only because other important components rely on it.
./configure --prefix=/usr
make
make check
make install
17.3.14. bzip2 (final-system-components phase)
For an overview of bzip2, see bzip2.
(none)
bzip2 installs a library that can be used by other programs to do bzip2 compression and decompression.
As before, we need to force libbz2 to be compiled with -fPIC.
sed -i 's@^CFLAGS=@CFLAGS=-fPIC @' Makefile
make
make check
The Makefile for bzip2 expects the man directory to be directly under /usr instead of under /usr/share.
sed -i 's@$(PREFIX)/man@$(PREFIX)/share/man@' Makefile
make PREFIX=/usr install
17.3.15. libcap
Name | Linux Capabilities utilities
---|---
Version | 2.53
Project URL |
SCM URL | (unknown)
Download URL |
Historically, the privilege model for Unix systems was pretty simple: everyone had a separate account with only basic permissions, and if you needed to do something that was outside those basic permissions (like modify system programs or processes, or mount filesystems, or whatever), you would use su or sudo to elevate your privileges to the super-user account, conventionally called "root". The super-user account has no restrictions whatsoever.
This isn’t always the best model to use! It leads to situations where sometimes processes are run as root, with full permissions to do anything at all to the system, when the only non-standard permission they need is to bind to a TCP port lower than 1024, or something of that sort. (It’s common for web servers to do whatever they need root privileges for first, and then drop those privileges before they do anything else, precisely because it’s a bad idea for processes to run with superuser privilege.)
That’s why the "capabilities" feature was added to Linux. If a program needs to be able to bind to TCP port 300, it can be given the CAP_NET_BIND_SERVICE capability; it will be able to bind to low ports, but it won’t be able to destroy the root filesystem.
libcap is a package for setting and viewing these capabilities.
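As a hedged illustration (the server binary path is hypothetical, and these commands must be run as root), granting and inspecting the low-port capability described above looks like:

setcap cap_net_bind_service=+ep /usr/local/bin/webserver
getcap /usr/local/bin/webserver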
The libcap package doesn’t use the GNU build system, so there isn’t a configure step.
(none)
make
make test
The install target for libcap runs a command that requires root privilege, unless that behavior is overridden by setting the RAISE_SETFCAP variable. We’ll try to run that command as root, post-installation.
make RAISE_SETFCAP=no install
Commands to run as root:
/sbin/setcap cap_setfcap=i /sbin/setcap || echo nevermind
17.3.16. coreutils (final-system-components phase)
For an overview of coreutils, see coreutils.
Many of the programs in coreutils can support extended attributes and file capabilities, if the system has support for these. For example, cp will be able to copy extended attribute metadata as well as file data if the support library is available when the coreutils are built. This is worthwhile, so we specify dependencies that aren’t strictly necessary.
By default, the hostname program isn’t installed, and kill and uptime are. We want to override this behavior; CBL uses the kill and uptime programs from the util-linux and procps packages, respectively, and we want hostname.
./configure --prefix=/usr \
--enable-no-install-program=kill,uptime \
--enable-install-program=hostname
make
The coreutils package has tests that can only be run as root, as well as tests that can be run by normal users. If you want to run the as-root tests, you can do that with make NON_ROOT_USERNAME=coreutils check-root.
Some coreutils tests don’t reliably pass, so we do the usual thing and throw in an || echo to force the pipeline to succeed and allow the build process to continue; as always, it’s a good idea to review the test results manually, though.
make RUN_EXPENSIVE_TESTS=yes -k check || \
echo "Exit code $?: continuing anyway"
make install
Some programs have test suites (or other build machinery) that assume that programs are in specific directories. Move them to the expected location.
mv /usr/bin/cat /bin
mv /usr/bin/pwd /bin
mv /usr/bin/stty /bin
17.3.17. iproute2
Name | IP and network traffic control utilities
---|---
Version | 5.13.0
Project URL |
SCM URL | git://git.kernel.org/pub/scm/network/iproute2/iproute2.git
Download URL | https://mirrors.edge.kernel.org/pub/linux/utils/net/iproute2/
iproute2 is a collection of utilities that allow TCP/IP networks to be configured and controlled. Its two main components are the program ip, which controls IPv4 and IPv6 configuration, and the program tc, which lets you configure traffic control.
The ip program replaces a lot of other programs — when looking at documents and howtos that talk about how to set up networking on GNU/Linux systems, you may find references to programs like ifconfig and route. The functionality provided by those programs has been subsumed into iproute2.
Similarly:

- the program bridge from this package provides a superset of the functionality implemented by the brctl program from the bridge-utils package; and
- the program ss from this package can be used in place of the program netstat from the net-tools package; it provides more TCP and state information than netstat does (rough equivalents are sketched after this list).
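For anyone used to the older tools, here are rough equivalents; the interface name and gateway address are hypothetical, and the commands that change state must run as root:

ip addr show
ip link set eth0 up
ip route add default via 192.168.0.1
ss -tln

The first replaces ifconfig -a, the next two replace ifconfig eth0 up and route add default gw, and ss -tln replaces netstat -tln.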
iproute2 does not use the GNU build system, so there is no configure script.
(none)
make
This package has no automated tests.
(none)
make install
17.3.18. perl (final-system-components phase)
For an overview of perl, see perl.
Environment |
---|---
The CBL system has zlib and bzip2 already, so the copies bundled with perl are unnecessary. Of course, we have to make sure they are already present on the final system.
As with the target-scaffolding perl, this Perl build wants to find a pwd program in the /bin directory.[10] Rather than creating a symbolic link to the scaffolding pwd again, we can just force the final system coreutils to be built first.
- Dependencies: coreutils
For some reason, perl uses the loopback network device, so it needs to be enabled. It also expects there to be an /etc/hosts file, so we might as well provide one.
- Dependencies: iproute2
Commands to run as root:
ip link set lo up
echo "127.0.0.1 localhost" > /etc/hosts
If something has gone wrong previously and the CBL build has been restarted, it’s possible that there’s still a symbolic link to the scaffolding perl from the location where the perl binary will be installed. If so, we need to get rid of it before building and installing the final system perl.
Commands to run as root:
if test -L /usr/bin/perl; then rm -f /usr/bin/perl; fi
The configure.gnu script is used again here. The "man" directives are a way of telling Perl to build and install man pages even though groff is not yet available, and the "pager" directive tells Perl to use the less pager instead of more even though, at this point, it’s also not available. "useshrplib" tells Perl to build a shared libperl library, rather than statically linking its functions into the perl binaries.
./configure.gnu --prefix=/usr -Dvendorprefix=/usr \
-Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 \
-Dpager="/bin/less -isR" -Dusethreads -Duseshrplib
make
Some of the tests fail, so we once again continue regardless.
make test || echo "Exit code $?: continuing anyway"
A bunch of scripts that were built and installed before the final system perl is set up have the scaffolding perl path baked into them. Now that we have the real perl available, we can modify those scripts so they know where to find it.
Commands to run as root:
pushd /usr/bin
grep -l /scaffolding/bin/perl * | while read FILE; \
do \
sed -i -e 's@/scaffolding/bin/perl@/usr/bin/perl@g' $FILE; \
done
popd
make install
17.3.19. expat (final-system-components phase)
For an overview of expat, see expat.
The tests don’t pass reliably — for Intel-architecture builds, it seems okay, but ARM fails entirely.
./configure --prefix=/usr
make
make -k check || echo "Exit code $?: continuing anyway"
By default, the documentation doesn’t get installed.
make install
mkdir -p /usr/share/doc/expat
install -v -m644 doc/*.{html,png,css} /usr/share/doc/expat
17.3.20. pth (final-system-components phase)
For an overview of pth, see pth.
./configure --prefix=/usr --mandir=/usr/share/man
make -j1
make -j1 test
make -j1 install
17.3.21. python (final-system-components phase)
For an overview of python, see python.
17.3.22. About Python Packages
The common practice for python packages installed from source is to use a program called setup.py, which I think ties into the setuptools package-management system (which is bundled with the python package). There’s no configuration step, and often no test step; the build command is almost universally python setup.py build, and the install command is similarly often python setup.py install. The installation routine checks to make sure that the package’s dependencies are also installed, and downloads them if they are not.
There’s an issue with the package installation routine that makes it tricky to use with the package-users scheme. Suppose you’re installing a package called A, which depends on a package B. Suppose also that B installs a script /usr/bin/b. When you install B, this works as expected: it installs /usr/bin/b and all is well. However, when you install A, the installation process checks to make sure that B is available, and then — for some unfathomable reason! — tries to install exactly the same script /usr/bin/b.
Since the package user for A is not allowed to tamper with files owned by B, the installation process fails with an error at that point. Until I think of a better solution, what I do in this case is simply move /usr/bin/b out of the way before installing A, and then move it back (overwriting the one installed by A) afterward. Not difficult, but definitely irritating.
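In shell terms, sticking with the hypothetical packages A and B from above, the workaround is just:

mv /usr/bin/b /usr/bin/b.saved
python setup.py install
mv -f /usr/bin/b.saved /usr/bin/b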
A more common way to install packages is to use the pip program mentioned earlier — like the gem command packaged with ruby, pip allows packages and their dependencies to be installed from some upstream source where they’re available in some kind of pre-packaged form (in the case of python, these are "wheel" or "egg" files). That’s not the CBL style, but if you want to manage python packages on your system using pip I won’t call you a bad person. If you decide to use pip for package installation, you should probably create a single package user (perhaps python-packages) that will own them all, or just install them all as the python package user.
Another alternative is to set up a python virtual environment (aka "venv") for each specific purpose. A venv appears to be a full installation of python, with its own set of packages distinct from the packages installed in the system python (or other venvs), but actually only has symbolic links to the real system python instead of having a full copy of everything. To set up a venv, you just have to identify a directory where it will live, say $HOME/my-venv, and run python -m venv $HOME/my-venv. Then you can activate the venv by sourcing $HOME/my-venv/bin/activate — this changes the prompt to indicate what venv you’re using, and manipulates a few environment variables so that when you subsequently run pip or python or whatever, the one that is found is the one in the venv. Any packages you install while a venv is active will be installed into that venv directory structure.

To stop using a venv, you just run deactivate — this is a shell function, defined by activate, that puts the environment back the way it was originally.
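Collected into one sequence, with a hypothetical venv path and package name:

python -m venv $HOME/my-venv
. $HOME/my-venv/bin/activate
pip install some-package
deactivate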
When installing packages, if you don’t want to rely on the canonical package repository located at pypi.org (the "Python Package Index"), you can set up a local python package repository and use pip to install packages from it. A helpful tool for doing this is the pip2pi package. I have not done this so I can’t be more explicit!
For our system Python build, we specify a few additional configure options, including an "ensurepip" directive that causes the pip and setuptools package-management programs (a version of which is bundled with the python source distribution) to be installed.
There is a configuration setting --enable-optimizations that runs a suite of tests, and then uses profiling data collected during the test run to improve the performance of the python installation itself. This is desirable, but the tests hang in the partially-built CBL environment. You may want to re-build python, with the optimization setting enabled, once the final system is complete; this is done in Rebuild the packages whose tests could not be run, so if you decide to run that you’ll get an optimized Python as a consequence.
./configure --prefix=/usr --with-system-expat --with-system-ffi \
--with-pth --enable-shared --with-lto --with-ensurepip=install \
--with-openssl=/usr
make
There is a make test Makefile target that runs an extensive suite of automated tests. Like the tests that run as a result of the optimization setting, these tests hang indefinitely in the initial partial CBL environment.
(none)
When installing multiple versions of python into the same prefix (in our case, /usr), it’s recommended to designate one of them the primary version and install it using make install. Other versions can be installed with make altinstall, so that they’re installed as e.g. python2 and python2.7 but not as python. Since CBL prefers for the latest stable version of a package to be primary, python 3 is installed that way.
make install
Since this will be the primary system python, it should be available using the program name python as well as python3. The same applies to some other programs.
ln -sf python3 /usr/bin/python
ln -sf pip3 /usr/bin/pip
ln -sf idle3 /usr/bin/idle
If you will be installing python packages — which is almost certainly the case — you’ll want the site-packages directory to be set up as a package-users installation directory.
Commands to run as root:
tmpdir=$(mktemp -d)
echo /usr/lib/python3*/site-packages >> $tmpdir/sitedir
echo /usr/lib/python3*/site-packages/ '*' | tr -d ' ' >> $tmpdir/sitedir
cat /etc/pkgusr/install_dirs $tmpdir/sitedir > $tmpdir/instdirs
sort < $tmpdir/instdirs | uniq > /etc/pkgusr/install_dirs
rm -rf $tmpdir
Similarly, when python packages are installed, a reference to them is recorded in a file called easy-install.pth. To permit this, we can put it into the install group and make it group-writable.
pushd /usr/lib/python3*/site-packages
touch easy-install.pth
chown python:install easy-install.pth
chmod g+w easy-install.pth
popd
17.3.23. eudev
Name | Userspace Device-File Daemon
---|---
Version | 3.2.9
Project URL |
SCM URL | (unknown)
Download URL |
Dependencies |
UNIX systems treat almost everything as a file, including I/O devices like hard disks, optical drives, mouses, scanners, and so on. The files that represent these devices are called "special" files (or sometimes "device" files or simply "nodes"); instead of containing streams of data (like normal files), or a structured set of names mapped to inodes (like directories), special files are associated with device drivers in the operating system kernel. Conventionally, these special files are found in the top-level /dev directory.
Historically, system administrators were expected to create special files corresponding to all the hardware devices available on a system. These days, that’s a somewhat unwieldy expectation — now that computer systems have USB and firewire and other hot-pluggable devices, the kinds of hardware devices that might be available on a system can change from minute to minute. Also, individual devices are defined by a major number that indicates which device driver is used to access that device and a minor number that indicates specifically which device should be accessed; the minor number is not always predictable and might be different each time the system is booted. So there’s no good way to pre-create special files for all the hardware devices that might ever be attached to a system.
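You can see the device-driver association in a long listing: where a regular file shows its size, a special file shows its major and minor numbers. For example, /dev/null is conventionally character device 1, 3:

ls -l /dev/null

The output should look something like crw-rw-rw- 1 root root 1, 3.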
Originally, the udev program was used to manage these device files: when hardware was detected, the kernel would notify udev (or some other handler program) by sending event notifications called "uevents" over a netlink socket; whatever handler program was listening to that socket would then create appropriate nodes so the hardware could be used by userspace programs.
This is no longer the case, so udev is not nearly as important as it used to be! In modern Linux kernels, the kernel itself maintains a temporary filesystem, the devtmpfs, and automatically creates device files within it as hardware is detected and removes them if hardware is disconnected. But udev is still helpful: it can set ownership and access permissions as specified by system policy and encoded in configuration files, and creates or removes symbolic links to device files so they can be accessed using names or paths that might be more convenient than the ones assigned by the kernel.
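As a hedged example of that kind of policy (the device and symlink names are hypothetical), a one-line rule file is enough to give a partition a friendlier name; as root:

echo 'SUBSYSTEM=="block", KERNEL=="sdb1", SYMLINK+="backup", GROUP="disk", MODE="0660"' \
  > /etc/udev/rules.d/81-backup-disk.rules

With that rule in place, whenever the kernel creates /dev/sdb1, the daemon will add a /dev/backup symlink and set the group and mode shown.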
In 2012, the udev code was assimilated into systemd; since CBL doesn’t use systemd, and the systemd project team doesn’t support the use of udev without systemd, CBL can’t use udev either. Luckily, the Gentoo project also doesn’t use systemd, and some people at the Gentoo project forked udev into the new project "eudev" so that they would have a systemd-independent device management system.
CBL, like Gentoo, uses eudev to manage special files. (There are other alternatives, as well: there are something like three separate programs, all called mdev, that similarly manage plug-and-play hardware.)
A test script for eudev is written in perl and has the path /usr/bin/perl hard-coded, so we might as well force the final system perl to be built before eudev. As of the latest version of eudev, the same is true of python — maybe replacing the perl script? I haven’t checked.
./configure --prefix=/usr --sysconfdir=/etc --enable-split-usr \
--enable-hwdb
make
make check
make install
install -dv /lib/firmware
One of the policy decisions made by systemd — and imported as the default policy of eudev as well — is that network interfaces should not use the default names defined by the kernel, like wlan0 or eth0, and should instead use names that are thought to be more stable and predictable by the systemd developers. I don’t care for this behavior, so I disable it by overriding the built-in "net-name-slot" rule using an empty file.
touch /etc/udev/rules.d/80-net-name-slot.rules
The udevd program should be run as a daemon process — it listens on the netlink socket mentioned earlier for uevent messages from the kernel and responds to them in accordance with system policy.
Service 1: Longrun |
---|---
Dependencies |
Run script:
#!/bin/execline -P
emptyenv
export UDEV_LOG err
fdmove -c 2 1
/usr/sbin/udevd --debug

Service 2: Longrun |
---|---
Dependencies |
Run script:
#!/bin/execline -P
emptyenv
s6-log T s1000000 n10 /var/log/udevd
Commands to run as root:
mkdir -p /var/log/udevd
The eudev package also provides a program, udevadm, that can be used to query or manipulate the device management system. After setting up the daemon, we can use udevadm to ask the kernel to emit uevent messages for all the hardware it has already found; that way, the local system policy will be applied to everything that was pre-populated in /dev by the kernel.
Since this one-shot service handles hardware that was connected while the computer was turned off, I’m calling it the "coldplug" service — to differentiate it from the services that handle hot-plugged hardware devices.
Dependencies |
---|---
Up script:
if { s6-echo "Performing coldplugging" }
if { udevadm trigger --action=change --type=subsystems }
if { udevadm trigger --action=change --type=devices }
if { udevadm settle }
17.3.24. gdbm
Name | GNU dbm
---|---
Version | 1.20
Project URL |
SCM URL | (unknown)
Download URL |
Historical UNIX systems include a simple key/value database system called dbm (for "database manager"). GDBM is the GNU implementation of dbm, and supports the APIs of the original dbm as well as Berkeley’s extended ndbm version, which supports having multiple databases open at the same time.
./configure --prefix=/usr --enable-libgdbm-compat
make
By default, the dbm-compatible APIs are not built. Some packages may want to use them, though, so we override that option with --enable-libgdbm-compat.

With the current version of tcl or expect or something, one of the GDBM tests crashes, so we let the test target fail without aborting the build.

make check || echo "Exit code $?: continuing anyway"
make install
17.3.25. libgpg-error
Name | GnuPG Runtime Library
---|---
Version | 1.42
Project URL |
SCM URL | (unknown)
Download URL |
Originally, libgpg-error was just a library that defined common error values for the various components that make up GnuPG. It has turned into more of a runtime library, including a stream library, mutexes, a base64 decoder, and other miscellaneous logic used throughout the various pieces of GnuPG.
./configure --prefix=/usr
make
make check
make install
17.3.26. libgcrypt
Name | GNU cryptographic library
---|---
Version | 1.9.3
Project URL |
SCM URL | (unknown)
Download URL |
Dependencies |
Libgcrypt is a general purpose cryptographic library. It was originally part of GnuPG, but was factored out into a standalone project since it can be used by any program that needs cryptographic building blocks.
Libgcrypt can optionally make use of Linux capabilities — specifically, to lock memory pages so they cannot be written to swap. This is important for cryptographic libraries, because if a cryptographic key is written to persistent storage it might be recoverable later on.
Since CBL has support for capabilities, we can enable this.
./configure --prefix=/usr --with-capabilities
make
make check
make install
17.3.27. libksba
Name | GNU Libksba library
---|---
Version | 1.6.0
Project URL |
SCM URL | (unknown)
Download URL |
Libksba is a library of functions that simplify working with X.509 certificates, such as are used in TLS certificates.
./configure --prefix=/usr
make
make check
make install
17.3.28. libassuan
Name | GNU Libassuan
---|---
Version | 2.5.5
Project URL |
SCM URL | (unknown)
Download URL |
Dependencies |
Libassuan implements a protocol used for inter-process communication among many of the components of GnuPG. It was designed to prevent (potentially buggy) programs from compromising secret data; by moving the actual cryptographic work into a server program, and having the program that makes use of the result of that cryptographic work — such as an email client — be a separate client program, it’s much easier to demonstrate that there’s no way that the client program can leak secret key information.
./configure --prefix=/usr
make
make check
make install
17.3.29. npth
Name | New GNU portable threads library
---|---
Version | 1.6
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Pth is a portable threads library; it provides cooperative priority-based scheduling for multiple threads within a program.
nPth is a new implementation of the same Pth API, based on the system’s standard threads implementation, that was written as part of the GNU Privacy Guard (gpg) project and is used extensively within modern versions of gpg (version 2.0 and higher).
./configure --prefix=/usr
make
make check
make install
17.3.30. pinentry
Name | GNU PIN entry
---|---
Version | 1.1.1
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
This is a collection of password entry dialog programs, used by GnuPG to obtain passphrase information from users. The simplest of these, a simple TTY version, has no dependencies at all. More commonly, a Curses version (in CBL, using the ncurses library) or GUI version will be used instead.
Only the programs that work with libraries available at configuration time are built. You may want to rebuild and re-install this package if you install one of the GUI libraries for which a pinentry program is provided (GTK, GNOME, or Qt).
./configure --prefix=/usr
make
make check
make install
17.3.31. gnupg
Name | The GNU Privacy Guard
---|---
Version | 2.3.1
Project URL |
SCM URL | (unknown)
Download URL |
Dependencies |
Environment |
The GNU Privacy Guard is an implementation of the OpenPGP standard (defined in RFC4880). It is a general-purpose public key cryptography system, allowing you to encrypt files such that only a specific set of people can decrypt them, sign files such that anyone with your public key can ensure that only you could have signed them, and stuff like that.
In CBL, GnuPG is considered a core part of the system because it allows you to verify that the source code of many packages is exactly what the package maintainers want it to be. It’s very common for "signature" files to be distributed along with package source code files; with GnuPG, you can verify that the package source code has not been modified or corrupted by anyone.
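The verification workflow is short; the key and package file names here are hypothetical stand-ins:

gpg --import maintainer-key.asc
gpg --verify package-1.0.tar.xz.sig package-1.0.tar.xz

If the signature doesn’t match the file, gpg complains loudly and exits with a nonzero status.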
- Environment variable: CFLAGS = $CFLAGS -fcommon
The behavior of GCC has changed in GCC 10 such that "common" sections are no longer used by default to permit duplicate object definitions. This causes build issues in GnuPG, so we override that change to get the old default behavior back.
A number of the tests in t-gettime.c fail on the partial CBL system.
./configure --prefix=/usr
make
make -k check || echo "Exit code $?: continuing anyway"
make install
17.3.32. groff
Name | GNU troff
---|---
Version | 1.22.4
Project URL |
SCM URL | (unknown)
Download URL |
Patches |
Groff is the GNU implementation of troff, a component of a document processing system originally developed as part of Unix. It’s fairly flexible and powerful, and can be used to produce sophisticated documents in a variety of output forms.
In CBL, groff is used primarily to generate man pages. For other sophisticated typesetting, I generally use TeX or LaTeX.
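A quick smoke test, feeding a throwaway man-page source to groff on standard input:

printf '.TH DEMO 1\n.SH NAME\ndemo \\- a tiny example\n' | groff -man -Tutf8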
The latest version of groff has a few issues when built with the latest GCC, because of a conflicting declaration in a header file. That’s easy to fix with a patch.
- groff-1.22.4-signbit-def-1.patch
When configuring groff, PAGE should be set to the default paper size. Typically this will be letter or a4.
PAGE=letter ./configure --prefix=/usr
make
There is no automated test suite for groff.
(none)
make install
17.3.33. gawk (final-system-components phase)
For an overview of gawk, see gawk.
./configure --prefix=/usr --sysconfdir=/etc
make
The test suite for gawk hangs forever, possibly because of some change in glibc 2.27 (prior to which there was a test failure but the suite completed without other incident) or possibly because of some issue in the scaffolding environment. Inspecting the situation, the hanging command is tr [A-Z] [a-z]; I have no idea what the problem is.
(none)
make install
Some of the documentation in the doc directory doesn’t get installed by the normal installation routine, but can be helpful if you are trying to understand or use gawk.
mkdir /usr/share/doc/gawk
cp doc/{awkforai.txt,*.{eps,pdf,jpg}} /usr/share/doc/gawk
17.3.34. iana-etc
Name | IANA /etc files
---|---
Version | 20210826a
Project URL |
SCM URL |
Download URL |
Dependencies |
The Internet Assigned Numbers Authority (IANA) mandates the use of standard numbers for network protocols and services. They’re the ones who say that HTTP should use TCP port 80, and HTTPS should use TCP port 443, for example.
Reference files with the number assignments are provided by IANA. These are distributed as XML files; what we need are subsets of the data found in the XML files in a specific format.
The package available on the FreeSA repository contains a script that fetches the current version of those files; it also contains a version of those files, updated reasonably often. It also has scripts, extracted from the Arch Linux PKGBUILD file for their version of this package, that produce the /etc/protocols and /etc/services files we need from those canonical XML files.[11] This blueprint just runs those scripts.
(none)
./bin/gen-protocols protocol-numbers.xml protocols
./bin/gen-services service-names-port-numbers.xml services
(none)
cp -a -v protocols /etc
cp -a -v services /etc
mkdir -p /usr/share/iana-etc
cp -a -v *.xml /usr/share/iana-etc
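A quick sanity check on the generated files: these lookups should show the familiar assignments mentioned above.

grep -w '^http' /etc/services
grep -w '^tcp' /etc/protocols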
17.3.35. iputils
Name | IP Utilities
---|---
Version | s20151218
Project URL |
SCM URL | git://git.linux-ipv6.org/gitroot/iputils.git
Download URL |
Dependencies |
This package contains a variety of small Internet Protocol utility programs: ping, tftpd, tracepath, things like that.
The (HTML and man-page) documentation for this package can only be built if you have DocBook installed. That’s not currently part of CBL, so the only thing we do here is copy the documentation sources to /usr/share/doc/iputils.
The ping6 utility uses libgcrypt’s MD5 hash function, and apparently can also use capabilities.
The IP Utilities package as a whole does not use the GNU build system, so there is no configure stage for it. (The ninfod component does, but we’re going to skip that because it doesn’t build cleanly; it complains about an inconsistent declaration of cap_setuid. ninfod is a daemon that can respond to IPv6 node information queries; if that matters to you, you should look into building and installing it.)
(none)
make
There’s no automated test suite, either.
(none)
Surprisingly, there’s also no installation routine for most of the iputils programs — you have to put stuff where you want it.
install -v -m755 arping /usr/bin
install -v -m755 clockdiff /usr/bin
install -v -m755 ping6 /bin
install -v -m755 ping /bin
install -v -m755 rarpd /usr/sbin
install -v -m755 rdisc /usr/bin
install -v -m755 tftpd /usr/sbin
install -v -m755 tracepath6 /usr/bin
install -v -m755 tracepath /usr/bin
install -v -m755 traceroute6 /usr/bin
You might want to adjust ping and/or ping6 — if you want them to be usable by non-root users, you need to make them setuid to root.
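If you decide to do that, the commands (run as root) are:

chown root /bin/ping /bin/ping6
chmod u+s /bin/ping /bin/ping6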
17.3.36. pkgconf (final-system-components phase)
For an overview of pkgconf, see pkgconf.
find . -exec touch -r README.md {} \;
./configure --prefix=/usr
make
pkgconf has automated tests implemented in the Kyua test framework. Unfortunately, the Kyua framework introduces a cycle: Kyua depends on pkg-config (as well as Lutok and SQLite). We could build and install pkgconf, the other dependencies, and then Kyua, and then re-build pkgconf so that we can run its tests. On the other hand: that’s a lot of work just to run pkgconf’s automated tests; pkgconf itself is a very simple program that is unlikely to fail dramatically; and if it does fail, the only effect of that failure is that some other program will be slightly harder to build.
(none)
make install
ln -sf pkgconf /usr/bin/pkg-config
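As a quick usage sketch, and assuming the ncurses build earlier installed its .pc files as configured, you can ask pkg-config for a library’s version and build flags:

pkg-config --modversion ncursesw
pkg-config --cflags --libs ncursesw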
17.3.37. xz (final-system-components phase)
For an overview of xz, see xz.
./configure --prefix=/usr
make
make check
make install
17.3.38. libtool
Name | GNU libtool
---|---
Version | 2.4.6
Project URL |
SCM URL | (unknown)
Download URL |
Libtool is a part of the GNU build system, along with autoconf and automake; you can read a bit more about it, and find links to resources to learn more, in the section about Autoconf. It lets you use a common set of commands to build and use static and shared libraries across a wide variety of operating systems.
Some of the libtool tests (all pertaining to the libltdl library, which provides a common API for dynamically opening shared libraries at runtime) fail. As per usual, we presume that everything is really okay and proceed; check the test logs manually if you’re concerned.
./configure --prefix=/usr
make
make check || echo "Exit code $?: continuing anyway"
make install
17.3.39. libxml2
Name | GNOME XML library and toolkit
---|---
Version | 2.9.10
Project URL |
SCM URL | git://git.gnome.org/libxml2
Download URL |
Patches |
Dependencies |
libxml2 is an XML (Extensible Markup Language) parser and toolkit that was originally written for the GNOME project.
- libxml2-2.9.10-parenthesize-type-checks-1.patch

Python 3.9 exposes a bug in libxml2 2.9.10 (and earlier). I found a description of the issue and a patch for it on the Fedora project’s version of libxml2, at https://src.fedoraproject.org/rpms/libxml2/pull-request/9.
As distributed, the libxml2 source does not include a configure script, so that has to be generated first.
autoreconf -i
./configure --prefix=/usr
make
The test suite currently requires a separate package. According to the make check target, instructions for setting this up can be found at http://www.w3.org/XML/Test/ if you are interested in doing this; I’m not interested enough in running these tests to look into it.
(none)
make install
XML always canonically identifies stylesheets and schemas and things like that using URIs, so it’s common for XML documents to include references to stylesheets at web locations like http://docbook.sourceforge.net/release/xsl-ns/current/manpages/docbook.xsl. Programs that handle XML documents, like the programs provided in this package and libxslt, will automatically fetch files from those canonical locations whenever they find such references. Of course, it’s not necessary to fetch files from the Internet if they are already present locally; with this in mind, these programs will also make use of an XML "catalog" file — conventionally located at /etc/xml/catalog — if there is one. XML catalog files can contain entries that map Internet locations to local filesystem paths, among other things, so unnecessary Internet lookups can be eliminated.
One of the programs provided by libxml2 is xmlcatalog, which can be used to administer XML catalog files. We’ll use it to create a catalog file at the conventional location.
mkdir -p /etc/xml
if [ ! -f /etc/xml/catalog ]; \
then \
xmlcatalog --noout --create /etc/xml/catalog; \
fi
Since other packages will need to update the catalog file to add references to artifacts they’ve installed, we make the xmlcatalog program setuid, and put it in the install group — that way, any package user will be able to run xmlcatalog to update the main XML catalog file.
Commands to run as root:
chgrp install /usr/bin/xmlcatalog
chmod 4750 /usr/bin/xmlcatalog
The catalog file is a configuration file, of a sort, so it’s nice to track changes to it over time.
- /etc/xml/catalog
17.3.40. libxslt
Name | GNOME XSLT library
---|---
Version | 1.1.34
Project URL |
SCM URL | git://git.gnome.org/libxslt
Download URL |
Dependencies |
libxslt is an XSLT (Extensible Stylesheet Language Transformations) library that was originally written for the GNOME project. As a library, it provides functionality that lets programs transform XML documents into other things using XSL stylesheets; it also includes a program xsltproc that lets you use that functionality from the command line.
As with libxml2, the source distribution for libxslt does not have a configure script, so it must be generated.
autoreconf -i
./configure --prefix=/usr
make
make check
make install
17.3.41. docbook-xsl-nons
Name | DocBook XSL Stylesheets no-namespace
---|---
Version | 1.79.2
Project URL |
SCM URL |
Download URL |
Dependencies |
The primary blueprint for this package is docbook-xsl. This is an alternate version of the DocBook XSL stylesheets for use with DocBook versions prior to 5, when the DocBook namespace prefix was added.
As with the namespaced version, since we’re not building the stylesheets, there are no configuration, compilation, or testing steps.
(none)
(none)
(none)
The installation for this package is much like the namespaced version; we’re simply copying to a different location, and adding a different set of catalog entries.
mkdir -p /usr/share/xml/docbook-xsl-nons-${version}
cp -v -R * /usr/share/xml/docbook-xsl-nons-${version}
xmlcatalog --noout --add "rewriteSystem" \
"http://docbook.sourceforge.net/release/xsl/${version}" \
"/usr/share/xml/docbook-xsl-nons-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteURI" \
"http://docbook.sourceforge.net/release/xsl/${version}" \
"/usr/share/xml/docbook-xsl-nons-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteSystem" \
"http://docbook.sourceforge.net/release/xsl/current" \
"/usr/share/xml/docbook-xsl-nons-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteURI" \
"http://docbook.sourceforge.net/release/xsl/current" \
"/usr/share/xml/docbook-xsl-nons-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteSystem" \
"http://cdn.docbook.org/release/xsl-nons/${version}" \
"/usr/share/xml/docbook-xsl-nons-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteURI" \
"http://cdn.docbook.org/release/xsl-nons/${version}" \
"/usr/share/xml/docbook-xsl-nons-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteSystem" \
"http://cdn.docbook.org/release/xsl-nons/current" \
"/usr/share/xml/docbook-xsl-nons-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteURI" \
"http://cdn.docbook.org/release/xsl-nons/current" \
"/usr/share/xml/docbook-xsl-nons-${version}" /etc/xml/catalog
17.3.42. docbook4-xml-dtd
Name | DocBook 4 XML DTD
---|---
Version | 4.5
Project URL |
SCM URL | (unknown)
Download URL |
Dependencies |
DocBook is an XML markup language for technical documentation. It lets you create document content in a presentation-neutral form; you can then use programs to transform that content into a variety of formats. It’s formally defined by a "schema".
There are a bunch of ways that an XML schema can be presented; one of those, "RELAX NG" (a quasi-acronym for "REgular LAnguage for XML, Next Generation"), is the primary version for modern versions of DocBook. Others are also provided by the OASIS Technical Committee that maintains DocBook, including "W3C XML Schema" and "XML DTD" (which stands for "Document Type Definition").
The current version of DocBook is 5.1, and you can read all about it at http://docbook.org/ — including, as of May 2017, an ebook that can be downloaded from http://tdg.docbook.org/. However, CBL requires a previous version of DocBook for building the documentation for the kmod package. This blueprint is specifically for version 4.5 of the XML DTD variant of DocBook; kmod’s documentation is written using the even-more-obsolete version 4.2, but the point-releases are compatible.
The distribution package for this package is a zip file, docbook-xml-4.5.zip — if you don’t use the package from the CBL repository, you’ll need to convert it to the standard packaging structure used by litbuild.
As with other non-program XML artifacts, the DocBook XML DTD consists of files that should be copied to a location on the filesystem, with that location referenced by the main XML catalog file. There’s no configuration, build, or test stage for this package.
(none)
(none)
(none)
mkdir -p /usr/share/xml/docbook-xml-dtd-4.5
cp -v -af docbook.cat *.dtd *.mod ent \
/usr/share/xml/docbook-xml-dtd-4.5
The way we set up the XML catalog for the XML DTD is: we create a separate docbook catalog file that references the files from this package, and then we add entries to the main catalog file to delegate to the docbook catalog file.
It’s possible that other packages will have documentation that specifies it needs some other 4.x version of the DocBook XML DTD (4.1, 4.2, 4.3, or 4.4). This version of the DTD will work for any of those packages; if necessary, you can add additional entries to the docbook and main XML catalog files, specifying that the files for those versions are in the directory where the 4.5 DTD files reside, or you can patch the client package so that the documentation requests the 4.5 DTD.
if [ ! -e /etc/xml/docbook ]; \
then \
xmlcatalog --noout --create /etc/xml/docbook; \
fi
xmlcatalog --noout --add "public" \
"-//OASIS//DTD DocBook XML V4.5//EN" \
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" /etc/xml/docbook
xmlcatalog --noout --add "public" \
"-//OASIS//DTD DocBook XML CALS Table Model V4.5//EN" \
"file:///usr/share/xml/docbook-xml-dtd-4.5/calstblx.dtd" /etc/xml/docbook
xmlcatalog --noout --add "public" \
"-//OASIS//DTD XML Exchange Table Model 19990315//EN" \
"file:///usr/share/xml/docbook-xml-dtd-4.5/soextblx.dtd" /etc/xml/docbook
xmlcatalog --noout --add "public" \
"-//OASIS//ELEMENTS DocBook XML Information Pool V4.5//EN" \
"file:///usr/share/xml/docbook-xml-dtd-4.5/dbpoolx.mod" /etc/xml/docbook
xmlcatalog --noout --add "public" \
"-//OASIS//ELEMENTS DocBook XML Document Hierarchy V4.5//EN" \
"file:///usr/share/xml/docbook-xml-dtd-4.5/dbhierx.mod" /etc/xml/docbook
xmlcatalog --noout --add "public" \
"-//OASIS//ELEMENTS DocBook XML HTML Tables V4.5//EN" \
"file:///usr/share/xml/docbook-xml-dtd-4.5/htmltblx.mod" /etc/xml/docbook
xmlcatalog --noout --add "public" \
"-//OASIS//ENTITIES DocBook XML Notations V4.5//EN" \
"file:///usr/share/xml/docbook-xml-dtd-4.5/dbnotnx.mod" /etc/xml/docbook
xmlcatalog --noout --add "public" \
"-//OASIS//ENTITIES DocBook XML Character Entities V4.5//EN" \
"file:///usr/share/xml/docbook-xml-dtd-4.5/dbcentx.mod" /etc/xml/docbook
xmlcatalog --noout --add "public" \
"-//OASIS//ENTITIES DocBook XML Additional General Entities V4.5//EN" \
"file:///usr/share/xml/docbook-xml-dtd-4.5/dbgenent.mod" /etc/xml/docbook
xmlcatalog --noout --add "rewriteSystem" \
"http://www.oasis-open.org/docbook/xml/4.5" \
"file:///usr/share/xml/docbook-xml-dtd-4.5" /etc/xml/docbook
xmlcatalog --noout --add "rewriteURI" \
"http://www.oasis-open.org/docbook/xml/4.5" \
"file:///usr/share/xml/docbook-xml-dtd-4.5" /etc/xml/docbook
xmlcatalog --noout --add "delegatePublic" "-//OASIS//ENTITIES DocBook XML" \
"file:///etc/xml/docbook" /etc/xml/catalog
xmlcatalog --noout --add "delegatePublic" "-//OASIS//DTD DocBook XML" \
"file:///etc/xml/docbook" /etc/xml/catalog
xmlcatalog --noout --add "delegateSystem" \
"http://www.oasis-open.org/docbook/" \
"file:///etc/xml/docbook" /etc/xml/catalog
xmlcatalog --noout --add "delegateURI" \
"http://www.oasis-open.org/docbook/" \
"file:///etc/xml/docbook" /etc/xml/catalog
- /etc/xml/docbook
17.3.43. kmod
Name | kmod
---|---
Version | 29
Project URL |
SCM URL | (unknown)
Download URL | (unknown)
Dependencies | pkgconf, xz, zlib, libxslt, docbook-xsl-nons, docbook4-xml-dtd
Linux is a monolithic kernel, but that doesn’t mean that the whole kernel is all in one big file. Almost all parts of the kernel can be built into small modules that can then be loaded into the kernel at runtime. If you know you’re always going to want some kernel feature, you can build it into the basic kernel image, but if you might not ever wind up using a kernel feature, or expect to need it only occasionally, you can build it as a module instead, and only load it if you need it for something.
This is handy if you’ve got some hardware devices that you don’t always use — a USB scanner or camera, for example — and you don’t want to waste memory on the device drivers for that hardware unless you actually plug it in.
It’s also handy for binary GNU/Linux distributions, where the expectation is that users of the system won’t be building their own kernels — those distributions can build almost everything as modules, and then use an "early userspace" facility to figure out exactly what modules need to be loaded to get the system working. That complicates the system significantly: early userspaces in Linux depend on an initial ramdisk or ramfs that is loaded by the boot loader along with the Linux kernel, and it has to contain enough initialization logic to figure out what modules need to be loaded, and actually load them, and then mount the actual root filesystem and pivot_root into it and exec the real init process… really, there’s a lot going on there! In CBL, we don’t need any of that complexity because we always build a kernel from source; we just need to make sure that all the kernel features necessary to get the system running are compiled directly into the kernel image.
Still, pretty much every kernel configuration includes at least some modules, so we need to be able to load those modules once we have the system booted. That’s what the kmod package provides: userspace programs that manipulate kernel modules.
The documentation for kmod is written in DocBook XML, using the (very obsolete) DocBook 4.2 release. It’s not difficult to provide versions of the XML DTD and XSL stylesheets that kmod currently expects, so that’s what we’re doing here.
At configuration time, kmod has to be told to enable support for compressed kernel modules, so we do that.
./autogen.sh
./configure --prefix=/usr --sysconfdir=/etc --with-zlib --with-xz
make
The test suite for kmod assumes that a kernel and modules have already
been built and installed in the conventional location, which is not the
case for CBL — in particular, it expects a build
script to be present
at /lib/modules/$(uname -r)/build
, but that’s a symbolic link to the
original host-system path where the kernel source existed. So at this
point it’s necessary to defer the test suite until later, or skip it
entirely.
(none)
make install
Starting with Linux 4.18, the kernel’s module installation routine
insists on looking for kmod
programs in the /sbin
directory, rather
than /usr/bin
. Let’s ensure it is visible there.
Note that if /usr
is a different filesystem than the root filesystem,
this command will fail — you will need to make it a symbolic link by
adding -s
to the ln
command arguments.
ln -f /usr/bin/kmod /sbin/kmod
Kmod is designed to be a multi-call binary — there’s only one actual program, and its behavior varies depending on what name it’s invoked with: for example, when you run it as insmod, it installs a module; when you run it as rmmod, it removes it. For some reason, though, the installation process for kmod does not install links with the expected names, so we have to do that ourselves.
for program in depmod insmod lsmod modinfo modprobe rmmod; \
do \
ln -f /usr/bin/kmod /usr/bin/$program; \
ln -f /usr/bin/kmod /sbin/$program; \
done
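As a quick sanity check after installation (illustrative only, and assuming a kernel with at least one module, such as loop, has been installed):
lsmod | head -n 5
modinfo -F filename loop || echo 'no loop module for this kernel'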
17.3.44. less
| Name | Less file viewer |
|---|---|
| Version | 590 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
Less is a text file viewer. It’s often used as a pager at the end of a
pipeline — if you run a command that generates thousands of lines of
output, you can pipe that output into less
and it will let you
navigate around the output in a variety of helpful ways.
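For example (purely as an illustration, not part of the build), you can page through a long recursive directory listing, then press q to get back to your shell:
ls -lR /usr/share | less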
./configure --prefix=/usr --sysconfdir=/etc
make
There is no automated test suite for less.
(none)
make install
17.3.45. libpipeline
| Name | Subprocess pipeline library |
|---|---|
| Version | 1.5.3 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
According to its homepage: "libpipeline is a C library for manipulating pipelines of subprocesses in a flexible and convenient way."
The maintainer of the man-db project discovered that using the basic C
library functions like system
and popen
to run a collection of
subprocesses and link them together into a pipeline (that is, the
standard output of each program is fed into the standard input of the
next program in the chain) was prone to issues, so he wrote libpipeline
to encapsulate all of that stuff.
./configure --prefix=/usr
make
make check
make install
17.3.46. litbuild
| Name | Literate Build tool |
|---|---|
| Version | 1.0.8 |
| Project URL | |
| SCM URL | |
| Download URL | |
| Dependencies | |
Litbuild is the program used to transform the blueprints from Cross-Building Linux into executable scripts and documentation that is intended to be read.
Like most other ruby programs, litbuild can be installed as a "gem,"
rather than installed from a source tar file. Here, we are going to
install it as a gem using the gem
command, but we’re starting from a
source distribution and constructing the gem itself from that source
distribution first.
(none)
The rubygems program — which is provided as part of the ruby language
distribution — provides a facility to construct a gem from a source
package, using a gemspec
file that tells it what should be packaged
that way.
gem build litbuild.gemspec
There are lots of automated tests for litbuild; unfortunately, they can only be run if a bunch of additional ruby packages are installed, and I don’t have energy or enthusiasm enough to write blueprints for them at the moment.
echo "Skipping tests (would be `rake spec`)"
gem install -l litbuild*gem
17.3.47. lsof
| Name | lsof |
|---|---|
| Version | 4.94.0 |
| Project URL | |
| SCM URL | |
| Download URL | |
lsof
lists files that are currently opened by Unix processes. (The
name stands for "LiSt Open Files".)
lsof does not use the GNU build system, and by default the configuration process runs interactively and prompts for input; this blueprint has instructions for getting the same end result without any interaction.
The distribution includes an Inventory
script that checks to make sure
everything is present. We might as well run it.
echo y | ./Inventory
The Configure
script is run with a "dialect" name, which selects from
among a set of known operating system or platform types and sets up
header files appropriately. It normally runs a Customize
script (as
well as Inventory
, if that hasn’t been run yet) to allow the default
values in the header files for the dialect to be customized, but we’ll
do that ourselves.
./Configure -n linux
Now we can adjust the default settings for the program. By default, the
HASSECURITY
flag is disabled, so any user can examine all open files,
regardless of who owns the process that has the files open. This is very
insecure. That’s the only option I want to enable.
chmod 664 dialects/linux/machine.h
echo '#define HASSECURITY 1' >> machine.h
echo '#undef HASNOSOCKSECURITY' >> machine.h
echo '#undef WARNINGSTATE' >> machine.h
echo '#undef HASDCACHE' >> machine.h
echo '#undef HASENVDC' >> machine.h
echo '#undef HASPERSDC' >> machine.h
echo '#undef HASPERSDCPATH' >> machine.h
echo '#undef HASSYSDC' >> machine.h
echo '#undef HASKERNIDCK' >> machine.h
make
There are no automated tests for lsof.
(none)
The Makefile for lsof suggests writing your own install
rule, rather
than having an installation routine. When HASSECURITY
is set, there’s
no point in installing lsof setuid to root, so we can just copy the
program and manual page to standard locations.
cp lsof /usr/sbin
cp Lsof.8 /usr/share/man/man8/lsof.8
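A couple of illustrative invocations (not part of the build; with HASSECURITY enabled, inspecting other users' processes requires root):
lsof -p $$
lsof -i :22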
17.3.48. man-db
| Name | Manual page viewer |
|---|---|
| Version | 2.9.4 |
| Project URL | |
| SCM URL | |
| Download URL | |
| Dependencies | |
man-db is a manual page viewer. It’s used (via the man
command) to
read the standard Unix documentation found in manual pages. Unlike the
original implementations of man
, this package uses a Berkeley DB
database to speed up the process of finding man pages.
In GNU systems like CBL, the canonical documentation for many programs
is in info
files rather than manual pages, but the manual pages are
still a critical part of the system documentation.
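As an illustration (not part of the build), once manual pages are installed you can read one, and rebuild the database man-db uses to locate pages:
man 5 fstab
mandb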
The installation routine for man-db tries to install the man
program
setuid to the user "man". This conflicts with the package-user scheme,
so we disable it.
man-db can optionally make use of a large number of external programs
for various purposes — web browsers, program source preprocessors,
graph preprocessors, and others; run ./configure --help
to see all the
--with
options. If you want man-db to have support for those built in,
add the appropriate configure switches. (And, eventually, you’ll have to
install those programs if they’re not part of the CBL build.)
Also by default, man-db sets up a tmpfiles directory entry for use by systemd, which uses the files in that directory to determine what temporary files to clean up, or something. (Because, of course, any good init system will be in charge of managing temporary files.) CBL does not use systemd, so we skip it.
./configure --prefix=/usr --sysconfdir=/etc --disable-setuid \
--without-systemdtmpfilesdir
make
A number of the automated tests fail. As usual, we proceed with the build and suggest an eventual inspection of the log files and system state.
make check || echo "Exit code $?: continuing anyway"
TODO get rid of man-db.service and man-db.timer as well, either pre-install or post-install
make install
17.3.49. man-pages
| Name | Linux Manual Pages collection |
|---|---|
| Version | 5.12 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
Historically, UNIX systems have been documented by online "manual
pages," which can be displayed using the man
program. (You can find
out a lot more about the manual pages by running man man
, once the
man
program is installed.)
Many open-source programs are distributed along with man pages that describe them as part of the source package. However, there is also a distribution of manual pages for Linux — this package — that contains documentation for system calls, library routines, file formats, and things like that. It also contains man pages for programs that are commonly part of GNU/Linux systems but are not distributed with their own man pages as part of the source packages.
(none)
(none)
(none)
Some man pages included in this distribution are also now installed as part of other packages. We can remove the conflicting pages from this package to avoid any problems.
rm -f man5/tzfile.5
rm -f man8/{tzselect,zdump,zic}.8
make install
17.3.50. openssh
| Name | OpenSSH |
|---|---|
| Version | 8.6p1 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | https://cloudflare.cdn.openbsd.org/pub/OpenBSD/OpenSSH/portable/ |
| Dependencies | |
OpenSSH provides the ability to access remote systems in a secure way, with all traffic between your computer and the remote one encrypted. It’s frightfully useful!
We build OpenSSH using LibreSSL as the underlying cryptographic engine.
The OpenSSH daemon runs in a "privilege separation" mode, to make it difficult to exploit any security-critical bugs that may exist in the code. The OpenSSH daemon runs with system privileges, but when a client opens a connection to that daemon, it starts by spawning a process running as a different user with practically no privileges. That unprivileged process then handles all communications over the network to the (potentially malicious) client, making calls back to the privileged parent process using a well-defined interface; if the unprivileged child process is compromised, the damage it can do is strictly limited.
To enable this, we need to add an sshd
user and group configured the
way that OpenSSH expects them to be. This user and group define the
unprivileged security context.
Commands (as root):
mkdir -p /var/empty
chown root:sys /var/empty
chmod 755 /var/empty
grep -q ^sshd: /etc/group || groupadd -r sshd
grep -q ^sshd: /etc/passwd || useradd -r -g sshd \
-c 'sshd privilege separation' -d /var/empty -s /bin/false sshd
./configure --prefix=/usr \
--with-ssl-dir=/etc/ssl --sysconfdir=/etc/ssh
make
Some of the OpenSSH tests apparently assume that the computer is on the network, or otherwise object to the limited CBL userspace.
make tests || echo "Exit code $?: continuing anyway"
The installation process for OpenSSH generates a set of unique host keys
that will identify the server to clients. It also copies a file called
moduli
into the /etc/ssh
directory; that file contains a bunch of
probably-prime numbers used during the key exchange part of the SSH
handshake. You can use that moduli
file without worrying about it — most people do — but if you want to produce one of your own, you can do
so; look at the generate-ssh-moduli
blueprint.
If networking is enabled, the sshd
daemon should be run. Note that the
run
script for the service uses an absolute path for the sshd
program — this is required, starting with OpenSSH 3.9, because sshd
re-executes itself every time a connection is made. If a non-absolute
path is used, sshd
will complain and won’t start.
| Service 1: Longrun | |
|---|---|
| Dependencies | |
| Run script | #!/bin/execline -P emptyenv fdmove -c 2 1 foreground { ssh-keygen -A } /usr/sbin/sshd -D -e |

| Service 2: Longrun | |
|---|---|
| Dependencies | |
| Run script | #!/bin/execline -P emptyenv s6-log T s10000000 n10 /var/log/sshd |
Commands (as root):
mkdir -p /var/log/sshd
chown root:root /var/log/sshd
chmod 0755 /var/log/sshd
In the servicedir specified above, you might notice a call to
ssh-keygen
. That ensures that the host keys needed by the OpenSSH
daemon (to securely and uniquely identify the server to clients) are
present. These are generated during the installation process, so the
only time that you’ll need to generate new host keys is if they’ve been
deleted for some reason, but that can be the case for CBL systems
launched in Cloud Computing providers like Amazon Web Services.
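If the host keys have gone missing, the same command the run script uses can be run by hand, as root; this is just an illustration:
ssh-keygen -A
ls -l /etc/ssh/ssh_host_*_key*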
Configuration files:
- /etc/ssh/ssh_config
- /etc/ssh/sshd_config
- /etc/s6-rc/source/sshd-svc
- /etc/s6-rc/source/sshd-log
make install
17.3.51. procps
| Name | proc filesystem utilities |
|---|---|
| Version | 3.3.17 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
GNU/Linux systems have a pseudo-filesystem called proc
, conventionally
mounted at /proc
. This is called a "pseudo-filesystem" because it’s
not actually used to store files; rather, it’s a view into the state of
the computer system exposed by the Linux kernel as a filesystem. The
proc filesystem contains a subdirectory for each process currently
running on the system, as well as a bunch of files and subdirectories
that give a view into other parts of the running system. (For example,
if you want to know what CPUs are in the system, you can cat
/proc/cpuinfo
.)
Using the proc filesystem directly is not always the most convenient way
to find out what’s going on in the system. The procps package contains a
bunch of programs that present information from /proc
in a more
helpful and convenient way.
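A few illustrative examples of the programs this package provides (not part of the build):
ps -ef | head -n 5     # list processes
free -m                # memory usage, in megabytes
vmstat 1 3             # three one-second samples of system activity
watch -n 2 uptime      # re-run uptime every two seconds; exit with Ctrl-C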
procps provides a version of the kill
program, which is also provided
by util-linux; we’re going to prefer the util-linux version. We also
specify that watch
should be built "8 bit clean", which requires
ncurses with wide character support. The watch source expects the
ncurses.h
header file to be in a different location when this is
enabled, so we have to adjust that expectation.
The procps configuration script also provides an option to select systemd support, so of course we disable it.
./autogen.sh
./configure --prefix=/usr --disable-kill --enable-watch8bit \
--without-systemd
sed -i 's@<ncursesw/ncurses.h>@<ncurses.h>@' watch.c
make
The automated tests fail at this point with an error about insufficient ptys. Just skip them for now.
(none)
make install
17.3.52. psmisc
| Name | Miscellaneous process utilities |
|---|---|
| Version | 23.4 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
The PSmisc package is a collection of small useful utilities that use the proc filesystem to inspect and modify processes:
- fuser shows processes that are using files (kind of like lsof);
- killall lets you send signals to multiple programs based on their name (kind of like pkill from procps);
- prtstat shows process statistics;
- pslog shows the paths to log files that a process has open;
- pstree shows process information like ps, but showing the process hierarchy explicitly;
- and peekfd lets you find out about file descriptors being used by a process.
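For instance (illustrative only; fuser may print nothing if no process currently has the directory open):
pstree -p 1
fuser -v /var/log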
./autogen.sh
./configure --prefix=/usr
make
The package does not have an automated test suite.
(none)
make install
17.3.53. sysfsutils
| Name | System Utilities Based on Sysfs |
|---|---|
| Version | 2.1.0 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | https://sourceforge.net/projects/linux-diag/files/sysfsutils/ |
| Patches | |
This package contains a library, libsysfs
, that provides an interface
for querying the information about system devices exposed through the
sysfs
filesystem (conventionally mounted at /sys
); and a program
called systool
that provides a command-line interface to the
functionality in libsysfs
.
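As an illustration of what systool exposes (the class name here is an example; available classes depend on your kernel configuration):
systool -c net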
libsysfs
is important primarily because it’s a dependency for the
rng-tools
.
- sysfsutils-2.1.0-update-config-guess-1.patch
On some servers, the version of config.guess
distributed with this
package is too old to recognize the target triplet. It’s easy enough to
update it to the version distributed with the modern GCC sources.
./configure --prefix=/usr --mandir=/usr/share/man
make
make check
make install
17.3.54. curl
| Name | cURL |
|---|---|
| Version | 7.78.0 |
| Project URL | |
| SCM URL | |
| Download URL | |
| Dependencies | |
17.3.54.1. Overview
Curl is a program to transfer data to or from a server using any of a
variety of protocols — mostly, ftp, sftp, http, and https. It’s a lot
like GNU wget; one important difference is that curl also provides a
library, libcurl
, which can provide similar functionality to other
programs that need it.
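A couple of illustrative invocations (example.org is a placeholder URL):
curl -O https://www.example.org/some-file.tar
curl -fsS https://www.example.org/ | head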
17.3.54.2. curl (final-system-components phase)
./configure --prefix=/usr --enable-optimize \
--with-ca-bundle=/etc/ssl/cert.pem --with-openssl
make
Don’t be misled by the --with-openssl
configure directive. It works
for BoringSSL and LibreSSL as well as OpenSSL.
The automated tests for curl don’t work reliably when there is no network available; some tests hang interminably.
(none)
make install
17.3.55. jansson
| Name | Jansson |
|---|---|
| Version | 2.13.1 |
| Project URL | |
| SCM URL | |
| Download URL | |
Jansson is a C library for manipulating JSON (JavaScript Object Notation) strings. It’s used by rng-tools.
./configure --prefix=/usr
make
make check
make install
17.3.56. rng-tools
| Name | RNG tools |
|---|---|
| Version | 6.14 |
| Project URL | |
| SCM URL | |
| Download URL | |
| Dependencies | |
| Environment | |
The Linux kernel provides access to random data drawn from an internal entropy pool, populated by random events like the timing of key presses and network packets being received from other systems.
rngd, a daemon program provided in this package, supplements this
collection of random data: it monitors other entropy sources — like
hardware random number generators found at /dev/hwrng
, the Trusted
Platform Module random number generator found at /dev/tpm0
, and the
RDRAND CPU instruction — and feeds random data from those sources into
the entropy pool. This is pretty much always a good idea, if there are
any such sources.
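To see how the kernel's entropy pool is doing, and whether any of those hardware sources exist on your system, you can poke around in /proc and /dev (illustrative only):
cat /proc/sys/kernel/random/entropy_avail
ls /dev/hwrng /dev/tpm0 2>/dev/null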
- Environment variable: openssl_LIBS = $LDFLAGS -lcrypto
The distribution tarfile does not include the configure script, so we
need to create that with the autogen.sh
script. We also configure
without support for PKCS#11, because that adds an additional dependency
on the libp11 package. Similarly, the "rtlsdr" feature introduces an
unnecessary dependency, so we disable that as well.
./autogen.sh
./configure --prefix=/usr --without-pkcs11 --without-rtlsdr
make
A test sometimes fails in the minimal userspace.
make -k check || echo "Exit code $?: continuing anyway"
| Service 1: Longrun | |
|---|---|
| Dependencies | |
| Run script | #!/bin/execline -P emptyenv fdmove 2 1 /usr/sbin/rngd -f -d |

| Service 2: Longrun | |
|---|---|
| Dependencies | |
| Run script | #!/bin/execline -P emptyenv s6-log T s10000000 n10 /var/log/rngd |
Commands (as root):
mkdir -p /var/log/rngd
chown root:root /var/log/rngd
chmod 0755 /var/log/rngd
make install
17.3.57. sudo
| Name | su do |
|---|---|
| Version | 1.9.4p2 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
Sudo allows users to run commands as root or as another user. Unlike the
su
program provided in the shadow package, sudo has a focus on
configurability and auditability — users or groups can be permitted to
run specific commands or any commands, with password required or not,
and any command executed via sudo can be logged.
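As an illustration of the sort of policy sudo supports (this is an example policy, not something CBL sets up): a sudoers rule granting members of the wheel group the right to run any command can be appended with visudo, which validates the syntax before installing the change. Since this build uses --with-env-editor, visudo honors the EDITOR variable, so the following idiom works as root:
echo '%wheel ALL=(ALL) ALL' | EDITOR='tee -a' visudo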
The installation process for sudo always tries to set the UID and GID
of files it installs to specific values; by default these are both
0, so the sudo
program can wind up being setuid to root. This
conflicts with the package-users approach. We can get around that easily
enough by overriding the UID and GID specified in the top-level
Makefile. We look those up using the id
program provided by
the GNU coreutils, which makes that a build-time dependency.
- Dependencies: coreutils
sed -i -e "s@^install_uid =.*@install_uid = $(id -u)@" \
-e "s@^install_gid =.*@install_gid = $(id -g)@" Makefile.in
Sudo can write log messages using the syslog facility or to a log file. We actually want its log messages to be written to an s6-log directory, like other system logs, so we’ll write them to a named pipe created at boot time, and set up a supervised s6-log service to grab everything written to that log file.
./configure --prefix=/usr --enable-zlib --disable-rpath \
--with-logging=file --with-env-editor --with-secure-path \
--with-logpath=/run/fifos/sudo
make
The main configuration file for sudo is /etc/sudoers
; the installation
process crashes if there is no file already present at that path, so we
create an empty one.
The default sudoers
file content is installed as /etc/sudoers.dist
.
In CBL we generally want to start with the default configuration and
then make changes based on policy decisions, so we just copy
sudoers.dist
over the empty file we initially created.
make check
touch /etc/sudoers
make install
mv -f /etc/sudoers.dist /etc/sudoers
The installation process also insists on having a "run directory," by
default /run/sudo
, and tries to create it if it does not exist. This
fails when using the package users scheme because /run
is not an
install directory. We can pre-create the directory as root to work
around the issue.
Commands (as root):
mkdir -p /run/sudo
chown sudo:sudo /run/sudo
Since /run
is a temporary filesystem that does not persist across
reboots, that directory will vanish when the system is shut down. But
sudo
will create that directory every time it is run, if it doesn’t
exist, so we don’t have to worry about making sure it is re-created as
part of system startup.
We do need the sudo
program to be setuid to root, and it’s necessary
(or at least desirable) for some other files and directories to be owned
by UID 0 as well.
Commands (as root):
chown root /usr/bin/sudo
chmod u+s /usr/bin/sudo
chown root /usr/libexec/sudo/sudoers.so
chown root /etc/sudoers
chown root /etc/sudoers.d
chmod 755 /etc/sudoers.d
chown -R root /var/db/sudo
chown root:root /etc/sudo.conf
Configuration files:
- /etc/sudoers
- /etc/sudoers.d
- /etc/sudo.conf
- /etc/s6-rc/source/create-sudo-fifo
- /etc/s6-rc/source/sudo-log
The sudo program generally logs via syslog, but can be told to write its logs to a file instead. We don’t actually want to write to a file, because we want to direct all those messages to an s6-log service; to facilitate this, we can write them to a FIFO in a canonical location, and have the s6-log process read its standard input from that FIFO.
There’s a minor infelicity with this approach: since each invocation of
sudo
is a different process and will close the log file when it
terminates, the s6-log process will keep finding its standard input has
been closed and will shut itself down each time. The service supervisor
will therefore need to spawn a new s6-log every time someone uses a sudo
command. That’s a little unfortunate but not a big deal — essentially,
one extra process will be spawned every time sudo
is used to run a
command.
We always have a tmpfs mounted at /run
, so that’s a convenient place
to put the sudo log FIFO (along with FIFOs for any other command that
wants to write log messages to a file location).
| Dependencies | |
|---|---|
| Up script | if { s6-echo "Creating FIFO for sudo logs" } s6-mkfifo -m 0600 /run/fifos/sudo |
Now we need an s6-log process that reads from the FIFO and writes to a log directory.
| Dependencies | |
|---|---|
| Run script | #!/bin/execline -P redirfd -r -nb 0 /run/fifos/sudo emptyenv s6-log T s1000000 n10 /var/log/sudo |
Commands (as root):
mkdir -p /var/log/sudo
Starting in version 1.8.29, sudo expects there to be a file
/etc/environment
that can contain environment variable settings. This
file is canonically set up by the Pluggable Authentication Module (PAM)
package, and used by the pam_env
module within that package, but since
PAM is not built as part of CBL the file does not automatically exist.
It’s irritating to see a warning message every time you run sudo
, so
we’ll ensure there is a file at that location.
Commands (as root):
touch /etc/environment
17.3.58. xml-parser
| Name | Perl XML-Parser module |
|---|---|
| Version | 2.46 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | (unknown) |
| Dependencies | |
The XML::Parser package is a module that provides a perl interface to
the expat
XML parser.
This uses the standard perl module build process, so we have to override the default configure and test commands.
perl Makefile.PL
make
make test
make install
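Once installed, a quick smoke test looks like this (illustrative; the parse call dies if the document is malformed, so a clean exit means the module and the underlying expat library are working):
perl -MXML::Parser -e 'XML::Parser->new->parse("<greeting>hello</greeting>"); print "parse ok\n"'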
Once the CBL system is complete, litbuild and the CBL blueprints can still be used to install new versions of all the packages that are already part of the system. They can also be used to install additional packages — many of the blueprints in the CBL repository are for packages that aren’t part of the base system at all, and will only be used if you explicitly ask litbuild to produce scripts or build documentation for them (or for something that depends on them).
17.3.59. bash (final-system-components phase)
For an overview of bash, see bash.
Right after we install this package, we will delete the symbolic links
we created previously and replace them with links to the final system
bash
. In order to be able to do that, we need to change the links to
be owned by the new "bash" package user. The -h
option for chown
causes the ownership of the symbolic link per se to be changed, rather
than the file it references.
Commands (as root):
chown -h bash /bin/bash /bin/sh
We again disable the bash malloc
routine. We also tell bash to use the
readline library we’ve already installed.
./configure --prefix=/usr --without-bash-malloc --with-installed-readline
make
make tests
The install target for bash fails unless /bin/sh
is present, so we
don’t actually remove the temporary symlink until after the new version
of bash is installed. Then we move stuff around so it’s all where we
want it to be.
make install
rm -f /bin/bash /bin/sh
mv /usr/bin/bash /bin
ln -s bash /bin/sh
ln -s /bin/bash /usr/bin/bash
17.3.60. ed
| Name | GNU standard text editor |
|---|---|
| Version | 1.17 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
17.3.60.1. Overview
Ed is the standard text editor.
17.3.60.2. ed (final-system-components phase)
./configure --prefix=/usr
make
make check
make install
17.3.61. bc
| Name | GNU basic calculator |
|---|---|
| Version | 1.07.1 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
17.3.61.1. Overview
bc
(which, according to Wikipedia, is an acronym for "basic
calculator") is an arbitrary-precision calculator — the documentation
calls it a "numeric processing language," which is probably accurate,
but mostly I use it as a calculator program.
"Arbitrary-precision" means that you’re not limited to standard 32- or 64-bit integer math, or IEEE standard types of floating-point arithmetic, or anything like that. You can ask it "Hey, what’s six to the sixth power, to the sixth power?" and it will obligingly give you a couple screens full of numbers.
17.3.61.2. bc (final-system-components phase)
- Dependencies: readline
./configure --prefix=/usr --with-readline
make
The automated test suite for bc is run in a nonstandard way, and you have to inspect the results manually to see whether anything is wrong: it always exits with a 0 status code.
echo 'quit' | ./bc/bc -l Test/checklib.b
make install
17.3.62. tcl (final-system-components phase)
For an overview of tcl, see tcl.
| Build Directory | unix |
We want to run automated test suites once the CBL system is complete, obviously, so we will need tcl on the final system just as we do during the initial build.
- Build Directory: unix
./configure --prefix=/usr --mandir=/usr/share/man \
--infodir=/usr/share/info
make
(none)
make install
make install-private-headers
17.3.63. expect (final-system-components phase)
For an overview of expect, see expect.
./configure --prefix=/usr --with-tcl=/usr/lib \
--with-tclinclude=/usr/include
make
(none)
make install
17.3.64. dejagnu (final-system-components phase)
For an overview of dejagnu, see dejagnu.
./configure --prefix=/usr
(none)
(none)
make install
17.3.65. diffutils (final-system-components phase)
For an overview of diffutils, see diffutils.
By default, diffutils uses ed
as the default text editor for sdiff
.
On CBL systems, we prefer vim
as the default editor; feel free to
adjust this to taste.
./configure --prefix=/usr
sed -i 's@\(^#define DEFAULT_EDITOR_PROGRAM \).*@\1"vim"@' lib/config.h
make
The test-strftime
test fails because two of the expected UTC date
strings are offset by a few seconds. As always, inspect the test logs
and determine whether the results are satisfactory.
make -k check || echo "Exit code $?: continuing anyway"
make install
17.3.66. util-linux (final-system-components phase)
For an overview of util-linux, see util-linux.
This is a fairly normal package-user build. If python is available when
it is installed, a python libmount
library gets installed, so it’s a
good idea to make sure python is installed first.
- Dependencies: python
./configure --enable-write
make
Many of the util-linux tests will be skipped any time the suite is run by a non-privileged user. If you want to run the most thorough test suite, run them as root.
One of the tests fails because the openpty
function does not work
properly in the partial-system environment. In fact, it fails in such a
way that putting in a guard clause doesn’t work; the build still aborts
entirely. As with other packages whose test suite is problematic in the
partial environment, we skip the tests entirely for now.
(none)
make install
17.3.67. e2fsprogs (final-system-components phase)
For an overview of e2fsprogs, see e2fsprogs.
| Build Directory | |
This is configured similarly to the scaffolding version.
../configure --enable-elf-shlibs \
--disable-libblkid --disable-libuuid --disable-fsck --disable-uuidd \
--without-crond-dir
make
One test (f_pre_1970_date_encoding
) sometimes fails for me.
) sometimes fails for me.
make -k check || echo "Exit code $?: continuing anyway"
make install
make install-libs
17.3.68. file (final-system-components phase)
For an overview of file, see file.
./configure --prefix=/usr
make
make check
make install
17.3.69. findutils (final-system-components phase)
For an overview of findutils, see findutils.
./configure --prefix=/usr --localstatedir=/var/lib/locate
make
The test-strftime
test fails because two of the expected UTC date
strings are offset by a few seconds. As always, inspect the test logs
and determine whether the results are satisfactory.
make -k check || echo "Exit code $?: continuing anyway"
make install
17.3.70. gettext (final-system-components phase)
For an overview of gettext, see gettext.
One of the tests — lang-gawk
— fails, perhaps because gawk is a
later version than the tests expect. The issue is that the test expects
the string "EUR remplace FF." but instead gets "FF is replaced by EUR."
./configure --prefix=/usr
make
make -k check || echo "Exit code $?: continuing anyway"
make install
17.3.71. pcre2
| Name | Perl Compatible Regular Expressions |
|---|---|
| Version | 10.37 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
PCRE is a library of functions that implement regular expression pattern matching using the same syntax as Perl 5. It’s used by a bunch of other packages to provide regular expression capabilities.
There are two versions of the PCRE library: PCRE and PCRE2. The former is the original version, and is the one that is most commonly used by other packages. Its API and feature set are stable; only bugfix releases are made at this point.
This version, PCRE2, was first released in 2015; the project maintainers urge people who want to use PCRE to use that version. Many of the packages that are part of CBL still use the older version, so it’s present as part of the CBL system, but modern versions of git now work with PCRE2, so it is built as well.
There are several optional dependencies that provide enhanced capabilities in PCRE; these are already part of CBL, so we go ahead and enable them.
./configure --prefix=/usr --enable-jit --enable-pcre2-16 \
--enable-pcre2-32 --enable-pcre2grep-libz --enable-pcre2grep-libbz2 \
--enable-pcre2test-libreadline
make
The tests don’t work right in the partial CBL build, possibly because of missing dependency flags.
make -k check || echo "Exit code $?: continuing anyway"
make install
17.3.72. xmlto
| Name | XML-to-any converter |
|---|---|
| Version | 0.0.28 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
xmlto is a front-end wrapper script that can run other programs to convert XML documents into a variety of output formats — man pages, HTML, XHTML, PDF, and others. It is basically a convenience for people who don’t want to have to remember or look up how to invoke programs like xsltproc, FOP, and passivetex.
This is basically an orphaned project; as of this writing, in 2019, it has had no commits nor releases since 2015.
./configure --prefix=/usr
make
make check
make install
17.3.73. docbook-xsl
| Name | DocBook XSL Stylesheets |
|---|---|
| Version | 1.79.2 |
| Project URL | |
| SCM URL | |
| Download URL | |
| Dependencies | |
DocBook is an XML markup language for technical documentation. It lets you create document content in a presentation-neutral form; you can then use programs to transform that content into a variety of formats. It’s formally defined by a "schema," with several different versions available.
This package — DocBook XSL — is a set of XSLT stylesheets. These stylesheets define rules by which DocBook XML documents can be transformed into output formats intended to be used by people: HTML (single-page and section-per-page), XHTML, XSL-FO (an intermediate form that can then be transformed by other programs into PDF or other formats), man pages, WebHelp, and probably dozens of other things.
As of May 2017, you can learn a lot more about DocBook XSL by reading a
book available online at
http://www.sagehill.net/docbookxsl/index.html
.
There are two different distributions of XSL stylesheets available from
the docbook-xsl project: the standard docbook-xsl
tar file has a
DocBook namespace prefix present in element names within the stylesheets
and is intended for use with DocBook 5. A second form without the
namespace prefix, intended for use with DocBook 4, is distributed as
docbook-xsl-nons
.
(Up through version 1.79.1 of this package, the version without a
namespace was the default and available in files with names like
docbook-xsl-1.79.1.tar
; the version with a namespace was available in
files like docbook-xsl-ns-1.79.1.tar
.)
By preference, we wouldn’t use either of those distribution files here! It would be far better to start with the source tree found in the upstream git repository, and construct the XSL stylesheets using the Makefiles found there. However, as of December 2019, I can’t get it to work. Maybe someday!
Since we’re not building the stylesheets, there are no configuration, compilation, or testing steps.
(none)
(none)
(none)
There’s no automated installation process for this package, either. We’re just going to copy the package files to a reasonable system location.
mkdir -p /usr/share/xml/docbook-xsl-${version}
cp -v -R * /usr/share/xml/docbook-xsl-${version}
After installing the files, update the XML catalog so that they will be
easy to find. There are two different sets of URLs we’re specifying
here: one, with a docbook.sourceforge.net
hostname, will handle
documents that are using the old canonical stylesheet URLs from version
1.79.1 and earlier; the other, at cdn.docbook.org
, is the modern
location for these stylesheets.
xmlcatalog --noout --add "rewriteSystem" \
"http://docbook.sourceforge.net/release/xsl-ns/${version}" \
"/usr/share/xml/docbook-xsl-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteURI" \
"http://docbook.sourceforge.net/release/xsl-ns/${version}" \
"/usr/share/xml/docbook-xsl-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteSystem" \
"http://docbook.sourceforge.net/release/xsl-ns/current" \
"/usr/share/xml/docbook-xsl-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteURI" \
"http://docbook.sourceforge.net/release/xsl-ns/current" \
"/usr/share/xml/docbook-xsl-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteSystem" \
"http://cdn.docbook.org/release/xsl/${version}" \
"/usr/share/xml/docbook-xsl-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteURI" \
"http://cdn.docbook.org/release/xsl/${version}" \
"/usr/share/xml/docbook-xsl-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteSystem" \
"http://cdn.docbook.org/release/xsl/current" \
"/usr/share/xml/docbook-xsl-${version}" /etc/xml/catalog
xmlcatalog --noout --add "rewriteURI" \
"http://cdn.docbook.org/release/xsl/current" \
"/usr/share/xml/docbook-xsl-${version}" /etc/xml/catalog
17.3.74. git (final-system-components phase)
For an overview of git, see git.
This is a nearly-complete build of the git version control system,
including optional components and documentation. Since there is no GUI
environment in the base CBL system, the gitk
and git-gui
components
are not built here. (The info version of the documentation is also not
built.)
The documentation for git is maintained in AsciiDoc format; this can be
processed either by the package of that name or, by using an additional
build flag, by the newer Asciidoctor package. After being transformed
into XML by AsciiDoc or Asciidoctor, the xmlto
program from that
package is used for further processing, using xsltproc
, the DocBook
XSL stylesheets, and the DocBook XML DTD. That means that building the
documentation pulls in a fairly large stack of dependencies! I think
it’s worth it, though — first, because git is such an elaborate program
with so many capabilities, it’s important to have the documentation
readily available; and, second, because AsciiDoc is also the litbuild
documentation output format, so if you want to generate a version of the
CBL build story that’s easiest for humans to read and learn from, you
need it anyway.
The documentation can be rendered into several formats, primarily HTML,
manual pages, and texinfo. The build system uses different components
depending on which of these you want to produce: HTML requires only an
AsciiDoc processor, but manual pages are generated using a multi-stage
process: first, the AsciiDoc source is transformed into DocBook XML, and
then the xmlto
program is used to generate the manual pages.
- Dependencies: xmlto
The xmlto package is old and appears to be orphaned but is still in use in the git documentation build process.
It would be worthwhile to update the documentation build process for git
to eliminate the xmlto dependency — xmlto appears to be just a wrapper
around xsltproc
and the DocBook XSL stylesheets, so those could
be used directly instead — but that’s more work than I’ve been able to
put in to the challenge as yet.
To produce a texinfo version of the documentation, yet another package is needed: docbook2X (http://docbook2x.sourceforge.net/), which is also apparently orphaned and has not had a release since 2007. Here, we are not bothering to produce the info documentation.
The build system is designed to use the Python AsciiDoc package, but can
be told to use Asciidoctor instead with the USE_ASCIIDOCTOR
flag. I’m
not sure whether that flag needs to be set at configure time, build
time, or both. It doesn’t hurt anything to set it in both locations. It
actually seems like it’s important to specify it at installation time,
as well, which doesn’t make a huge amount of sense to me but whatever.
USE_ASCIIDOCTOR=YesPlease ./configure --prefix=/usr --sysconfdir=/etc \
--with-libpcre2 --with-curl --with-expat
USE_ASCIIDOCTOR=YesPlease make all
USE_ASCIIDOCTOR=YesPlease make doc
Some of the git tests fail in the partial environment.
make test || echo "Exit code $?: continuing anyway"
The installation routine for git has something weird going on: it creates a tar file with locale data, then tries to expand that tar file into the standard filesystem location — which promptly fails because tar tries to modify directory ownership and permissions to match what is recorded in the tar file, and package users are not allowed to do this.
To work around this, we initially install with a DESTDIR
set to a new
directory; then we copy everything from that directory to the final
system locations.
tempgitdir=$(mktemp -d)
USE_ASCIIDOCTOR=YesPlease make DESTDIR=$tempgitdir install
USE_ASCIIDOCTOR=YesPlease make DESTDIR=$tempgitdir install-doc
USE_ASCIIDOCTOR=YesPlease make DESTDIR=$tempgitdir install-html
pushd $tempgitdir
cp -d -f -R -v --parents usr /
popd
rm -rf $tempgitdir
17.3.75. ninja
| Name | Ninja |
|---|---|
| Version | 1.10.2 |
| Project URL | |
| SCM URL | |
| Download URL | |
| Dependencies | |
17.3.75.1. Overview
Ninja is a relatively new build tool, designed with the expectation that the files used to drive it are generated by other, higher-level build systems. Its primary design goal is to be as fast as possible.
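Although build.ninja files are normally generated by a tool like meson or CMake, a tiny hand-written one shows the flavor; this is a hypothetical example (hello.c assumed), not part of the build:
cat > build.ninja << 'EOF'
rule cc
  command = gcc $in -o $out
  description = CC $out
build hello: cc hello.c
EOF
ninja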
17.3.75.2. ninja (final-system-components phase)
(none)
There is a bootstrap process that uses a python script to compile a ninja program just by compiling all non-test source code files, and then uses that program to rebuild itself.
./configure.py --bootstrap
./ninja ninja_test
./ninja_test
There is no installation routine for ninja; the only thing that needs to
be done to install the program is to copy the ninja
program to one of
the standard system binary directories.
There are emacs and vim editor files that can facilitate editing ninja build files, but I’m not installing those here.
cp ninja /usr/bin
17.3.76. meson
| Name | The Meson Build System |
|---|---|
| Version | 0.58.2 |
| Project URL | |
| SCM URL | |
| Download URL | |
| Dependencies | |
17.3.76.1. Overview
Meson is a build system implemented in python 3. Its main goals are to be very fast and user-friendly.
Meson is a high-level build system that, when used on Unix-ish systems, runs on top of the Ninja build tool. (According to its homepage, meson can also generate project files for Microsoft Visual Studio and the MacOS Xcode development environment.)
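A minimal meson project, as an illustration (the project name and hello.c are hypothetical):
cat > meson.build << 'EOF'
project('hello', 'c')
executable('hello', 'hello.c')
EOF
meson setup _build
ninja -C _build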
17.3.76.2. meson (final-system-components phase)
As is typical for Python programs, there is no configuration stage.
(none)
python setup.py build
The meson package does include some automated tests, but they do not work for me — probably because of missing dependencies.
(none)
python setup.py install
17.3.77. pcre
| Name | Original Perl Compatible Regular Expressions |
|---|---|
| Version | 8.45 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Patches | |
| Dependencies | |
PCRE is a library of functions that implement regular expression pattern matching using the same syntax as Perl 5. It’s used by a bunch of other packages to provide regular expression capabilities.
There are two versions of the PCRE library: PCRE and PCRE2. This blueprint is for PCRE, also known as PCRE1, which is still commonly used by other packages. Its API and feature set are stable; only bugfix releases are made at this point.
Since the package maintainers continue to support the original PCRE (and refer to it as PCRE, and to the newer version as PCRE2), CBL uses those names for its blueprints.
There are several optional dependencies that provide enhanced capabilities in PCRE; these are already part of CBL, so we go ahead and enable them.
- pcre-8.45-sljit_mips-label-statement-fix.patch
In some scenarios, a syntax error in a MIPS-specific file causes problems. I don’t remember exactly where I found this fix, but it worked.
./configure --prefix=/usr --enable-jit --enable-pcre16 \
--enable-pcre32 --enable-utf --enable-unicode-properties \
--enable-pcregrep-libz --enable-pcregrep-libbz2 \
--enable-pcretest-libreadline
make
make check
make install
17.3.78. glib
| Name | GLib |
|---|---|
| Version | 2.68.3 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Patches | |
| Dependencies | |
17.3.78.1. Overview
GLib is a collection of functions used widely in other software projects. It was originally part of the GIMP Toolkit library ("GTK" or "GTK+"); some of the functions in the GIMP Toolkit were useful outside of a GUI context, so the project maintainers decided to split those functions into separate libraries. The result of that refactoring eventually resulted in GLib.
GLib provides implementations of several useful data structures, string utilities, concurrency primitives, object-orientation (through the GLib Object System), a virtual filesystem (through GIO), and similar low-level features.
- glib-2.68.3-fix-close-range-call-1.patch
When built against glibc 2.34, the glib build fails because it invokes a
kernel function close_range
with the wrong number of arguments. This
fix is from the glib upstream repository.
As of GLib version 2.60, the build system has changed from the GNU build system to meson.
- Dependencies: meson, ninja
17.3.78.2. glib (final-system-components phase)
meson --prefix /usr --buildtype release --sysconfdir /etc _build
ninja -C _build
Some of the tests fail.
ninja -C _build test || echo "Exit code $?: continuing anyway"
ninja -C _build install
rm -rf .cache .dbus-keyrings .local
17.3.79. nettle
| Name | GNU Nettle cryptographic library |
|---|---|
| Version | 3.6 |
| Project URL | |
| SCM URL | |
| Download URL | |
| Dependencies | |
Nettle is a low-level cryptographic library, like libgcrypt. It’s present in CBL because some packages depend on it.
./configure --prefix=/usr
make
make check
make install
17.3.80. libtasn1
| Name | GNU Libtasn1 |
|---|---|
| Version | 4.17.0 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
GNU Libtasn1 provides a library and some utility programs for parsing and otherwise dealing with Abstract Syntax Notation One (ASN.1) and Distinguished Encoding Rules (DER). These are used in TLS certificates and in other things related to cryptography; some of the CBL packages are more useful if libtasn1 is available.
./configure --prefix=/usr
make
make check
make install
17.3.81. gnutls
| Name | GnuTLS Library |
|---|---|
| Version | 3.6.14 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
17.3.81.1. Overview
GnuTLS is, like OpenSSL and LibreSSL, an implementation of the SSL and TLS protocols and related things. It’s a part of the GNU project.
Sometimes — as with version 3.6.7.1 — the gnutls distribution file expands to a directory that doesn’t match the expected name. In those cases, of course the tar file needs to be repackaged using the standard convention.
Since the GnuTLS package does not conflict with LibreSSL or OpenSSL, it
is installed into the standard /usr
location, where it is easy for
other packages to find it.
If desired, the libunistring library can be installed separately rather than using the one bundled with GnuTLS. In most cases I prefer to install standalone versions of libraries; in this case, since I don’t otherwise need libunistring, I just use the one that comes with GnuTLS.
If PKCS #11 support is desired, the p11-kit package should be installed before GnuTLS, and the configure directive about it can be removed.
GnuTLS provides support for secure DNS if the libunbound package at http://unbound.net/ is available. If that matters to you, you might want to set that up first.
17.3.81.2. gnutls (final-system-components phase)
./configure --prefix=/usr --with-included-unistring \
--without-p11-kit --enable-ssl3-support --disable-ssl2-support \
--enable-openssl-compatibility
make
The test suite sometimes hangs in the partial environment, for reasons I haven’t discovered.
(none)
make install
17.3.82. grep (final-system-components phase)
For an overview of grep, see grep.
After upgrading to glibc 2.28, the automated test suite for grep started failing with one unexpectedly-passing test. This might be because some long-standing conformance bugs were fixed in 2.28. It’s a good idea to review the test logs to see if they pass cleanly after upgrading this package or glibc again.
./configure --prefix=/usr
make
make -k check || echo "Exit code $?: continuing anyway"
make install
17.3.83. gzip (final-system-components phase)
For an overview of gzip, see gzip.
Some of the gzip tests fail. These usually aren’t very important ones, but you can inspect the test logs after the build is complete.
./configure --prefix=/usr
make
make check || echo "Exit code $?: continuing anyway"
make install
17.3.84. kbd (final-system-components phase)
For an overview of kbd, see kbd.
./configure --prefix=/usr --disable-vlock --enable-optional-progs
make
The documentation doesn’t get installed by default, so we have to do that ourselves.
make check
make install
mkdir /usr/share/doc/kbd
cp -R -v docs/doc/* /usr/share/doc/kbd
17.3.85. linux (final-system-components phase)
For an overview of linux, see linux.
As mentioned earlier, the kernel build process requires a TLS library.
- Dependencies: a TLS library (LibreSSL, in CBL)
The kernel build this time is similar to the one we did with the
cross-toolchain. We start with the kernel configuration we had before;
because we enabled the IKCONFIG
and IKCONFIG_PROC
options, we can
get it from the /proc
filesystem.
make mrproper
cat /proc/config.gz | gzip -d > .config
make olddefconfig
This time we do want the kernel to auto-mount the devtmpfs
at /dev
while booting — that saves us the trouble of doing it, and this time we
will actually have a /dev
directory at boot time!
./scripts/config --enable DEVTMPFS_MOUNT
The Linux kernel supports a feature called "control groups." These provide the ability to group together sets of processes and control those groups' use of system resources. This is one of the main features used by programs like Docker and LXC to define and manage "containers," which are basically lightweight virtual machines. We want the target system to support containers, so we enable control groups.
./scripts/config --enable PAGE_COUNTER
./scripts/config --enable MEMCG
./scripts/config --enable MEMCG_SWAP
./scripts/config --enable MEMCG_SWAP_ENABLED
./scripts/config --enable BLK_CGROUP
./scripts/config --enable DEBUG_BLK_CGROUP
./scripts/config --enable CGROUP_WRITEBACK
./scripts/config --enable CFS_BANDWIDTH
./scripts/config --enable RT_GROUP_SCHED
./scripts/config --enable CGROUP_PIDS
./scripts/config --enable CGROUP_RDMA
./scripts/config --enable CGROUP_HUGETLB
./scripts/config --enable CGROUP_DEVICE
./scripts/config --enable CGROUP_PERF
./scripts/config --enable CGROUP_DEBUG
The other kernel feature that is required for containers is called
"namespaces." Namespaces allow different sets of processes to perceive
parts of the system differently. For example, the "PID" namespace allows
more than one process to have the same PID, as long as they are in
different namespaces. Each container is given its own PID namespace, so
they can each have their own PID 1 running an init
process.
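Once the system is running, the unshare program from util-linux gives a quick demonstration of this (illustrative; run as root): ps ends up as PID 1 inside the new namespace, and sees no other processes.
unshare --fork --pid --mount-proc ps -ef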
./scripts/config --enable PID_NS
We also enable the use of "overlay" filesystems, which allow new filesystems to be layered on top of an existing filesystem; this allows multiple filesystems to share an underlying base filesystem without duplicating its contents.
./scripts/config --enable OVERLAY_FS
The kernel has support for detecting some kinds of buffer overflow attacks. This is a good idea if you ever might run malicious code on your computer.
./scripts/config --enable CC_STACKPROTECTOR_REGULAR
Linux also supports virtual machine acceleration on CPUs that provide
those features. As with the x32 ABI earlier, this is
architecture-specific; when the options are irrelevant, the
olddefconfig
Makefile target will simply remove them.
./scripts/config --module KVM
./scripts/config --module KVM_INTEL
./scripts/config --module KVM_AMD
./scripts/config --module VHOST
./scripts/config --module VHOST_NET
You might want to run make nconfig
and browse through all the options
to customize the kernel further. Minimally, it’s a good idea to ensure
that your kernel will include support for all your hardware! But this is
pretty good for a basic starting point.
make olddefconfig
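If you want to confirm that the options requested above survived (olddefconfig silently drops options that don't apply to your architecture or configuration), a quick check like this works:
grep -E '^CONFIG_(MEMCG|PID_NS|OVERLAY_FS|DEVTMPFS_MOUNT)=' .config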
Now we can build the kernel and modules.
make all
make Image.gz
(none)
To install modules, it’s easiest to use the Makefile target. This
automatically installs the modules to a subdirectory of /lib/modules
,
named with the kernel version and LOCALVERSION
— which is where the
module utilities will look for them.
make modules_install
To install the kernel itself, we can again simply copy files where we
want them instead of using the install
target. We can get the kernel
release name by looking at the subdirectory that was just installed
under /lib/modules
. (There may be some easier way to do this, but if
there is, I don’t know it.)
export KERNELRELEASE=$(ls -1t /lib/modules | head -n 1)
export KERNELPATH=$(find . -name Image.gz -a -type f)
cp -v $KERNELPATH /boot/kernel-$KERNELRELEASE
cp -v .config /boot/config-$KERNELRELEASE
lzip -9 /boot/config-$KERNELRELEASE
cp -v System.map /boot/System.map-$KERNELRELEASE
depmod -e -F /boot/System.map-$KERNELRELEASE -a $KERNELRELEASE
17.3.86. lzip (final-system-components phase)
For an overview of lzip, see lzip.
./configure --prefix=/usr
make
make check
make install
17.3.87. make (final-system-components phase)
For an overview of make, see make.
This is an almost completely standard build using the GNU build system. Really, it would be surprising if it weren’t!
./configure --prefix=/usr
make
The automated tests for make are implemented in Perl, and rely on a test
driver program present in the project sources. In perl 5.26, this breaks
as described at
https://lists.gnu.org/archive/html/bug-make/2017-03/msg00040.html
. To
work around the issue, we can set an environment variable as described
there.
Some tests sometimes fail with timeout errors, as well; we can proceed anyway as usual.
PERL_USE_UNSAFE_INC=1 make check || echo "exit code $?, proceeding anyway"
make install
17.3.88. openssl
| Name | OpenSSL |
|---|---|
| Version | 1.1.1k |
| Project URL | |
| SCM URL | git://github.com/openssl/openssl |
| Download URL | |
| Dependencies | |
17.3.88.1. Overview
OpenSSL is a set of libraries and command-line utilities for cryptography and the Transport Layer Security (TLS) and obsolete Secure Sockets Layer (SSL) protocols. It’s used by a large number of other packages for TLS support.
The OpenSSL source code has a long and convoluted history — it was originally based on a fork of a project called SSLeay, after that project was abandoned by its authors — and is complex enough that it has historically been buggy. Frequent and severe security vulnerabilities have been found in it. That’s why members of the OpenBSD project forked it to a new "LibreSSL" project, with the intention of cleaning up the codebase. LibreSSL and GnuTLS are generally preferred for CBL. However, it’s possible that you’ll find some package that does not work with LibreSSL, so it’s worthwhile to have OpenSSL around to support those.
To avoid conflicting with the header and library files from LibreSSL,
the OpenSSL package is installed to /opt
rather than to the standard
/usr
system location. To cause a package to compile and link against
OpenSSL, you’ll need to specify CFLAGS
including
-I/opt/openssl/include
and LDFLAGS
including -L/opt/openssl/lib
.
It’s also a really good idea to build these packages with an RPATH
or
RUNPATH
that specifies the OpenSSL lib
directory, so that its shared
library files are used at runtime rather than those of LibreSSL. That
usually means adding -Wl,-rpath,/opt/openssl/lib
to the LDFLAGS
.
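For example, a hypothetical package using the GNU build system could be pointed at this OpenSSL like so (a sketch, assuming the package honors CFLAGS and LDFLAGS):
CFLAGS='-I/opt/openssl/include' \
LDFLAGS='-L/opt/openssl/lib -Wl,-rpath,/opt/openssl/lib' \
./configure --prefix=/usr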
17.3.88.2. openssl (final-system-components phase)
./config --prefix=/opt/openssl --openssldir=/opt/openssl \
threads zlib shared
make
The automated tests hang at this point in the CBL build. You can rebuild with tests later, as described in the section "Rebuild the packages whose tests could not be run."
(none)
make install
17.3.89. patch (final-system-components phase)
For an overview of patch, see patch.
This is a standard GNU build system build.
./configure --prefix=/usr
make
make check
make install
17.3.90. pixman
| Name | pixman library |
|---|---|
| Version | 0.40.0 |
| Project URL | |
| SCM URL | git://anongit.freedesktop.org/pixman |
| Download URL | |
17.3.90.1. Overview
Pixman is a library for pixel manipulation. It is used by the Cairo graphics library and the X server; it’s also a required dependency when building the QEMU emulator.
17.3.90.2. pixman (final-system-components phase)
This is a perfectly standard GNU build system package.
./configure --prefix=/usr
make
make check
make install
17.3.91. popt (final-system-components phase)
For an overview of popt, see popt.
One of the tests doesn’t reliably pass.
./configure --prefix=/usr
make
make -k check || echo "Exit code $?, continuing anyway"
make install
17.3.92. qemu
| Name | QEMU |
|---|---|
| Version | 6.0.0 |
| Project URL | |
| SCM URL | (unknown) |
| Download URL | |
| Dependencies | |
17.3.92.1. Overview
QEMU is a computer emulator and virtualizer. It’s pretty important in the CBL process, so I’m going to spend some time on describing what that all means — as with everything else in this book, feel free to skip ahead if you’re not interested enough to get into the details; it won’t hurt my feelings.
Let’s start with the basics: Generally, in the realm of computing, the word virtual means that something doesn’t really exist, but it can be used as though it does exist, and physical means that something actually does exist.
A common example of this is memory: the actual physical RAM in your computer can be used to store data. Since RAM is relatively scarce on most systems, it’s often the case that people don’t have enough memory in their computers to store all the data needed by all the programs they want to run. When that happens, all modern operating systems provide the ability to use virtual memory as well as actual physical memory. When a program asks the operating system for some memory it can use to store data, and the operating system knows that all the RAM in the computer has already been allocated, it finds some chunks of RAM that haven’t been accessed recently and writes the data from them out to a hard drive or similar persistent storage device, and then provides the chunks of RAM that have just been freed up to the requesting program. When and if a program needs data that was written out to persistent storage this way, the operating system finds other chunks of RAM that haven’t been accessed in a while, writes those chunks out to the persistent storage volume, reads the chunks of data that are now needed and stores them in the freed-up RAM, and provides them to the program that needs them. (This process is often called "swapping" because chunks of data are being swapped back and forth between RAM and persistent storage. The area of the persistent storage volume used for this purpose is sometimes called "swap space" or a "swap file." If you’ve ever wondered what those things are, now you know!)
The data that was written to persistent storage to free up physical RAM is called "virtual memory," because it works like physical memory but doesn’t really exist. The only drawback to this approach is that persistent storage is several orders of magnitude slower than RAM, so when you are actively using programs that, collectively, need more memory than you have in your computer, you will sometimes find that everything gets really slow.
That’s all really a digression. The important thing to understand is that physical resources actually exist as hardware, and virtual resources do not: some program or operating system is just pretending that they exist.
As mentioned elsewhere, every computer has a particular architecture — determined by the CPU within it — and understands a specific set of instructions. QEMU can emulate computers of many different architectures: programs compiled for the emulated CPU are translated, instruction by instruction, into instructions the host CPU can execute. It’s a fairly full-featured emulator: for each machine architecture that it is built to support, it can both run individual userspace programs and emulate full virtual computer systems.
When QEMU is running an i686- or x86_64-architecture virtual machine on a host of the same architecture that has appropriate virtualization instructions in the CPU, it can take advantage of the KVM acceleration capability provided by the Linux kernel. This is the most common use of QEMU — to provide cloud-style virtual machines on powerful hypervisor servers. That’s not what we’re doing here, though. Instead, we use QEMU so that we have an environment where our target system can run even when we don’t have a spare computer with a different machine architecture lying around. Also, since QEMU provides the same virtualized hardware everywhere, it lets us be a lot more confident that there won’t be any weird idiosyncrasies with the target system that will prevent the normal CBL process from working.
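To make that concrete, here is a sketch (the binary, kernel, and disk image names are illustrative): the user-mode emulator runs a single foreign-architecture program directly, while the system emulator boots an entire virtual machine.
qemu-aarch64 ./hello
qemu-system-aarch64 -machine virt -cpu cortex-a72 -m 2048 \
  -kernel vmlinux -drive file=lbl.img,format=raw -nographic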
17.3.92.2. qemu (final-system-components phase)
In order for CBL to be natively self-hosting, we need to have QEMU available in the final target system. As a pleasant consequence, we’ll then be able to run new CBL builds, targeting other architectures, directly on the base CBL system without worrying about the Trustworthy Host-System Programs.
./configure --prefix=/usr --cc=gcc --localstatedir=/var \
--enable-gnutls --enable-gcrypt --enable-kvm
make
The automated test suite is problematic in the initial target system.
(none)
make install
17.3.93. rsync (final-system-components phase)
For an overview of rsync, see rsync.
./configure --prefix=/usr --sysconfdir=/etc --with-included-popt=no \
--with-included-zlib=no --disable-xxhash --disable-zstd \
--disable-lz4
make
make check || echo "Exit code $?, continuing anyway"
make install
17.3.94. sed (final-system-components phase)
For an overview of sed, see sed.
./configure --prefix=/usr --docdir=/usr/share/doc/sed
make
make html
Starting with sed 4.5, I get an error in the inplace-selinux.sh test; it complains that "CONFIG_HEADER" is not defined. Everything else works, and I’m not worrying about selinux until I have a basic system working, so for now we can just ignore the error and proceed.
make check || echo "exit code $?, proceeding anyway"
make install
make install-html-am
A few scripts that were built and installed before the final system sed is set up have the scaffolding sed path baked into them. Now that we have the real sed available, we can modify those scripts so they know where to find it.
As root:
pushd /usr/bin
grep -l /scaffolding/bin/sed * | while read FILE; \
do \
sed -i -e 's@/scaffolding/bin/sed@/usr/bin/sed@g' $FILE; \
done
popd
17.3.95. shadow (final-system-components phase)
For an overview of shadow, see shadow.
This is built pretty much the same as for the scaffolding version.
The shadow package provides groups and nologin programs. There are also versions of these programs provided by GNU coreutils (for groups) and util-linux (for nologin); we only want one of each, so we disable the ones provided by this package.
The man-pages package includes a couple of pages that are also distributed as part of the shadow package; generally, in these cases, we prefer the version distributed along with the package.
As root:
rm -f /usr/share/man/man3/getspnam.3 \
/usr/share/man/man5/passwd.5
Some configuration files in /etc were created by the scaffolding version of this package, and are owned by root. The installation process for shadow will want to install them again, so they need to be owned by the package user during installation.
As root:
chown shadow:shadow /etc/login.defs \
/etc/limits \
/etc/login.access
sed -i src/Makefile.in -e 's/groups$(EXEEXT) //' \
-e 's/= nologin$(EXEEXT)/= /'
find man -name Makefile.in -exec \
sed -i -e 's/man1\/groups\.1 //' -e 's/man8\/nologin\.8 //' '{}' \;
./configure --prefix=/usr --sysconfdir=/etc \
--with-group-name-max-length=32
make
make check
make install
The default password hashing method is DES, which is very weak and only allows passwords up to 8 characters. For CBL we switch to a SHA-512 hash.
sed -i -e 's@#\(ENCRYPT_METHOD \).*@\1SHA512@' /etc/login.defs
It’s also convenient to create home directories for new users automatically.
sed -i -e 's@#\(CREATE_HOME \).*@\1YES@' /etc/login.defs
Using the package-users scheme, UIDs and GIDs starting at 9999 are reserved for package users, so we need to adjust login.defs to restrict normal users to UIDs from 1000 to 9998 rather than the default 60000 upper bound. A more reasonable place to do this is in the package-users installation, but the installation here overwrites anything we do there.
sed -i -e '/^[UG]ID_MAX/s@60000@ 9998@' /etc/login.defs
The files we chown’ed to the package user earlier should be owned by root again.
As root:
chown root:root /etc/login.defs \
/etc/limits \
/etc/login.access
Some of the programs distributed in the shadow package need to be setuid root. The presumption made in CBL is that the programs that shadow installs as setuid should all be setuid root — if that isn’t the policy you want to use, modify this as you see fit!
As root:
find /bin /usr/bin /sbin /usr/sbin -user shadow -a -perm -4000 | \
while read PROGRAM; \
do \
chown root $PROGRAM; \
chmod 4755 $PROGRAM; \
done
And we actually want to use shadow passwords, so we need to run the conversion programs provided by this package:
As root:
test -f /etc/shadow || pwconv
test -f /etc/gshadow || grpconv
The shadow password utilities allow for subordinate user and group identifiers. These allow normal (non-root) users to use the newuidmap and newgidmap commands to configure user IDs and group IDs, respectively, within a user namespace.[12] These programs use the files /etc/subuid and /etc/subgid to determine specifically how users are allowed to do this. These files are not currently created by the shadow package’s installation routine, so we should ensure they exist.
As root:
touch /etc/subuid /etc/subgid
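Each line in these files has the form user:start:count. As an illustrative example (the user name and ranges here are invented), an entry allowing a user to map 65536 subordinate UIDs beginning at 100000 would look like:
lbl:100000:65536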
Many of the files under /etc should be tracked in the configuration file repository — including all of the files we’ve already adjusted.
- /etc/default/useradd
- /etc/group
- /etc/gshadow
- /etc/login.defs
- /etc/passwd
- /etc/shadow
- /etc/subgid
- /etc/subuid
17.3.96. tar (final-system-components phase)
For an overview of tar, see tar.
In the latest version of tar, one of the automated tests fails: difflink.at, a test that involves hard-linked symbolic links. The expected output is apparently a/z: Not linked to a/y but instead the tar command emits a/y: Not linked to a/z. I don’t think that’s worrisome enough to abort the build. As with other packages where test failures are ignored, it’s a good idea to review the test logs to see whether you have other, more problematic, issues.
./configure --prefix=/usr
make
make -k check || echo "Exit code $?: continuing anyway"
The documentation doesn’t get installed by default, so we need to do that separately.
make install
make -C doc install-html
17.3.97. texinfo (final-system-components phase)
For an overview of texinfo, see texinfo.
The test suite hangs, so we will skip it for now. Texinfo is in the Rebuild the packages whose tests could not be run section, so it’s easy to rebuild it with the full test suite once the system is fully built.
./configure --prefix=/usr
make
(none)
make install
Some components of texinfo should be installed into a directory that will be used by TeX and METAFONT (conventionally, this is called texmf, which is an abbreviation of "TeX and METAFONT") if those programs will ever be installed on the system. In CBL, we presume that TeX will eventually be installed, so we go ahead and install those components.
(TeX and METAFONT are the fundamental components in a typesetting system written by Donald Knuth, initially so that his magnum opus The Art Of Computer Programming could be typeset properly.)
make TEXMF=/usr/share/texmf install-tex
17.3.98. vim (final-system-components phase)
For an overview of vim, see vim.
The test suite for vim is really noisy — a lot of terminal controls are used. The tests also do not work properly when run non-interactively, as far as I can tell — at least, I have not been able to get them to work.
If you want to run the automated tests for vim, wait until the base system is complete and running, and then manually configure and build (using the configure_commands and compile_commands functions); then unset MAKEFLAGS to disable parallelism, and then manually run make scripttests (which will conclude with TEST FAILURE if there were errors and ALL DONE if not) and then make unittests (which will compile some test programs and then run them; if the test programs run successfully, you’ll see the line passed after each of the test programs is executed).
./configure --prefix=/usr
make
(none)
A lot of people who have been using Unix systems for a long time habitually invoke it as vi rather than vim. To make life slightly easier for those people, we set up a symbolic link.
make install
ln -s vim /usr/bin/vi
We also need to create a basic configuration file. The only thing we specify in it by default is "nocompatible" mode, which makes vim behave less like the original vi editor in a number of useful ways. You might want to make additional changes to suit your preferences!
echo "set nocompatible" > /etc/vimrc
17.3.99. yoyo
Name | yoyo program runner
---|---
Version | 0.99.5
Project URL |
SCM URL |
Download URL |
17.3.99.1. Overview
Sometimes programs crash or hang due to ephemeral or transient issues. In situations where this happens — one specific case is the target side of the Cross-Building Linux process, when running QEMU to perform the final system build — it can make sense to retry the program some number of times. That’s what yoyo does.
yoyo runs the rest of its own command line as a new subprocess. If the subprocess terminates with an error status, yoyo restarts it. If the subprocess appears hung — in other words, all threads are sleeping, and both user and system CPU time is increasing very slowly if at all, over the course of several minutes — yoyo terminates and restarts it.
After yoyo has restarted a hung or crashed process five times, it gives up and terminates itself.
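As a hypothetical example of its use (the wrapped command and path are arbitrary), you simply prefix the command you want supervised with yoyo:
yoyo make -C /home/random/cbl-build
If make crashes or appears hung, yoyo restarts it, giving up after five attempts.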
17.3.99.2. yoyo (final-system-components phase)
(none)
make
make check
cp -a -v build/yoyo /usr/bin
18. Complete the CBL System
We have all the system components in place now, so we can set up the s6-based init system that will launch userspace and configure a boot loader.
18.1. Set Up Networking For the Target System
Networking is a really complicated topic and there is no way to cover it comprehensively here! We’re only going to cover a few of the absolute basics: what kind of networking hardware the target system has, and how to configure that hardware so the target system is accessible over the network.
The default assumption made by CBL is that the target system will perceive itself as having a wired Ethernet interface; that it will have a dynamically-configured IP address; and that networking should be enabled at the conclusion of the CBL build. If any of those isn’t the case, override the ENABLE_TARGET_NETWORK parameter — the network-related services will still be defined, but not made part of the rl-default standard system run-level bundle.
18.1.1. Network Hardware
There are a few types of network hardware you might be dealing with.
18.1.1.1. Wired Ethernet
This is the simplest situation you can possibly be in. To get the target system on the network, all you need to do is ensure that a driver for your network hardware is compiled into the Linux kernel.[13] Easy-peasy!
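For example (illustrative only; the exact option depends on your actual hardware), a machine with an Intel PRO/1000 network adapter would need this in its kernel configuration:
CONFIG_E1000=y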
18.1.1.2. Wireless
If your target system has only a wireless network interface, you need additional software that is not part of the base CBL system. The simplest set of packages you can add is:
- wireless_tools — provides programs like iwconfig that control wireless network hardware. Hosted at https://hewlettpackard.github.io/wireless-tools/Tools.html
- wpa_supplicant — allows your computer to connect to WPA and WEP-encrypted wireless networks. Hosted at https://w1.fi/wpa_supplicant/
As of this writing, this is beyond the scope of CBL.
18.1.1.3. QEMU Emulated
The term "network hardware" may be misleading, because there might not
be any hardware involved! If the target system is a QEMU virtual
machine, all the hardware it perceives is actually being simulated by
the QEMU program. In this case, you configure what kind of hardware QEMU
will simulate by using command-line arguments to the qemu-system
program. Then all you have to do is ensure that the Linux kernel running
on the virtual machine has support for whatever hardware you’ve told
QEMU to simulate.
The way that I like to set this up — because it’s the easiest thing for me to think about — is to give the virtual machine a TAP interface that is connected to a bridge interface on the host system where the virtual machine is running.
This is all described more fully in [setup-virtual-network], which you can also use on the host system to construct an entire virtual network usable by any number of virtual machines simultaneously.[14]
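As a sketch of what that looks like (interface and device names are illustrative; the real procedure is in [setup-virtual-network]), you might add options like these to the qemu-system command line:
-netdev tap,id=net0,ifname=tap0,script=no,downscript=no \
-device virtio-net-pci,netdev=net0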
You might also be interested in the [enable-virtnet-internet] blueprint, which sets up the virtual network so that the virtual machines connected to it can reach the actual network the host system is on, and thereby reach the Internet. But while working on CBL — especially when writing or testing package blueprints — I don’t do that. There are a lot of build systems that automatically fetch source code or binaries for other programs or libraries from locations on the Internet. Since my goal with CBL is to make it screechingly obvious what code is present in the system, and to enable the entire system to be built from a known repository of source code without anything extra being needed, this (normally helpful and convenient) technique is exactly counter to my purposes. Therefore, while I like having the CBL system be able to reach the host system (so I can fetch software packages from it, for example), I explicitly do not want it to be able to reach the Internet.
18.1.2. Configuring Networking For the Target System
Once you have a functioning network interface (real or emulated), the other thing you need to do is ensure that interface has an IP address. There are two ways you can do this: static and dynamic.
18.1.2.1. Static Network Configuration
If you know an available IP address on the local network, you can simply assign that IP address to the interface. This is most conveniently done at system startup time, so if you want to do this you can set up an s6-rc service definition directory, maybe called eth0-network, dependent on mount-sys and mount-proc, with an up script that does something like:
if { s6-echo "Starting network" } if { ip link set eth0 up } ip addr add ${TARGET_IP_ADDRESS} dev eth0
and a corresponding down script something like:
if { s6-echo "Shutting down network" } if { ip addr flush dev eth0 } ip link set eth0 down
Then make that service the active one in the network bundle so that it runs automatically as part of the network-services bundle, and so that other network-related services will depend on it.
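A minimal sketch of how to create such a service definition directory, assuming the s6-rc source directory is /etc/s6-rc/source (the actual location depends on how your system is configured):
mkdir /etc/s6-rc/source/eth0-network
echo oneshot > /etc/s6-rc/source/eth0-network/type
printf 'mount-sys\nmount-proc\n' > /etc/s6-rc/source/eth0-network/dependencies
The up and down scripts shown above then go into files named up and down in that same directory.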
18.1.2.2. Dynamic Network Configuration
In most cases, static networking isn’t the best option — these days, computers are more often portable than not, and even when they’re always in the same place they may have multiple routes to the Internet. That means they often wind up on different networks at different times, which in turn means that dynamic network configuration is usually the best approach. This is done with the Dynamic Host Configuration Protocol (DHCP).
There are several different programs that implement the client side of DHCP. I have used dhclient, provided as part of the DHCP package from the Internet Systems Consortium, and dhcpcd by Roy Marples; both work fine. CBL uses dhcpcd by default because it is slightly more congruent with the way that supervised services are set up in CBL: both can be told not to background themselves, but dhclient always logs its activities via the syslog function, while dhcpcd can be told to write log messages to its standard error instead. (By default it also writes them to syslog, but this can be redirected to /dev/null.)
18.1.3. libmnl
Name | Minimalistic netlink library
---|---
Version | 1.0.4
Project URL |
SCM URL |
Download URL |
This library provides low-level functions to help userspace programs interact with the Linux kernel via netlink sockets; it takes care of plumbing details like constructing, validating, and parsing netlink headers. It doesn’t try to hide any of the details of actually using netlink sockets, so you still wind up writing some pretty low-level code if you use this library.
./configure --prefix=/usr
make
make check
make install
18.1.4. Set Up Network Availability Service
Some startup services — both long-running daemon processes and oneshot initialization processes — need access to the network to function. Long-running processes are typically set up to keep retrying until they succeed, but — especially for oneshots — it’s inconvenient for those services simply to depend on a network service. When that resolves to a DHCP client (as is the default configuration on CBL systems) the service will be regarded as being up almost instantly — as soon as the DHCP client program is running — but the network interface won’t actually be configured until some time later. The delay may only be a few seconds, but it can be inconvenient for dependent services to fail until the network is actually available.
This blueprint installs a very simple program, await-default-route, and a oneshot s6-rc service that runs it. The program simply waits until there is a default IPv4 network route — that is, a route that instructs the kernel that network packets intended to be sent to other computers should be sent through a specific network interface. If there is a default route, then presumably the network is accessible through it! If there is no default route, await-default-route will block until that situation changes.
Any other service that needs to be able to reach external services over the network can depend on the network-available oneshot service that is set up here, and the s6-rc service manager will defer starting those services until the network-available service completes.
We could implement this really easily by writing a script that just dumps the route table and looks to see if there’s a default route, but for that to work the script would have to "poll" — that is, check to see if there is a default route; if there is not, then sleep a little while and check again, and keep doing that until there is a default route.
Polling is often really easy, but it’s never the best thing to do: it wastes time and energy. Any time you can use an interrupt or notification-based mechanism, you should. That’s what we’re going to set up here: a tiny program that will be notified of all changes to the route table and blocks until it is told about the creation of a default route.
It uses a small library that eliminates some of the drudgery of using netlink.
- Dependencies: libmnl
The source for this program is only a couple thousand characters in size. It doesn’t really make sense to package it in a tarfile, along with a Makefile and README and license text and so on, so we’ll just write it out here, then compile and install it using litbuild directives.
The only drawback of this approach is that the program will wind up being owned by root rather than by a package user. It’s easy enough to change that if you wish — just add commands that create a package user and so on.
I based this program on two example programs that are included in the libmnl package: rtnl-route-event and rtnl-route-dump. Both of those are in the public domain. Like the rest of the code in CBL, though, this modified program is licensed under the GNU GPL.
As usual for C programs, this one starts by including header files that define the functions and macros it calls.
#include <stdlib.h>
#include <time.h>
#include <libmnl/libmnl.h>
#include <linux/rtnetlink.h>
Rather than fetching data from a netlink socket and processing it directly, we’re going to use "callback" functions: we’ll pass function references to mnl_cb_run and trust that they’ll be invoked appropriately.
Here we are setting up two callback functions. The first one, route_dump_cb, will be used to detect a default route that already exists when the program is run, if there is one — that way, it can terminate immediately rather than waiting (possibly forever) for a new default route to be created.
A default route has a "destination length" of 0 — the destination length is the prefix length of the route’s destination network, and the default route matches every address (0.0.0.0/0), so its prefix length is 0 — and has other common characteristics: the default route is in the main routing table (which has ID 254), is a unicast route (type 1), and has universal scope (ID 0).
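You can see the same thing with the ip command. In this illustrative output (the addresses are made up), only the default entry has a destination prefix of length 0:
$ ip route show table main
default via 10.0.2.2 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15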
If this function is called with a payload that indicates any other route table entry, it just returns MNL_CB_OK. The program simply keeps running when it gets that result.
static int route_dump_cb(const struct nlmsghdr *nlh, void *data)
{
struct rtmsg *rm = mnl_nlmsg_get_payload(nlh);
if (rm->rtm_dst_len == 0 && rm->rtm_table == 254 &&
rm->rtm_type == 1 && rm->rtm_scope == 0) {
exit(EXIT_SUCCESS);
}
return MNL_CB_OK;
}
The other callback we define will handle the more typical case, where no default route exists at the time the program is executed. It will inspect route table change events. Each of these either adds or deletes a route table entry; we ignore all deletion events and check creation events with the same logic as the earlier callback.[15]
static int route_event_cb(const struct nlmsghdr *nlh, void *data)
{
struct rtmsg *rm = mnl_nlmsg_get_payload(nlh);
if (nlh->nlmsg_type == RTM_DELROUTE)
return MNL_CB_OK;
if (rm->rtm_dst_len == 0 && rm->rtm_table == 254 &&
rm->rtm_type == 1 && rm->rtm_scope == 0) {
exit(EXIT_SUCCESS);
}
return MNL_CB_OK;
}
Now we can write the main routine for the program. The variables we need to define are two netlink sockets — one to dump currently-existing routes, the other to track change events — along with buffers that they’ll use to store data. There are also some other handy variables we need as well; I haven’t done any sort of analysis of what they’re for, they’re just used in the example programs I based this on.
The buffer size, 8192L, is a kludge to get this program to be compliant with the C89 standard. It should really be defined as MNL_SOCKET_BUFFER_SIZE, but that leads to problems when compiling with -std=c89 because it counts as a "variable length array" (which is not something that C89 supports). Looking at the definition of MNL_SOCKET_BUFFER_SIZE in /usr/include/libmnl/libmnl.h, it’s always defined to be 8192L or smaller. So we’re just declaring it to be as big as it might possibly need to be.
int main(int argc, char *argv[])
{
struct mnl_socket *event_nl, *dump_nl;
char buf[8192L];
struct nlmsghdr *nlh;
struct rtmsg *rtm;
unsigned int seq, portid;
int ret;
We need to initialize and bind the event socket. I’m pretty sure that as soon as we do this, the kernel will start sending change events to this socket. By setting it up first, we can ensure that there’s no window between the time we inspect the already defined route entries and the time we start inspecting changes to route entries — if there were such a window, there would be a race condition bug that could cause us to miss the creation of a default route.
That might all be completely obvious to everyone but me, but that’s the kind of fiddly detail that I find easy to overlook. Concurrent programming is hard.
event_nl = mnl_socket_open(NETLINK_ROUTE);
if