diff --git a/documentation/poky-ref-manual/technical-details.xml b/documentation/poky-ref-manual/technical-details.xml index b34179533b..1657431495 100644 --- a/documentation/poky-ref-manual/technical-details.xml +++ b/documentation/poky-ref-manual/technical-details.xml @@ -151,11 +151,385 @@ By design, the Yocto Project builds everything from scratch unless it can determine that - a given task's inputs have not changed. - While building from scratch ensures that everything is current, it does also - mean that a lot of time could be spent rebuiding things that don't necessarily need built. + parts don't need to be rebuilt. + Fundamentally, building from scratch is attractive because it means all parts are + built fresh and there is no possibility of stale data causing problems. + When developers hit problems, they typically default back to building from scratch + so they know the state of things from the start. + + Building an image from scratch has both advantages and disadvantages. + As mentioned in the previous paragraph, building from scratch ensures that + everything is current and starts from a known state. + However, building from scratch also takes much longer because it generally means + rebuilding things that do not necessarily need to be rebuilt. + + + + The Yocto Project implements shared state code that supports incremental builds. + The implementation of the shared state code answers the following questions, which + were fundamental roadblocks to incremental build support in the Yocto Project: + + What pieces of the system have changed and what pieces have not changed? + How are changed pieces of software removed and replaced? + How are pre-built components that don't need to be rebuilt from scratch + used when they are available? + + + + + For the first question, the build system detects changes in the "inputs" to a given task by + creating a checksum (or signature) of the task's inputs.
+ If the checksum changes, the system assumes the inputs have changed and the task needs to be + rerun. + For the second question, the shared state (sstate) code tracks which tasks add which output + to the build process. + This means the output from a given task can be removed, upgraded or otherwise manipulated. + The third question is partly addressed by the solution for the second question + assuming the build system can fetch the sstate objects from remote locations and + install them if they are deemed to be valid. + + + + The rest of this section goes into detail about the overall incremental build + architecture, the checksums (signatures), shared state, and some tips and tricks. + + +
+ Overall Architecture + + + When determining what parts of the system need to be built, the Yocto Project + works on a per-task basis rather than a per-recipe basis. + You might wonder why a per-task basis is preferred over a per-recipe basis. + To help explain, consider having the IPK packaging backend enabled and then switching to DEB. + In this case, the do_install and do_package + output are still valid. + However, with a per-recipe approach, the build would not include the + .deb files. + Consequently, you would have to invalidate the whole build and rerun it. + Rerunning everything is not the best solution. + Also, in this case, the core must be "taught" much about specific tasks. + This methodology does not scale well and does not allow users to easily add new tasks + in layers or as external recipes without touching the packaged-staging core. +
+ +
+ Checksums (Signatures) + + + The Yocto Project uses a checksum, which is a unique signature of a task's + inputs, to determine if a task needs to be run again. + Because it is a change in a task's inputs that triggers a rerun, the process + needs to detect all the inputs to a given task. + For shell tasks, this turns out to be fairly easy because + the build process generates a "run" shell script for each task and + it is possible to create a checksum that gives you a good idea of when + the task's data changes. + + + + To complicate the problem, some things should not be included in + the checksum. + First, there is the actual specific build path of a given task - + the WORKDIR. + It does not matter if the working directory changes because it should not + affect the output for target packages. + Also, the build process has the objective of making native/cross packages relocatable. + The checksum therefore needs to exclude WORKDIR. + The simplest approach for excluding the working directory is to set + WORKDIR to some fixed value and create the checksum + for the "run" script. + + + + Another problem results from the "run" scripts containing functions that + might or might not get called. + The Yocto Project contains code that figures out dependencies between shell + functions. + This code is used to prune the "run" scripts down to the minimum set, + thereby alleviating this problem and making the "run" scripts much more + readable as a bonus. + + + + So far we have solutions for shell scripts. + What about Python tasks? + Handling these tasks is more difficult, but the same approach + applies. + The process needs to figure out what variables a Python function accesses + and what functions it calls. + Again, the Yocto Project contains code that first figures out the variable and function + dependencies, and then creates a checksum for the data used as the input to + the task.
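The WORKDIR-exclusion idea described above can be sketched in a few lines of Python. This is purely illustrative and is not BitBake's actual implementation; the function name `task_checksum` and the `${WORKDIR}` placeholder substitution are assumptions made for demonstration.

```python
import hashlib

def task_checksum(run_script: str, workdir: str) -> str:
    """Hash a task's generated "run" script with WORKDIR normalized.

    Illustrative sketch: the concrete build path is replaced with a
    fixed placeholder before hashing, so two builds in different
    directories produce the same checksum for the same task.
    """
    normalized = run_script.replace(workdir, "${WORKDIR}")
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# Two builds of the same task in different build directories:
script_a = "cd /home/alice/build/tmp/work/foo-1.0\nmake install\n"
script_b = "cd /home/bob/build/tmp/work/foo-1.0\nmake install\n"

sum_a = task_checksum(script_a, "/home/alice/build/tmp/work/foo-1.0")
sum_b = task_checksum(script_b, "/home/bob/build/tmp/work/foo-1.0")
assert sum_a == sum_b  # a WORKDIR difference does not invalidate the task

# A genuine change to the task's script does change the checksum:
assert task_checksum("make\n", "/x") != task_checksum("make install\n", "/x")
```

The same normalization idea is what allows native/cross output to remain relocatable: nothing in the signature ties the result to one particular build path.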
+ + + + Like the WORKDIR case, situations exist where dependencies + should be ignored. + For these cases, you can instruct the build process to ignore a dependency + by using a line like the following: + + PACKAGE_ARCHS[vardepsexclude] = "MACHINE" + + This example ensures that the PACKAGE_ARCHS variable does not + depend on the value of MACHINE, even if it does reference it. + + + + Equally, there are cases where we need to add in dependencies + BitBake is not able to find. + You can accomplish this by using a line like the following: + + PACKAGE_ARCHS[vardeps] = "MACHINE" + + This example explicitly adds the MACHINE variable as a + dependency for PACKAGE_ARCHS. + + + + Consider a case with inline Python, for example, where BitBake is not + able to figure out dependencies. + When running in debug mode (i.e. using -DDD), BitBake + produces output when it discovers something for which it cannot figure out + dependencies. + The Yocto Project team has currently not managed to cover those dependencies + in detail and is aware of the need to fix this situation. + + + + Thus far, this section has limited discussion to the direct inputs into a + task. + Information based on direct inputs is referred to as the "basehash" in the code. + However, there is still the question of a task's indirect inputs - the things that + were already built and present in the build directory. + The checksum (or signature) for a particular task needs to incorporate the hashes of all the + tasks that particular task depends upon. + Choosing which dependencies to add is a policy decision. + However, the effect is to generate a master checksum that combines the + basehash and the hashes of the task's dependencies. + + + + While figuring out the dependencies and creating these checksums is good, + what does the Yocto Project build system do with the checksum information? + The build system uses a signature handler that is responsible for + processing the checksum information.
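The basehash-plus-dependencies scheme just described can be illustrated with a small sketch. The function names, the use of MD5, and the exact concatenation are invented for illustration; BitBake's real signature code is more involved.

```python
import hashlib

def basehash(variables, excludes):
    # Hash a task's direct inputs (its variable values), skipping any
    # variable named in the exclude set, as vardepsexclude would.
    data = "".join(f"{name}={variables[name]}"
                   for name in sorted(variables) if name not in excludes)
    return hashlib.md5(data.encode("utf-8")).hexdigest()

def taskhash(base, dep_hashes):
    # Combine the basehash with the hashes of the tasks this task
    # depends upon to form the master checksum.
    data = base + "".join(sorted(dep_hashes))
    return hashlib.md5(data.encode("utf-8")).hexdigest()

# Excluding MACHINE means changing it does not alter the basehash:
vars_a = {"PACKAGE_ARCHS": "all ${MACHINE}", "MACHINE": "qemuarm"}
vars_b = {"PACKAGE_ARCHS": "all ${MACHINE}", "MACHINE": "qemux86"}
assert basehash(vars_a, {"MACHINE"}) == basehash(vars_b, {"MACHINE"})
assert basehash(vars_a, set()) != basehash(vars_b, set())

# But if a dependency's hash changes, the master checksum changes too:
base = basehash(vars_a, {"MACHINE"})
assert taskhash(base, ["aaa"]) != taskhash(base, ["bbb"])
```

The policy decision mentioned above corresponds to choosing which hashes end up in `dep_hashes` for a given task.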
+ By default, there is a dummy "noop" signature handler enabled in BitBake. + This means that behavior is unchanged from previous versions. + OECore uses the "basic" signature handler through this setting in the + bitbake.conf file: + + BB_SIGNATURE_HANDLER ?= "basic" + + Also within the BitBake configuration file, we can give BitBake + some extra information to help it handle these checksums. + The following statements effectively result in a global list of + variable dependency excludes - variables never included in + any checksum: + + BB_HASHBASE_WHITELIST ?= "TMPDIR FILE PATH PWD BB_TASKHASH BBPATH" + BB_HASHBASE_WHITELIST += "DL_DIR SSTATE_DIR THISDIR FILESEXTRAPATHS" + BB_HASHBASE_WHITELIST += "FILE_DIRNAME HOME LOGNAME SHELL TERM USER" + BB_HASHBASE_WHITELIST += "FILESPATH USERNAME STAGING_DIR_HOST STAGING_DIR_TARGET" + BB_HASHTASK_WHITELIST += "(.*-cross$|.*-native$|.*-cross-initial$| \ + .*-cross-intermediate$|^virtual:native:.*|^virtual:nativesdk:.*)" + + This example is actually where WORKDIR + is excluded since WORKDIR is constructed as a + path within TMPDIR, which is on the whitelist. + + + + The BB_HASHTASK_WHITELIST covers dependent tasks and + excludes certain kinds of tasks from the dependency chains. + The effect of the previous example is to isolate the native, target, + and cross components. + So, for example, toolchain changes do not force a rebuild of the whole system. + + + + The end result of the "basic" handler is to make some dependency and + hash information available to the build. + This includes: + + BB_BASEHASH_task-<taskname> - the base hash for each task in the recipe + BB_BASEHASH_<filename:taskname> - the base hashes for each dependent task + BBHASHDEPS_<filename:taskname> - the task dependencies for each task + BB_TASKHASH - the hash of the currently running task + + There is also a "basichash" BB_SIGNATURE_HANDLER, + which is the same as the basic version but adds the task hash to the stamp files.
+ This results in any metadata change that alters the task hash + automatically causing the task to be run again. + This removes the need to bump PR + values, and changes to metadata automatically ripple across the build. + Currently, this behavior is not the default behavior. + However, it is likely that the Yocto Project team will go forward with this + behavior in the future since all the functionality exists. + The reason for the delay is the potential impact on distribution feed + creation, as feeds require increasing PR fields + and the Yocto Project currently lacks a mechanism to automate incrementing + this field. +
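The effect of the "basichash" handler - a stamp that is only valid for one particular task hash - can be sketched as follows. The stamp naming scheme here is hypothetical and does not match BitBake's real stamp files; it only illustrates why a metadata change automatically forces a rerun.

```python
import os
import tempfile

def stamp_name(stampdir, task, taskhash):
    # Hypothetical stamp layout: the task hash is part of the file name.
    return os.path.join(stampdir, f"{task}.{taskhash}")

def needs_rerun(stampdir, task, taskhash):
    # A task must rerun unless a stamp exists for its current hash.
    return not os.path.exists(stamp_name(stampdir, task, taskhash))

stampdir = tempfile.mkdtemp()
assert needs_rerun(stampdir, "do_compile", "1a2b3c")        # no stamp yet
open(stamp_name(stampdir, "do_compile", "1a2b3c"), "w").close()
assert not needs_rerun(stampdir, "do_compile", "1a2b3c")    # stamp valid
# A metadata change produces a new hash, so the old stamp no longer matches:
assert needs_rerun(stampdir, "do_compile", "9f8e7d")
```

With the plain "basic" handler, by contrast, the stamp name does not include the hash, so a metadata change alone does not invalidate it.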
+ +
+ Shared State + + + Checksums and dependencies, as discussed in the previous section, solve half the + problem. + The other half of the problem is being able to use checksum information during the build + and being able to reuse or rebuild specific components. + + + + The shared state class (sstate.bbclass) + is a relatively generic implementation of how to + "capture" a snapshot of a given task. + The idea is that the build process does not care about the source of a + task's output. + Output could be freshly built, or it could be downloaded and unpacked from + somewhere - the build process does not need to worry about its source. + + + + There are two types of output. One type is the creation of a directory + in WORKDIR. + A good example is the output of either do_install or + do_package. + The other type of output occurs when a set of data is merged into a shared directory + tree such as the sysroot. + + + + The Yocto Project team has tried to keep the details of the implementation hidden in + sstate.bbclass. + From a user's perspective, adding shared state wrapping to a task + is as simple as this do_deploy example taken from + deploy.bbclass: + + DEPLOYDIR = "${WORKDIR}/deploy-${PN}" + SSTATETASKS += "do_deploy" + do_deploy[sstate-name] = "deploy" + do_deploy[sstate-inputdirs] = "${DEPLOYDIR}" + do_deploy[sstate-outputdirs] = "${DEPLOY_DIR_IMAGE}" + + python do_deploy_setscene () { + sstate_setscene(d) + } + addtask do_deploy_setscene + + In the example, we add some extra flags to the task: a name field ("deploy"), an + input directory where the task sends data, and the output + directory where the data from the task should eventually be copied. + We also add a _setscene variant of the task and add the task + name to the SSTATETASKS list.
+ + + + If you have a directory whose contents you need to preserve, + you can do this with a line like the following: + + do_package[sstate-plaindirs] = "${PKGD} ${PKGDEST}" + + This method, as well as the following example, also works for multiple directories. + + do_package[sstate-inputdirs] = "${PKGDESTWORK} ${SHLIBSWORKDIR}" + do_package[sstate-outputdirs] = "${PKGDATA_DIR} ${SHLIBSDIR}" + do_package[sstate-lockfile] = "${PACKAGELOCK}" + + These methods also include the ability to take a lockfile when manipulating + shared state directory structures, since some cases are sensitive to file + additions or removals. + + + + Behind the scenes, the shared state code works by looking in + SSTATE_DIR and + SSTATE_MIRRORS for shared state files. + Here is an example: + + SSTATE_MIRRORS ?= "\ + file://.* http://someserver.tld/share/sstate/ \n \ + file://.* file:///some/local/dir/sstate/" + + + + + The validity of a shared state package can be detected just by looking at the + filename, since the filename contains the task checksum (or signature) as + described earlier in this section. + If a valid shared state package is found, the build process downloads it + and uses it to accelerate the task. + + + + The build process uses the *_setscene tasks + for the task acceleration phase. + BitBake goes through this phase before the main execution code and tries + to accelerate any tasks for which it can find shared state packages. + If a shared state package for a task is available, the shared state + package is used. + This means the task and any tasks on which it depends are not + executed. + + + + As a real-world example, the aim is that when building an IPK-based image, + only the do_package_write_ipk tasks would have their + shared state packages fetched and extracted. + Since the sysroot is not used, it would never get extracted. + This is another reason to prefer the task-based approach over a + recipe-based approach, which would have to install the output from every task.
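Because the task checksum is embedded in the shared state file name, a cache lookup never has to open the archive. A sketch of that lookup follows; the file-naming convention here is invented for illustration and does not match the real sstate file names.

```python
def sstate_filename(pn, arch, taskname, taskhash):
    # Invented naming convention: the signature is part of the name,
    # so validity can be judged from the name alone.
    return f"sstate-{pn}-{arch}-{taskname}_{taskhash}.tgz"

def find_valid_sstate(available, pn, arch, taskname, taskhash):
    # A package is valid only if its embedded hash matches the current
    # task checksum; otherwise the task must be built from scratch.
    wanted = sstate_filename(pn, arch, taskname, taskhash)
    return wanted if wanted in available else None

# A mirror holding two previously built shared state packages:
mirror = {
    "sstate-zlib-armv5te-do_package_write_ipk_1a2b3c.tgz",
    "sstate-zlib-armv5te-do_populate_sysroot_4d5e6f.tgz",
}
# Hash matches: the package can be fetched and the task accelerated.
assert find_valid_sstate(mirror, "zlib", "armv5te",
                         "do_package_write_ipk", "1a2b3c") is not None
# Hash differs (the task's inputs changed): no valid package, so rebuild.
assert find_valid_sstate(mirror, "zlib", "armv5te",
                         "do_package_write_ipk", "ffffff") is None
```

This name-only check is why remote SSTATE_MIRRORS lookups stay cheap: a miss costs one failed filename match, not a download.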
+ +
+ +
+ Tips and Tricks + + + The code in the Yocto Project that supports incremental builds is not + simple code. + Consequently, when things go wrong, debugging needs to be straightforward. + Because of this, the Yocto Project team included strong debugging + tools. + + + + First, whenever a shared state package is written out, so is a + corresponding .siginfo file. + This practice results in a pickled Python database of all + the metadata that went into creating the hash for a given shared state + package. + + + + Second, if BitBake is run with the --dump-signatures + (or -S) option, BitBake dumps out + .siginfo files in + the stamp directory for every task it would have executed, instead of + building the target package specified. + + + + Finally, there is a bitbake-diffsigs command that + can process these .siginfo files. + If one file is specified, it dumps out the dependency + information in the file. + If two files are specified, it compares the + two files and dumps out the differences between the two. + This allows the question of "What changed between X and Y?" to be + answered easily. +
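The idea behind bitbake-diffsigs can be shown with a small sketch that pickles two dictionaries of task metadata and reports the keys whose values differ. The real .siginfo format is considerably richer than a flat dictionary; this only demonstrates the pickled-database comparison.

```python
import pickle
import tempfile

def dump_siginfo(path, metadata):
    # Write a pickled database of the metadata behind a task hash.
    with open(path, "wb") as f:
        pickle.dump(metadata, f)

def diff_siginfo(path_a, path_b):
    # Report which inputs differ between two signature databases,
    # answering "What changed between X and Y?".
    with open(path_a, "rb") as f:
        a = pickle.load(f)
    with open(path_b, "rb") as f:
        b = pickle.load(f)
    return {key for key in a.keys() | b.keys() if a.get(key) != b.get(key)}

# Two snapshots of the same task, before and after a CFLAGS change:
path_a = tempfile.mktemp(suffix=".siginfo")
path_b = tempfile.mktemp(suffix=".siginfo")
dump_siginfo(path_a, {"CFLAGS": "-O2", "SRC_URI": "file://zlib.tgz"})
dump_siginfo(path_b, {"CFLAGS": "-Os", "SRC_URI": "file://zlib.tgz"})
assert diff_siginfo(path_a, path_b) == {"CFLAGS"}
```

In practice you would point bitbake-diffsigs at two real .siginfo files from the stamp directory rather than building them by hand like this.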
+ + +