Sunday, May 15, 2011

How to learn a code base - rsync Part 1

In the Prologue of this series we took a bird's-eye view of rsync based on the information we found on Wikipedia and the project homepage. The goal was to get a general feel for the project and take the important first step in learning a code base - simply starting.
Today we will start navigating the code base by getting our own copy and compiling it.




Part 1 - Getting the source and compiling it


Information on how to get the source is one of the most important things an open source project can provide. The authors didn't fail here and wrote a great description of the several available ways of getting the code. For our needs a checkout of the git repository will serve best, so let's clone a local copy for ourselves.

git clone git://git.samba.org/rsync.git

The result of this command will be a 28M rsync directory containing the whole history of the project.
The first things we look for are files named README, INSTALL, NEWS, TODO - generally anything all caps that could contain notes from the developers to fellow developers.
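As a rough illustration of that first scan, the sketch below lists the files in a directory whose names are written entirely in capitals. The function name list_caps_files is made up for this post; it is not part of rsync or of any standard tool.

```shell
# Hypothetical helper (not part of rsync): print the regular files in a
# directory whose names consist only of capital letters, digits, '.', '_'
# and '-', for example README, INSTALL, NEWS or TODO.
# The body runs in a subshell so the LC_ALL change stays local to the call.
list_caps_files() (
    LC_ALL=C; export LC_ALL          # make [A-Z] mean exactly the ASCII capitals
    dir=${1:-.}
    for path in "$dir"/*; do
        [ -f "$path" ] || continue   # skip directories and unmatched globs
        name=${path##*/}
        case $name in
            *[!A-Z0-9._-]*) ;;                  # contains a lowercase or other character
            *[A-Z]*) printf '%s\n' "$name" ;;   # all caps: report it
        esac
    done
)

# Usage, assuming the clone lives in ./rsync:
#   list_caps_files rsync
```

Anything the pattern misses (mixed-case names, for example) still deserves a quick manual look, so treat the output as a starting point only.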

Let's start with the README file. The first part repeats some of the information we found previously. In the SETUP section we see that rsync doesn't require setuid or any special privileges - this is a nice usability/security feature.

And bingo, the first really important piece of information: how to work with the raw code. The file tells us that in order to install rsync we first need to run the "configure" script and then type "make" to build the code. The final installation step is just copying the binary to a directory on the system path, which can also be done with the make install command. This is the standard sequence for most C projects in the UNIX world.

Let's also note another important thing from the README file:

Note that on some systems you will have to force configure not to use
gcc because gcc may not support some features (such as 64 bit file
offsets) that your system may support.  Set the environment variable CC
to the name of your native compiler before running configure in this
case.
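
One way to act on that note is sketched below: pick the first compiler from a list of candidates that is actually installed and hand it to configure through CC. The function name pick_cc and the candidate names in the usage line are assumptions made for this example; the README only says to set CC.

```shell
# Hypothetical helper: print the first command from the argument list that
# exists on this system, so it can be passed to configure through CC.
pick_cc() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1    # none of the candidates is installed
}

# Usage, from the top of the rsync source tree (compiler names are examples):
#   CC=$(pick_cc cc c99) ./configure
```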


That covers the README file for now. Next stop: the INSTALL file.
As expected, the installation steps are repeated, with an additional pointer on how to list the optional arguments of the configure script (./configure --help). It's worth noting that rsync tries to run as the nobody user and group when in daemon mode. This default can be changed by editing NOBODY_USER and NOBODY_GROUP in the config.h file or by overriding the settings in the rsync daemon configuration file (/etc/rsyncd.conf).

There is no general list of the tool's dependencies, so we will start maintaining our own. The first library on the dependency list is popt, the option-parsing library required since rsync 2.4.7. It's used for parsing command-line options. A recent copy of the library is included in the rsync code base and is used by default if the host system doesn't have a copy installed. One can force the use of the bundled version by passing the --with-included-popt option to the ./configure script.

The file also contains useful information on a rare problem with the make system and several sections devoted to specific platforms and packaging options. We will skip this info for now and stick to the basic installation steps.

While reviewing the files we notice the 'Doxyfile' configuration file for the doxygen documentation generation utility. This is great news as it means that probably most of the code base will be covered by the tool and the documentation should be decent enough to be of use.

I just took a quick look at the README, OLDNEWS and NEWS files, not searching for anything specific at this point - just getting a feel for the changes and plans for the code. The TODO file has a section on using a generic zlib. From it we can deduce that the zlib shipped with rsync is specific (patched) enough that it is a required bundled dependency.

There is one thing missing in the source tree, or I haven't found it yet: a one-line description of the purpose and content of each file in the tree. A list like this often becomes the most useful part of a system's documentation. It's terse enough to glance over quickly and pick out the parts that might be relevant to the task at hand. Since we didn't find one, we will start making our own and slowly fill it in.

I used the tree utility to generate the initial list, placed it under version control and will work on it from there. You can track the progress on GitHub.

We can proceed to building the source since we gathered the initial required information.

Let's run ./configure without any parameters at first. In my case, the script reported a successful configuration so let's proceed to running make.

Unfortunately we got our first problem.

./rsync.h:669:26: fatal error: linux/falloc.h: No such file or directory

A quick search revealed that linux/falloc.h is a Linux kernel header that defines the mode flags used by fallocate for direct manipulation of file space. The file seems not to be present on my test system - Slackware GNU/Linux running a 2.6.32.7 grsec kernel - at least not as /usr/include/linux/falloc.h.

The include comes from line 669 of rsync.h and is guarded by conditional compilation. The header is only included if the configure script defined HAVE_FALLOCATE or set HAVE_SYS_FALLOCATE to a true value.

We need to check config.log to see what made the configure script think the function is available.


configure.sh:7521: checking for posix_fallocate
configure.sh:7521: gcc -std=gnu99 -o conftest -g -O2 -DHAVE_CONFIG_H -Wall -W   conftest.c  >&5
configure.sh:7521: $? = 0
configure.sh:7521: result: yes
configure.sh:7541: checking for useable fallocate
configure.sh:7559: gcc -std=gnu99 -o conftest -g -O2 -DHAVE_CONFIG_H -Wall -W   conftest.c  >&5
configure.sh:7559: $? = 0
configure.sh:7567: result: yes


From the listing above we can see that the configure tests decided that both posix_fallocate and fallocate are available and usable. A further test, not shown in this excerpt, found that the SYS fallocate variant is not available. Hence we need to see what made the fallocate check pass at the configuration stage and then fail when compiling the code.
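
Scrolling through config.log by hand gets tedious, so here is a minimal sketch for pulling out the verdict of a single check. It assumes the log layout shown in the listing above (a "checking for ..." line followed later by a "result: ..." line); the function name configure_result is invented for this post.

```shell
# Hypothetical helper: print the result configure recorded for one check.
#   $1 = the text that follows "checking for" in config.log
#   $2 = path to config.log
# The match is a plain substring search, so pass enough text to be unambiguous.
configure_result() {
    awk -v what="checking for $1" '
        # Remember that we reached the requested check.
        index($0, what) { seen = 1; next }
        # The first "result:" line after it holds the verdict.
        seen && /result: / { sub(/.*result: /, ""); print; exit }
    ' "$2"
}

# Usage, from the directory where configure was run:
#   configure_result "useable fallocate" config.log
```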

In the generated config.h file (created by the ./configure step) we can confirm the output of config.log: HAVE_FALLOCATE is defined and set to 1, and so is HAVE_POSIX_FALLOCATE. As expected, HAVE_SYS_FALLOCATE remains undefined.
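
The same check can be scripted. The sketch below relies on the usual autoconf convention of writing "#define NAME 1" for detected features and leaving "/* #undef NAME */" for the rest; macro_defined is a name made up for this example.

```shell
# Hypothetical helper: succeed if config.h defines the given macro.
#   $1 = macro name, $2 = path to config.h
# The pattern is anchored at the start of the line, so the commented-out
# "/* #undef NAME */" entries do not count as definitions.
macro_defined() {
    grep -Eq "^#define[[:space:]]+$1([[:space:]]|\$)" "$2"
}

# Usage, from the directory where configure was run:
#   for m in HAVE_FALLOCATE HAVE_POSIX_FALLOCATE HAVE_SYS_FALLOCATE; do
#       if macro_defined "$m" config.h; then echo "$m: defined"
#       else echo "$m: not defined"; fi
#   done
```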

Let's check configure.sh to see what generated code ran correctly and how it differs from rsync.h.


#include <fcntl.h>
#include <sys/types.h>
int
main ()
{
fallocate(0, 0, 0, 0);
  ;
  return 0;
}


Let's save this to test.c, compile it and run it:


mulander@bunkier_mysli:~/code/blog/lac$ gcc -std=gnu99 -o conftest -g -O2 -DHAVE_CONFIG_H -Wall -W test.c

test.c: In function 'main':
test.c:6:3: warning: implicit declaration of function 'fallocate'
mulander@bunkier_mysli:~/code/blog/lac$ ./conftest
mulander@bunkier_mysli:~/code/blog/lac$ echo $?
0


The echo $? command shows the return code of the last run program. In our case it was 0 indicating that the program executed correctly. We did get a warning about implicitly declaring fallocate in this test but the result was not a failure. So what's different in rsync.h?
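
For readers who haven't used $? before, here is a tiny sketch of the same idea wrapped in a function. The name exit_status_of is invented for this post.

```shell
# Hypothetical helper: run a command quietly and print only its exit status.
# 0 means success; any other value signals some kind of failure.
exit_status_of() {
    status=0
    "$@" >/dev/null 2>&1 || status=$?
    printf '%s\n' "$status"
}

# Usage:
#   exit_status_of ./conftest
```

The important detail for our problem is that a compiler warning does not change the exit status, which is why configure treated the test as passed.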

#if defined HAVE_FALLOCATE || HAVE_SYS_FALLOCATE
#include <linux/falloc.h>



Here we see that linux/falloc.h is included while in the test case we only had fcntl.h and sys/types.h. Adding #include <linux/falloc.h> to our test.c file will result in a fatal error on compilation (at least on my system) stating that such file or directory does not exist.
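
A quick way to check for a header without invoking the compiler is sketched below. The function name find_header and the two default directories are assumptions for this example; a real compiler may search more directories than these.

```shell
# Hypothetical helper: print the full path of a header if it exists in one of
# the given include directories (default: /usr/include and /usr/local/include).
#   $1 = header name as written in the #include line, rest = directories
find_header() {
    header=$1
    shift
    [ "$#" -gt 0 ] || set -- /usr/include /usr/local/include
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir/$header"
            return 0
        fi
    done
    return 1    # not found in any of the directories
}

# Usage:
#   find_header linux/falloc.h || echo "linux/falloc.h is missing"
```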

I did find a man page on the web for fallocate(2) and it states that linux/falloc.h is in fact the correct header to include. The only thing left to do is to check our local system man pages with the command man 2 fallocate.


SYNOPSIS
       #define _GNU_SOURCE             /* See feature_test_macros(7) */
       #include <fcntl.h>

       int fallocate(int fd, int mode, off_t offset, off_t len);

The synopsis section of the man page confirms that on our system fcntl.h is where the fallocate function is declared. Adding the #define _GNU_SOURCE line to our test.c file also squelches the warning about the implicit declaration of fallocate, so we can confirm that this approach is correct in our case.

That particular change to rsync.h was made on 2011-04-05 - not that long ago. We can even browse the changes online. Since at this point we are not sure how to handle this problem correctly, we checked the official bug tracker and asked around on the unofficial IRC channel. Having found no sign that the problem was already known, we reported it to the official mailing list in the hope that a maintainer will resolve it appropriately. You can view our report here. As a workaround for now, we will remove the include and replace it with #include <fcntl.h>, without adding the #define for _GNU_SOURCE because it's already present in config.h.
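
The workaround is a one-line edit, and a minimal sketch of it with sed follows. This is only an illustration of the local change described above, not the patch that was sent to the mailing list; the function name swap_falloc_include is made up.

```shell
# Hypothetical helper: print a copy of the given file in which the
# linux/falloc.h include is replaced by fcntl.h. The original file is untouched.
swap_falloc_include() {
    sed 's|^#include <linux/falloc\.h>[[:space:]]*$|#include <fcntl.h>|' "$1"
}

# Usage, from the top of the rsync source tree:
#   swap_falloc_include rsync.h > rsync.h.new && mv rsync.h.new rsync.h
```

Writing to a new file first keeps the original around until the result has been inspected.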

Running make again with our change got us through the compilation step without any reported problems. Since the build passed and rsync --version didn't crash in a horrible way, we sent our patch to the mailing list.


mulander@bunkier_mysli:~/code/blog/lac/rsync$ ./rsync --version
rsync  version 3.1.0dev  protocol version 31.PR13
Copyright (C) 1996-2011 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 32-bit timestamps, 64-bit long ints,
    socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
    append, ACLs, xattrs, iconv, symtimes, prealloc

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.


At this point, we have a locally built copy of rsync which we will use from now on for our further exploration of rsync. Stay tuned ;)