You are here

  1. Home
  2. » Experiments in Cosmology
  3. » Using offsetof()

Searching and Sorting with offsetof()

Submitted by cwestin on
Searching

Suppose I'm writing an application to manage schedules at a university. I might wind up with some declarations and code similar to searchsort.cpp on github (open the example in another window to follow along).

The enrolled array (line 27) is sorted by Student.key. This allows me to search for a student using the standard library's bsearch(), as shown on lines 89-93.

I can perform a similar search for a Professor (defined on lines 51-57), as shown on lines 103-11.

bsearch() requires the use of a comparator function to compare elements in the array that it is searching. In this example, the comparators are compareStudent() (lines 35-47), and compareProfessor (lines 67-79).

These two comparators are nearly identical, textually, but I have to have them both, or else I'll get an incorrect result. Why? They are both comparing an unsigned key, so why do I need more than one of them?

The keys in the two structures are located at different positions within the structures. When the comparators are compiled, those structure member offsets get encoded into machine language instructions that refer to those key fields at those different offsets. Even though the textual representation of the comparators is pretty much the same, their compiled forms vary because of the member offsets that are implicitly required by the structure member dereferences.

If we could provide a means to find the sort key, then we might be able to share a common comparator. offsetof() provides a means to do this.

I've written a version of bsearch() that takes an explicit offset parameter which is used to find the comparison keys before calling the comparator on them:

bsearch.h contains both an untyped version based on void *, as well as a typed version that relies on a template for the type of the array elements, the type of the search key, and the offset of the search key within an array element.

I've also provided a (templated) set of common functions for comparing scalar types:

  • compare.h
  • compare.cpp
  • Using these is easy, as shown in the original searchsort.cpp on lines 118 (finding a Student), and 131 (finding a Professor). In the first case, I specified the common comparator explicitly, but in the second, I let the bsearch<>() template supply the appropriate one. Because the offset is provided explicitly, we only need one comparison function for both cases, even though the structure members are at different offsets.

    Another benefit to this is that I no longer have to have a dummy copy of the structure to hold the key as on lines 90 and 103. For the standard version of bsearch() we don't know which argument to the comparator will be the key, and which will be an array element (with the key inside), so we have to write the comparator so it will work with either as either parameter. We have to create a dummy structure as shown here, and populate the key field with the search key value. With the offset-based version of bsearch(), we provide a pointer to the key value, and the offset to find the keys within the array elements, so this is no longer necessary: the comparator will always be called on pointers to keys.

    Using this approach means that a program would require a maximum of one comparison routine for each scalar datatype the programming language supports, rather than one per searched structure type. (More comparison routines may be required if data structures use a concatenated key made up of multiple structure members.) The scalar comparison functions can be supplied as part of a standard library, as is done here.

    Sorting

    The same technique can be applied to write a version of qsort() which can also share the same comparison functions. I've implemented that here:

    Further Improvements

    As nice as this scheme is, it still has issues. In use, a programmmer could accidentally mis-state the type of the structure member. It's also possible to get the offset of the structure member wrong. Ideally, we'd like to see a system where the structure member only needs to be specified once, and that can be used to derive it's various attributes. Looking again at the templated version of bsearch<>() in bsearch.h (lines 88-100), imagine if we could instead express that in this way:

    
    template<class T, variable T::V>
    inline const T *bsearch(const typeof(V) *pKey, const T *pArray, size_t n,
    			int (*cmp)(const typeof(V) *pl, const typeof(V) *pr))
    {
        return (const T *)bsearch(
    	(const void *)pKey, (const void *)pArray, n, sizeof(T),
            offsetof(T, V), (int (*)(const void *, const void *))cmp);
    }
    
    

    I've invented a couple of language features here:

    • variable: a new template declaration mechanism for specifying a variable instead of a type or value
    • typeof(): a compile-time reflection facility providing the type of a variable
    If I had these, I could eliminate the two sources of potential errors described above. I can now specify that the type must be that of V, and this in turn will enforce the use of an appropriate comparator. I can also compute the offset via from the same variable. Now everything must match up, and it is not left to the programmer to get the type and offsetof() expression correct for the bsearch<>() invocation.

    Tags: