Complicating Things With XOR Linked Lists

If you’re anything like me and find endless amusement in the old Obfuscated Perl Contests, you’ll enjoy this post. As a hacker, there’s nothing quite as gratifying as observing playful cleverness in someone else’s or your own code. It’s that moment when you think, “Ah…I see what you did there Mr. Sneaky Pants…very ingenious” which is usually followed by, “Let me see if I can do better.”

I’ve recently discovered a clever trick called the “XOR Linked List”. Apparently, it’s been in use for quite some time yet somehow I’d never heard about it until now. As Bender Bending Rodriquez so eloquently put it, “You learn something dumb every day.”

As you know, a doubly linked list is a set of sequentially linked nodes. Each node contains two fields; one references the previous node and the other references the next node. On the other hand, an XOR linked list combines both the “next” and “previous” fields into a single field by taking advantage of the properties of the XOR operator. It does this by storing the XOR of the previous node and the next node. For the sake of this article, I’ll call this the “link field.”

So instead of your traditional doubly linked list like this:

Doubly Linked List

A traditional doubly linked list.

You end up with something looking like this:

XOR Linked List

An XOR Linked List.

To traverse the list, you simply XOR the link field with the address of the opposite node you want to access. For example, if you were traversing left to right at node B (whose link field contains A ⊕ C) and wanted to access the next node C, you would take the previous node’s address (which is A) and XOR it with the current link field (A ⊕ C), giving you the address of C. Logically, this would look like C = A ⊕ (A ⊕ C).

To understand why this works, let’s look at the properties of the XOR operator.

Property Formula Result
Commutativity A ⊕ B B ⊕ A
Associativity A ⊕ (B ⊕ C) (A ⊕ B) ⊕ C

In addition to these two properties, the following basic identities can be proven:

Identity Result
A ⊕ 0 A
A ⊕ A 0
(A ⊕ B) ⊕ A B

There are several other identities but these are the only ones relevant to the XOR linked list. The last identity is exactly what I explained before. It can be a little tricky so I’ll break it down a bit with a proof.

As you know, (A ⊕ B) ⊕ A = B and because of the Associative Property, this could have equally been written as A ⊕ (B ⊕ A). Removing the redundant parenthesis and applying the Commutative Property, we can rewrite this as A ⊕ A ⊕ B. The second identity states that A ⊕ A = 0, so the expression can be reduced to 0 ⊕ B. Finally, using the first identity, this can be further be reduced to just B.

In terms of the linked list, all of this could be expressed as:

Field Value
Link Next ⊕ Previous
Next Link ⊕ Previous
Previous Link ⊕ Next

Instead of a bunch of theory, let’s look at an example. Despite my extreme hatred of C++, I’m forced to use it here since, as you’ll soon learn, this hack is not suitable for garbage collected languages.

Create the file xor_linked_list.h like so:

// xor_linked_list.h

#include <iostream>

struct XorNode {
    int id;
    XorNode *link;
};

class XorLinkedList {
  public:
    XorNode *head;
    XorNode *tail;

    XorNode *nextNode(XorNode *node, XorNode *prevNode) {
        return ((XorNode *) ((int) node->link ^ (int) prevNode));
    }

    void deleteList(void) {
        XorNode *prev, *current;

        prev    = NULL;
        current = head;

        while (current) {
            std::cout << "Node removed: " << current->id << std::endl;
            current->link = nextNode(current, prev);

            if (prev)
                delete prev;

            if (!current->link) {
                delete current;
                current = NULL;

                continue;
            }

            prev    = current;
            current = current->link;
        }

        std::cout << std::endl;
    }

    void insertAfter(XorNode *newNode, int id) {
        XorNode *prev, *current, *next;

        prev    = NULL;
        current = head;

        while (current) {
            next = nextNode(current, prev);

            if (current->id == id) {
                // Fix the current "next" node
                if (next)
                    next->link = (XorNode *) ((int) next->link
                                            ^ (int) current
                                            ^ (int) newNode);

                // Fix current node
                current->link = (XorNode *) ((int) newNode
                                           ^ (int) next
                                           ^ (int) current->link);

                // Fix new node
                newNode->link = (XorNode *) ((int) current ^ (int) next);

                break;
            }

            prev    = current;
            current = next;
        }
    }

    void traverse(XorNode *root) {
        XorNode *current, *prev, *next;

        prev    = NULL;
        current = root;

        while (current) {
            std::cout << "Node found: " << current->id << std::endl;

            next    = nextNode(current, prev);
            prev    = current;
            current = next;
        }

        std::cout << std::endl;
    }

    void insert(int id) {
        XorNode *newNode = new XorNode;

        if (!newNode) {
            std::cerr << "[ERROR] Failed to insert new node." << std::endl;
            return;
        }

        newNode->id = id;
        newNode->link = NULL;

        std::cout << "Node added: " << newNode->id << std::endl;

        if (!head)
            head = tail = newNode;
        else {
            insertAfter(newNode, tail->id);
            tail = newNode;
        }

        return;
    }
};

Then create main.cpp:

// main.cpp

#include <iostream>
#include <cstdlib>
#include "xor_linked_list.h"

int main(int argc, char *argv[]) {
    XorLinkedList *list = new XorLinkedList;

    int nodeCount;

    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] <<  " <nodes>" << std::endl;
        return 1;
    }

    nodeCount = atoi(argv[1]);

    std::cout << "# Adding " << nodeCount << " nodes to list" << std::endl;

    for (int i = 0; i < nodeCount; i++)
        list->insert(i);

    std::cout << std::endl;

    std::cout << "# Forward traversal" << std::endl;
    list->traverse(list->head);

    std::cout << "# Backward traversal" << std::endl;
    list->traverse(list->tail);

    std::cout << "# Removing nodes from list" << std::endl;
    list->deleteList();

    delete list;
    list = NULL;

    return 0;
}

If you’ve taken the standard Data Structures and Algorithms class, you’ll recognize most of this as your classic textbook “let’s learn about linked lists” example. When the list has no nodes, head and tail are both 0. When only one node is present, head is the node while tail is still 0. Under the last condition when there are two or more nodes in the list, head and tail point to the first and last nodes respectively. The only thing different about them is how the nodes are referenced. You can see this in the  nextNode() method:

return ((XorNode *) ((int) node->link ^ (int) prevNode));

Put more simply in pseudo-code, this translates to:

next node = link field XOR previous node

Which, again, reduces to:

next node = previous node XOR next node XOR previous node

This is similar to what we saw before with C = A ⊕ (A ⊕ C).

Compile and run the program, passing the number of nodes as an argument:

[foo@bar ~]$ g++ -Wall -o xorll main.cpp
[foo@bar ~]$ ./xorll 4
# Adding 4 nodes to list
Node added: 0
Node added: 1
Node added: 2
Node added: 3

# Forward traversal
Node found: 0
Node found: 1
Node found: 2
Node found: 3

# Backward traversal
Node found: 3
Node found: 2
Node found: 1
Node found: 0

# Removing nodes from list
Node removed: 0
Node removed: 1
Node removed: 2
Node removed: 3

Even though XOR linked lists are quite clever, they are just that. Nothing more. You won’t see this type of playfulness in production-ready code. For one thing, it adds to the complexity of the code. Its purpose is not readily apparent and makes maintenance more expensive. Also, in order to calculate the address of the next node you need to remember the address of the node that was last accessed while traversing the list, adding further complexity.

Another reason it is not practical, as I mentioned earlier, is due to garbage collection. Most garbage collectors will fail to work properly with classes or structures that don’t contain literal pointers. This also restricts it to low-level languages that use pointers like C/C++. I would have loved to use a Java example; however, Java is a garbage collected language and has no notion of pointers. Imagine the case where a program wrote those pointers to a file and read them back later on. The mangled addresses referenced by those pointers are likely to have been recycled, causing major borkage.

The only slightly practical application of XOR linked lists are in environments with limited space requirements, such as embedded devices. It effectively cuts the allocated memory for a linked list in half by 50% at the expense of a few extra machine cycles. Even then, using an unrolled linked list would be much more practical.

Despite all these disadvantages, I still think XOR linked lists are really clever and ingenious. They add an eloquent twist on an otherwise mundane task. It’s through these silly things like XOR linked lists and sleep sort and self-interpreted brainfuck and obfuscated Perl that the innovative nature of the hacker really shines.

About these ads

Comments are closed.