Stop writing Rust linked list libraries!
Nov. 12th, 2022 03:00 pm![[personal profile]](https://www.dreamwidth.org/img/silk/identity/user.png)
tl;dr:
Don’t write a Rust linked list library: they are hard to do well, and usually useless.
Use VecDeque
, which is great. If you actually need more than VecDeque
can do, use one of the handful of libraries that actually offer a significantly more useful API.
If you are writing your own data structure, check if someone has done it already, and consider slotmap
or generation_arena
, (or maybe Rc
/Arc
).
Contents
- Survey of Rust linked list libraries
- Why are there so many poor Rust linked list libraries ?
- Rustic approaches to pointers-to-and-between-nodes data structures
- Rust’s package ecosystem demonstrating software’s NIH problem
Survey of Rust linked list libraries
I have updated my Survey of Rust linked list libraries.
Background
In 2019 I was writing plag-mangler, a tool for planar graph layout.
I needed a data structure. Naturally I looked for a library to help. I didn’t find what I needed, so I wrote rc-dlist-deque. However, on the way I noticed an inordinate number of linked list libraries written in Rust. Most all of these had no real reason for existing. Even the one in the Rust standard library is useless.
Results
Now I have redone the survey. The results are depressing. In 2019 there were 5 libraries which, in my opinion, were largely useless. In late 2022 there are now thirteen linked list libraries that ought probably not ever to be used. And, a further eight libraries for which there are strictly superior alternatives. Many of these have the signs of projects whose authors are otherwise competent: proper documentation, extensive APIs, and so on.
There is one new library which is better for some applications than those available in 2019. (I’m referring to generational_token_list
, which makes a plausible alternative to dlv-list
which I already recommended in 2019.)
Why are there so many poor Rust linked list libraries ?
Linked lists and Rust do not go well together. But (and I’m guessing here) I presume many people are taught in programming school that a linked list is a fundamental data structure; people are often even asked to write one as a teaching exercise. This is a bad idea in Rust. Or maybe they’ve heard that writing linked lists in Rust is hard and want to prove they can do it.
Double-ended queues
One of the main applications for a linked list in a language like C, is a queue, where you put items in at one end, and take them out at the other. The Rust standard library has a data structure for that, VecDeque
.
Five of the available libraries:
- Have an API which is a subset of that of
VecDeque
: basically, pushing and popping elements at the front and back. - Have worse performance for most applications than
VecDeque
, - Are less mature, less available, less well tested, etc., than
VecDeque
, simply becauseVecDeque
is in the Rust Standard Library.
For these you could, and should, just use VecDeque
instead.
The Cursor
concept
A proper linked list lets you identify and hold onto an element in the middle of the list, and cheaply insert and remove elements there.
Rust’s ownership and borrowing rules make this awkward. One idea that people have many times reinvented and reimplemented, is to have a Cursor
type, derived from the list, which is a reference to an element, and permits insertion and removal there.
Eight libraries have implemented this in the obvious way. However, there is a serious API limitation:
To prevent a cursor being invalidated (e.g. by deletion of the entry it points to) you can’t modify the list while the cursor exists. You can only have one cursor (that can be used for modification) at a time.
The practical effect of this is that you cannot retain cursors. You can make and use such a cursor for a particular operation, but you must dispose of it soon. Attempts to do otherwise will see you losing a battle with the borrow checker.
If that’s good enough, then you could just use a VecDeque
and use array indices instead of the cursors. It’s true that deleting or adding elements in the middle involves a lot of copying, but your algorithm is O(n) even with the single-cursor list libraries, because it must first walk the cursor to the desired element.
Formally, I believe any algorithm using these exclusive cursors can be rewritten, in an obvious way, to simply iterate and/or copy from the start or end (as one can do with VecDeque
) without changing the headline O() performance characteristics.
IMO the savings available from avoiding extra copies etc. are not worth the additional dependency, unsafe code, and so on, especially as there are other ways of helping with that (e.g. boxing the individual elements).
Even if you don’t find that convincing, generational_token_list
and dlv_list
are strictly superior since they offer a more flexible and convenient API and better performance, and rely on much less unsafe code.
Rustic approaches to pointers-to-and-between-nodes data structures
Most of the time a VecDeque
is great. But if you actually want to hold onto (perhaps many) references to the middle of the list, and later modify it through those references, you do need something more. This is a specific case of a general class of problems where the naive approach (use Rust references to the data structure nodes) doesn’t work well.
But there is a good solution:
Keep all the nodes in an array (a Vec<Option<T>>
or similar) and use the index in the array as your node reference. This is fast, and quite ergonomic, and neatly solves most of the problems. If you are concerned that bare indices might cause confusion, as newly inserted elements would reuse indices, add a per-index generation count.
These approaches have been neatly packaged up in libraries like slab
, slotmap
, generational-arena
and thunderdome
. And they have been nicely applied to linked lists by the authors of generational_token_list
. and dlv-list
.
The alternative for nodey data structures in safe Rust: Rc
/Arc
Of course, you can just use Rust’s “interior mutability” and reference counting smart pointers, to directly implement the data structure of your choice.
In many applications, a single-threaded data structure is fine, in which case Rc
and Cell
/RefCell
will let you write safe code, with cheap refcount updates and runtime checks inserted to defend against unexpected aliasing, use-after-free, etc.
I took this approach in rc-dlist-deque
, because I wanted each node to be able to be on multiple lists.
Rust’s package ecosystem demonstrating software’s NIH problem
The Rust ecosystem is full of NIH libraries of all kinds. In my survey, there are: five good options; seven libraries which are plausible, but just not as good as the alternatives; and fourteen others.
There is a whole rant I could have about how the whole software and computing community is pathologically neophilic. Often we seem to actively resist reusing ideas, let alone code; and are ignorant and dismissive of what has gone before. As a result, we keep solving the same problems, badly - making the same mistakes over and over again. In some subfields, working software, or nearly working software, is frequently replaced with something worse, maybe more than once.
One aspect of this is a massive cultural bias towards rewriting rather than reusing, let alone fixing and using.
Many people can come out of a degree, trained to be a programmer, and have no formal training in selecting and evaluating software; this is even though working effectively with computers requires making good use of everyone else’s work.
If one isn’t taught these skills (when and how to search for prior art, how to choose between dependencies, and so on) one must learn it on the job. The result is usually an ad-hoc and unsystematic approach, often dominated by fashion rather than engineering.
The package naming paradox
The more experienced and competent programmer is aware of all the other options that exist - after all they have evaluated other choices before writing their own library.
So they will call their library something like generational_token_list
or vecdeque-stableix
.
Whereas the novice straight out of a pre-Rust programming course just thinks what they are doing is the one and only obvious thing (even though it’s a poor idea) and hasn’t even searched for a previous implementation. So they call their package something obvious like “linked list”.
As a result, the most obvious names seem to refer to the least useful libraries.
Edited 2022-11-16 23:55 UTC to update numbers of libraries in various categories following updates to the survey (including updates prompted by feedback received after this post first published).