: Sources moved to
verbose flag added to picking script; HTML listing script includes stack
weights and probabilities, and indicates whether book is owned on
This is an updated version of a "geekery" post from last
year. I've made
substantial changes to the script it describes since then. I'm leaving the former
(and much simpler) version in place, but also wanted to show off my new
But one thing hasn't changed at all:
it's another example of the mental aberration that causes me to write Perl
scripts to solve life's little everyday irritants. In this case, two:
- I noticed that I had a lot of books on my shelves, acquired long past,
that I never got around to reading. Either because (a) they were
dauntingly long and dense (I'm thinking about Infinite Jest by David Foster
Wallace); or because (b) they just fell through the cracks. Both poor
excuses, but there you are.
- I sometimes want to methodically
read a series of books in a particular order.
In other words, I needed a way to
bring diligence and organization to my previously chaotic and sloppy
I think of what I came up with as the "To-Be-Read" (hereafter TBR)
database. That's a slightly lofty title, but anyway:
The high-level view: all the TBR books are in zero or more stacks,
each stack containing zero or more titles.
Each stack is maintained in the order I want to read the books therein.
(This goes back to the issue mentioned above: sometimes a series really "should"
be read in publishing order, for example C.J. Box's novels featuring
protagonist Joe Pickett.)
So picking a book to read involves (a) choosing an "eligible" stack; and (b) "popping"
the top book from the chosen stack. Very computer science-y.
The interesting part is the "choosing an eligible stack" step. There are a number
of possible ways to do it. But first, more details on "eligibility".
The major problem with the previous version of this script was that too
often it would pick a book "too soon" after I'd read something off the
same stack. (An issue mentioned in last year's post.) As it turns out,
I'd like to let some time go by between picks from the same stack. (For example,
at least 30 days between books by Heinlein. Too much of a good thing.)
So in this version,
each stack has an "age": the time that's elapsed since
I previously picked a book from that stack. And a "minimum age":
the amount of time that must elapse after a pick before that stack becomes eligible
again.
Another minor difference: I don't actually own some of the books in some of
the stacks yet.
I want to read them someday. But I'm waiting, typically for the price to
come down, either via the Barnes & Noble remainder table or the
Amazon used market. I'm RetiredOnAFixedIncome, after all.
So an eligible stack is one that:
- is non-empty;
- has a top book that I own;
- is at least as old as its specified minimum age.
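Those three tests boil down to a small predicate. Here's a minimal sketch of it, in Python rather than the Perl of the actual script; the `Stack` class, its field names, and the sample data are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stack:
    name: str
    books: List[Tuple[str, bool]]  # (title, owned) pairs, top of the stack first
    age_days: int                  # days since the last pick from this stack
    min_age_days: int              # required gap between picks


def is_eligible(stack: Stack) -> bool:
    """Non-empty, top book owned, and at least as old as the minimum age."""
    if not stack.books:
        return False
    _title, owned = stack.books[0]
    return owned and stack.age_days >= stack.min_age_days


# Made-up stacks, one per way of passing or failing the tests.
ready = Stack("ready", [("Book A", True)], age_days=45, min_age_days=30)
too_new = Stack("too_new", [("Book B", True)], age_days=10, min_age_days=30)
unowned = Stack("unowned", [("Book C", False)], age_days=90, min_age_days=30)
empty = Stack("empty", [], age_days=90, min_age_days=30)

print([s.name for s in (ready, too_new, unowned, empty) if is_eligible(s)])
```

Only the first stack survives the filter. I've read "older than its minimum age" as "at least that old", which matches the "at least 30 days" example; swap `>=` for `>` if you read it more strictly.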
OK, so how do we choose among eligible stacks? Possibilities:
- Pick the "oldest" stack: the one for which it's been the longest time since
a book from it was previously picked.
- Pick the highest stack, the one with the most titles therein. (Because it
needs the most work, I guess.)
- Just pick a stack at random.
- Pick a random stack weighted by stack height. That is, any stack can be
picked, but one with eight titles in it is twice as likely to be picked
as one with four titles. (This was the algorithm used in the previous
version.)
- Pick a random stack, weighted by age. That is, a stack that's 90 days
old is twice as likely to be picked as a 45-day-old one.
But what I'm doing is a combination of the last two: the stack-weighting
function is the stack height times the stack age. So (for example) a
120-day-old stack with 5 titles is twice as likely to be picked as a
50-day-old stack with 6 titles. Because 120 * 5 = 600 and 50 * 6 = 300. This is totally
arbitrary, but it seems to work for me so far.
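To make that arithmetic concrete, here's a small Python sketch (again, not the Perl of the real script) that turns the two example stacks into weights and pick probabilities. The stack labels and the `weight` function name are hypothetical; the numbers are the ones from the example.

```python
from fractions import Fraction


def weight(height: int, age_days: int) -> int:
    """Stack weight: number of titles times days since the last pick."""
    return height * age_days


# (height, age in days) for the two stacks in the example above.
stacks = {"older": (5, 120), "taller": (6, 50)}

weights = {name: weight(h, a) for name, (h, a) in stacks.items()}
total = sum(weights.values())
probabilities = {name: Fraction(w, total) for name, w in weights.items()}

for name in stacks:
    print(name, weights[name], probabilities[name])
```

With only these two stacks eligible, the weights are 600 and 300, so the pick probabilities come out to 2/3 and 1/3.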
Here's my current take
on scripting that.
Each stack is implemented as a comma-separated values (CSV) file,
one line per book, each line containing two fields:
- The book title;
- Whether I own the book yet (1/0 = yes/no).
For example, here's the current content of
containing the to-be-read books of Christopher Moore:
"The Serpent of Venice",1
I.e., three books, the first two owned, the third one, Noir,
unpurchased as yet. (I'll get it someday, and edit the file to change
the 0 to 1.)
[Added: in addition to 0/1, 'K' indicates that the book's Kindle version
is owned. This is just a convenience in case I go looking for it long
after actually buying it.]
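For the curious, reading a stack file in that format takes only a few lines. This is a Python sketch, not the Perl of the real script; the function name and the placeholder titles are invented, and treating 'K' as "owned" is my reading of the note above rather than something the script is confirmed to do.

```python
import csv
import io

OWNED_FLAGS = {"1", "K"}  # assumption: a Kindle copy ("K") counts as owned


def read_stack(fileobj):
    """Return a list of (title, owned) pairs, top of the stack first."""
    return [(title, flag in OWNED_FLAGS) for title, flag in csv.reader(fileobj)]


# Placeholder rows in the two-field format described above.
sample = io.StringIO(
    '"Placeholder, With a Comma",1\n'
    '"Second Placeholder",K\n'
    '"Third Placeholder",0\n'
)

books = read_stack(sample)
for title, owned in books:
    print(title, "owned" if owned else "not owned yet")
```

Using a real CSV parser instead of splitting on commas is what lets the first title keep its embedded comma.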
There is a "master" CSV file,
stacks.csv. It has a header
(for some reason that I forget). Each non-header line contains data for
a single stack:
- The (nice human-readable) stack name;
- The stack ID (corresponding to the name of the stack file);
- The minimum time, in days, that should elapse between consecutive
picks from that stack;
- The date when a book was most recently picked from the stack.
As I type, here's what it looks like:
"Chronicles of Amber",amber,42,2018-04-15
"Conservative Lit 101",conservative_lit_101,60,2017-09-07
"Robert A. Heinlein",heinlein,30,2018-06-19
No comments from the peanut gallery about my lack of literary taste,
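Here's a hedged Python sketch of how a stack's age falls out of that file (the real script is Perl and may do it differently). The header names are hypothetical, since the actual header isn't shown, and the "today" date is picked arbitrarily; the three data rows are the sample above.

```python
import csv
import io
from datetime import date

# Hypothetical header line, followed by the three sample rows shown above.
STACKS_CSV = '''name,id,minage,lastpicked
"Chronicles of Amber",amber,42,2018-04-15
"Conservative Lit 101",conservative_lit_101,60,2017-09-07
"Robert A. Heinlein",heinlein,30,2018-06-19
'''


def stack_ages(fileobj, today):
    """Map stack ID -> (age in days, minimum age in days)."""
    ages = {}
    for row in csv.DictReader(fileobj):
        last_picked = date.fromisoformat(row["lastpicked"])
        ages[row["id"]] = ((today - last_picked).days, int(row["minage"]))
    return ages


ages = stack_ages(io.StringIO(STACKS_CSV), today=date(2018, 7, 1))
for stack_id, (age, min_age) in ages.items():
    print(stack_id, age, "old enough" if age >= min_age else "too recent")
```

As of that made-up date, the Heinlein stack is only 12 days old, so it would sit out; the other two are past their minimums.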
Picking a random stack according to a weighting function isn't hard.
I'd pseudocode the algorithm like this:
Given N eligible stacks (indexed 0..N-1), W[i]
being the calculated weight of the i-th stack
(assumed integer) …
Let T be the total weight,
W[0] + W[1] + ⋯ + W[N-1]
Pick a random number r between 0 and T-1.
p = 0
while (r >= W[p])
    r -= W[p]
    p++
… and on loop exit p will index the stack picked.
To anticipate CS pedants: I know this is
O(N) and using a binary search instead could make it O(log N).
In practice, it's plenty fast enough. And other steps in the process
are O(N) anyway.
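Here's the same linear scan as runnable Python, for anyone who'd rather read code than pseudocode. It's a sketch and not the author's Perl; the function name and the `rng` parameter (there so a seeded generator can be passed in) are my own choices.

```python
import random


def pick_index(weights, rng=random):
    """Pick index p with probability weights[p] / sum(weights).

    Weights are assumed to be non-negative integers.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("need at least one positive weight")
    r = rng.randrange(total)  # uniform integer in 0 .. total-1
    p = 0
    while r >= weights[p]:
        r -= weights[p]
        p += 1
    return p


# Rough sanity check using the 600-vs-300 weights from the earlier example:
# the first stack should come up about twice as often as the second.
rng = random.Random(2018)
counts = [0, 0]
for _ in range(9000):
    counts[pick_index([600, 300], rng)] += 1
print(counts)
```

A stack with weight zero can never be picked, because the loop always steps past it.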
The "picking" script,
bookpicker, is here.
-v "verbose" flag will output a list of each stack's
Text::CSV Perl module is used for reading/writing CSV
modules are invaluable for doing the simple age calculations and
- You just run the script with no arguments or options; output is the
title and the name of the picked list.
- The user is responsible for maintaining the CSV files; no blank/duplicate
lines, etc. I use My Favorite Editor (vim), but CSVs are also editable
with Your Favorite Spreadsheet.
- For the "picked" stack, the script writes a smaller file with the
picked title missing. The old stack is saved with a
appended to the name. The
stacks.csv file is also updated
appropriately with today's date for the last-picked field for the picked
stack.
- The weighting function and random number generation are constrained to
integer values; I think it would work without that, but who wants to worry
about rounding errors? Not I.
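The file bookkeeping described in these notes (pop the top title, keep the old stack file, stamp the master file with today's date) might look something like the following Python sketch. None of it is the author's code: the function names, the `.bak` suffix, the header names, and the demo data are all hypothetical, and the real script's backup suffix isn't shown here.

```python
import csv
import os
import shutil
import tempfile
from datetime import date

BACKUP_SUFFIX = ".bak"  # hypothetical; stands in for the real script's suffix


def _read_rows(path):
    with open(path, newline="") as f:
        return list(csv.reader(f))


def _write_rows(path, rows):
    with open(path, "w", newline="") as f:
        csv.writer(f, lineterminator="\n").writerows(rows)


def pop_top(stack_path):
    """Remove the first book from a stack file, keeping a copy of the old file."""
    rows = _read_rows(stack_path)
    if not rows:
        raise ValueError("cannot pick from an empty stack")
    shutil.copy2(stack_path, stack_path + BACKUP_SUFFIX)
    _write_rows(stack_path, rows[1:])
    return rows[0][0]  # title of the book just picked


def mark_picked(stacks_path, stack_id, today):
    """Set the last-picked date (fourth field) for the stack with this ID."""
    rows = _read_rows(stacks_path)
    for row in rows[1:]:        # rows[0] is the header line
        if row[1] == stack_id:  # second field is the stack ID
            row[3] = today.isoformat()
    _write_rows(stacks_path, rows)


# Demo in a throwaway directory, with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    stack_path = os.path.join(tmp, "demo.csv")
    stacks_path = os.path.join(tmp, "stacks.csv")
    with open(stack_path, "w") as f:
        f.write('"Placeholder One",1\n"Placeholder Two",0\n')
    with open(stacks_path, "w") as f:
        f.write('name,id,minage,lastpicked\n"Demo Stack",demo,30,2018-01-01\n')

    picked = pop_top(stack_path)
    mark_picked(stacks_path, "demo", date(2018, 7, 1))

    remaining = _read_rows(stack_path)
    backup_rows = _read_rows(stack_path + BACKUP_SUFFIX)
    updated = _read_rows(stacks_path)

print(picked, remaining, updated[1])
```

One side effect worth knowing: the default CSV writer only quotes fields that need it, so a rewritten file may lose the quotes around simple titles. It's still valid CSV.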
I also have a couple scripts to list out the contents of the to-be-read
A script that produces plain text output (on stdout) is here.
A script that produces an HTML page and displays it in my browser
(Google Chrome) is
It uses text color to signify eligible/ineligible stacks and
Sample output (again, comments on my literary taste, or lack thereof, are
HTML::Template module is used to make output generation
easier, and the template used for that is
Getting it to show up in my browser is accomplished via
server/client/extension; if you don't have it, it's pretty easy to do
something else instead.
Whew! I feel better getting this off my chest.