[Update 2019-11-11: Sources moved to GitHub; verbose flag added to picking
script; HTML listing script includes stack weights and probabilities, and
indicates whether a book is owned on Kindle.]
This is an updated version of a "geekery" post from last year. I've made
substantial changes to the script it describes since then. I'm leaving the
former (and much simpler) version in place, but also wanted to show off my
new version.
But one thing hasn't changed at all:
it's another example of the mental aberration that causes me to write Perl
scripts to solve life's little everyday irritants. In this case two
little irritants:
- I noticed that I had a lot of books on my shelves, acquired long ago,
that I never got around to reading. Either because (a) they were
dauntingly long and dense (I'm thinking about Infinite Jest by David Foster
Wallace); or because (b) they just fell through the cracks. Both poor
excuses, but there you are.
- I sometimes want to methodically read a series of books in a particular
order.
In other words, I needed a way to bring diligence and organization to my
previously chaotic and sloppy reading habits.
I think of what I came up with as the "To-Be-Read" (hereafter TBR)
database. That's a slightly lofty title, but anyway:
The high-level view: all the TBR books are in zero or more stacks,
each stack containing zero or more titles.
Each stack is maintained in the order I want to read the books therein.
(This goes back to the issue mentioned above: sometimes a series really "should"
be read in publication order, for example C.J. Box's novels featuring
protagonist Joe Pickett.)
So picking a book to read involves (a) choosing an "eligible" stack; and (b) "popping"
the top book from the chosen stack. Very computer science-y.
The interesting part is the "choosing an eligible stack" step. There are a number
of possible ways to do it. But first, more details on "eligibility".
The major problem with the previous version of this script was that too
often it would pick a book "too soon" after I'd read something off the
same stack. (An issue mentioned in last year's post.) As it turns out,
I wanted to let some time go by between picks from the same stack. (For
example, at least 30 days between books by Heinlein. Too much of a good
thing, too soon…)
So: in this version, each stack has an "age": the time that's elapsed since
I previously picked a book from that stack. And a "minimum age", the amount
of time that must elapse after a pick before that stack becomes eligible
again.
Another minor difference: I don't actually own some of the books in some of
the stacks yet.
I want to read them someday. But I'm waiting, typically for the price to
come down, either via the Barnes & Noble remainder table or the
Amazon used market. I'm RetiredOnAFixedIncome, after all.
So an eligible stack is one that:
- is non-empty;
- has an owned book on top;
- is older than its specified minimum age.
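The three rules above can be sketched as a small predicate. This is a Python illustration, not the Perl of the actual script; the `Stack` class and its field names are hypothetical, and treating a stack as eligible on the exact day it reaches its minimum age is my assumption (the "at least 30 days" wording suggests it, but the script may differ).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Stack:
    """Hypothetical in-memory view of one stack (all names are assumptions)."""
    titles: list        # book titles, top of the stack first
    owned: list         # parallel flags: True if that book is owned
    min_age_days: int   # minimum days that must pass between picks
    last_picked: date   # date of the most recent pick from this stack


def is_eligible(stack: Stack, today: date) -> bool:
    """Apply the three eligibility rules listed above."""
    if not stack.titles:      # rule 1: the stack is non-empty
        return False
    if not stack.owned[0]:    # rule 2: the top book is owned
        return False
    age_days = (today - stack.last_picked).days
    # rule 3: enough time has passed (boundary day counted as eligible: an assumption)
    return age_days >= stack.min_age_days
```

Keeping the check in one function means the listing scripts and the picker could share it, if one were so inclined.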
OK, so how do we choose among eligible stacks? Possibilities:
- Pick the "oldest" stack; the one for which it's been the longest time since
a book from it was previously picked.
- Pick the highest stack, the one with the most titles therein. (Because it
needs the most work, I guess.)
- Just pick a stack at random.
- Pick a random stack weighted by stack height. That is, any stack can be
picked, but one with eight titles in it is twice as likely to be picked
as one with four titles. (This was the algorithm used in the previous
version.)
- Pick a random stack, weighted by age. That is, a stack that's 90 days
old is twice as likely to be picked as a 45-day old one.
- But what I'm doing is a combination of the last two: the stack-weighting
function is the stack height times the stack age. So (for example) a
120-day-old stack with 5 titles is twice as likely to be picked as a
50-day-old stack with 6 titles. Because 120 * 5 = 600 and 50 * 6 = 300. This is totally
arbitrary, but it seems to work for me so far.
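To make the arithmetic concrete, here is a small Python sketch (again, not the actual Perl) of the height-times-age weighting and the pick probabilities it implies. The function names and the two stack labels are made up for illustration.

```python
def stack_weight(height: int, age_days: int) -> int:
    """Weight of a stack: number of titles times days since the last pick."""
    return height * age_days


def pick_probabilities(weights: dict) -> dict:
    """Turn integer weights into probabilities that sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# The example from the text: 5 titles at 120 days versus 6 titles at 50 days.
weights = {
    "older_stack": stack_weight(5, 120),    # 600
    "younger_stack": stack_weight(6, 50),   # 300
}
probabilities = pick_probabilities(weights)
print(weights)
print(probabilities)
```

With only these two stacks eligible, the older one would be picked about two times out of three.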
Here's my current take on scripting that.
Each stack is implemented as a comma-separated values (CSV) file,
headerless, one line per book, each line containing two fields:
- The book title;
- Whether I own the book yet (1/0 = yes/no).
For example, here's the current content of moore.csv, containing the
to-be-read books of Christopher Moore:
"The Serpent of Venice",1
"Secondhand Souls",1
Noir,0
I.e., three books, the first two owned, the third one, Noir,
unpurchased as yet. (I'll get it someday, and edit the file to change
the 0 to 1.)
[Added: in addition to 0/1, 'K' indicates that the book's Kindle version
is owned. This is just a convenience in case I go looking for it long
after actually buying it.]
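Reading a stack file in this format takes only a few lines. The sketch below uses Python's standard csv module rather than the Perl Text::CSV the real script uses; `read_stack` and `is_owned` are hypothetical names, and treating 'K' as "owned" for eligibility purposes is my reading of the note above, not something confirmed by the script.

```python
import csv
import io


def read_stack(f):
    """Return a list of (title, flag) tuples, top of the stack first."""
    books = []
    for row in csv.reader(f):
        if not row:                 # tolerate a stray blank line
            continue
        books.append((row[0], row[1].strip()))
    return books


def is_owned(flag: str) -> bool:
    # Assumption: 'K' (Kindle version owned) counts as owned, same as '1'.
    return flag in ("1", "K")


# The moore.csv content shown above, supplied inline so the sketch is self-contained.
sample = io.StringIO('"The Serpent of Venice",1\n"Secondhand Souls",1\nNoir,0\n')
books = read_stack(sample)
for title, flag in books:
    print(title, "owned" if is_owned(flag) else "not owned yet")
```

For a real file, `open(path, newline="")` is the usual way to hand a file to `csv.reader`.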
There is a "master" CSV file, stacks.csv. It has a header
(for some reason that I forget). Each non-header line contains data for
a single stack:
- The (nice human-readable) stack name;
- The stack ID (corresponding to the name of the stack file);
- The minimum time, in days, that should elapse between consecutive
picks from that stack;
- The date when a book was most recently picked from the stack.
As I type, here's what it looks like:
name,id,minage,lastpicked
"Chronicles of Amber",amber,42,2018-04-15
"C.J. Box",box,30,2018-06-16
"Michael Connelly",connelly,30,2018-06-22
"Continental Op",continental_op,30,2018-06-09
"Conservative Lit 101",conservative_lit_101,60,2017-09-07
"Elmore Leonard",elmore,30,2018-06-28
"Dick Francis",francis,30,2018-04-20
"General Fiction",genfic,30,2018-06-13
"Steve Hamilton",hamilton,30,2018-04-29
"Robert A. Heinlein",heinlein,30,2018-06-19
Monkeewrench,monkeewrench,30,2018-05-28
"Christopher Moore",moore,30,2018-04-23
Mystery,mystery,30,2018-01-04
Non-Fiction,nonfic,30,2018-07-01
"Lee Child",reacher,30,2017-12-29
"Science Fiction",sci-fi,30,2018-05-30
Spenser,spenser,30,2017-05-01
"Don Winslow",winslow,30,2018-03-02
No comments from the peanut gallery about my lack of literary taste,
please.
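Since the master file has a header, it can be read by column name, and each stack's age falls out of a date subtraction. A Python sketch under stated assumptions: `read_master` is a hypothetical helper, the column names come from the header shown above, and the "today" of 2018-07-04 is just a date chosen for the example.

```python
import csv
import io
from datetime import date


def read_master(f, today: date):
    """Parse stacks.csv rows and attach each stack's age in days."""
    stacks = []
    for row in csv.DictReader(f):
        last_picked = date.fromisoformat(row["lastpicked"])
        stacks.append({
            "name": row["name"],
            "id": row["id"],
            "minage": int(row["minage"]),
            "age": (today - last_picked).days,
        })
    return stacks


# Two rows copied from the listing above.
sample = io.StringIO(
    "name,id,minage,lastpicked\n"
    '"Chronicles of Amber",amber,42,2018-04-15\n'
    "Spenser,spenser,30,2017-05-01\n"
)
stacks = read_master(sample, today=date(2018, 7, 4))
for s in stacks:
    print(s["name"], s["age"], "days old; minimum", s["minage"])
```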
Picking a random stack according to a weighting function isn't hard.
I'd pseudocode the algorithm like this:
Given: N eligible stacks (indexed 0..N-1), with W_i being the calculated
weight of the ith stack (assumed integer) …
Let T be the total weight, W_0 + W_1 + ⋯ + W_(N-1)
Pick a random integer r between 0 and T-1 (inclusive).
p = 0
while (r >= W_p)
    r -= W_p
    p++
… and on loop exit p will index the stack picked.
To anticipate CS pedants: I know this is
O(N) and using a binary search instead could make it O(log N).
In practice, it's plenty fast enough. And other steps in the process
are O(N) anyway.
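For the curious, here is the pseudocode as a runnable Python sketch (not the Perl from the actual script), next to the binary-search variant the pedants would ask for. The function names are mine. Both assume integer weights and at least one positive weight; `random.randrange` raises an error if the total is zero.

```python
import bisect
import itertools
import random


def pick_index_linear(weights, r):
    """Walk the weights as in the pseudocode; r must be in 0..sum(weights)-1."""
    p = 0
    while r >= weights[p]:
        r -= weights[p]
        p += 1
    return p


def pick_index_bisect(weights, r):
    """Same answer via binary search over the running totals."""
    cumulative = list(itertools.accumulate(weights))
    return bisect.bisect_right(cumulative, r)


def pick_random_index(weights):
    """Pick an index with probability proportional to its weight."""
    r = random.randrange(sum(weights))   # random integer in 0..T-1
    return pick_index_linear(weights, r)


print(pick_random_index([600, 300]))     # prints 0 about twice as often as 1
```

Note that building the running totals is itself O(N), so the binary search only pays off when the same totals are reused for many picks. For one pick per book, the linear walk is the simpler choice.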
Enough foreplay!
The "picking" script, bookpicker, is here.
Notes:
- Specifying the -v "verbose" flag will output a list of each stack's
pick-probabilities.
- The Text::CSV Perl module is used for reading/writing CSV files. The
Time::Piece and Time::Seconds modules are invaluable for doing the simple
age calculations and comparisons.
- You just run the script with no arguments or options; output is the
title and the name of the picked stack.
- The user is responsible for maintaining the CSV files; no blank/duplicate
lines, etc. I use My Favorite Editor (vim), but CSVs are also editable
with Your Favorite Spreadsheet.
- For the "picked" stack, the script writes a smaller file with the
picked title missing. The old stack is saved with a .old appended to the
name. The stacks.csv file is also updated appropriately with today's date
for the last-picked field for the picked stack.
- The weighting function and random number generation are constrained to
integer values; I think it would work without that, but who wants to worry
about rounding errors? Not I.
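The "write a smaller file, keep the old one as .old" step might look something like the following. This is a Python sketch of the idea, not the script's Perl; `pop_top_book` is a hypothetical name, and it leaves out the stacks.csv update described above. The demo runs in a temporary directory so no real stack files are touched.

```python
import csv
import os
import tempfile


def pop_top_book(path):
    """Remove and return the first title in a stack file, keeping a .old copy."""
    with open(path, newline="") as f:
        rows = [row for row in csv.reader(f) if row]
    if not rows:
        raise ValueError(f"{path} is empty")
    top, rest = rows[0], rows[1:]
    os.replace(path, path + ".old")        # previous stack survives as <name>.old
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rest)      # quoting may differ from the hand-edited file
    return top[0]


with tempfile.TemporaryDirectory() as workdir:
    demo_path = os.path.join(workdir, "moore.csv")
    with open(demo_path, "w", newline="") as f:
        f.write('"The Serpent of Venice",1\n"Secondhand Souls",1\nNoir,0\n')
    print(pop_top_book(demo_path))
```

Renaming first and writing second means a crash midway leaves the .old file intact, which seems like the safer order.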
I also have a couple of scripts to list out the contents of the to-be-read
database.
- A script that produces plain text output (on stdout) is here.
- A script that produces an HTML page and displays it in my browser
(Google Chrome) is here.
It uses text color to signify eligible/ineligible stacks and
owned/unowned books.
Sample output (again, comments on my literary taste, or lack thereof, are
unwelcome) is here.
The HTML::Template module is used to make output generation easier, and the
template used for that is here.
Getting it to show up in my browser is accomplished via the chromix-too
server/client/extension; if you don't have it, it's pretty easy to do
something else instead.
Whew! I feel better getting this off my chest.