To Scale

Confession: I love The Big Bang Theory and thanks to TiVo, I'm working on watching all 279 episodes, in order.

But in just about every episode, there's this stuff that makes my physics major brain hurt a bit:

Reader, as Wolfgang Pauli allegedly said: "That is not only not right; it is not even wrong." Electrons are not shiny balls; they do not orbit atomic nuclei on shiny rails. (They don't make whooshing sounds, either, but that's even more of a quibble.)

Now: at a certain level, attempting to visualize what atoms "really look like" is futile. It's just math down there, solutions to the Schrödinger equation, or some other formulation.

But they could at least try to get the scale right.

Or, more accurately, once you try to get the scale right, you can see why they didn't.

Let's imagine—because we're not going to actually do it—building a scale model of a good old water molecule, H2O: an oxygen atom, with two hydrogen atoms hanging off to one side.

A hydrogen atom's nucleus, a single proton, has a radius (its "root mean square charge radius") of 8.4e-16 meters.

Let's say our scale model uses a ping-pong ball to represent a proton. A ping-pong ball's radius is 20 millimeters, or 2.0e-2 meters. (Fascinating fun fact from the link: "the size increased from 38mm to 40mm after the 2000 Olympic Games." I did not know that!)

So for our scale model, we have to multiply atomic/molecular distances by a scale factor of 2.0e-2 / 8.4e-16 ≈ 2.38e13 (i.e., just under 24 trillion).

The radius of an oxygen atom's nucleus is generally reported to be 2.8e-15 meters. Multiplying this by our scale factor gives 2.8e-15 * 2.38e13 = 6.7e-2 meters, or 67 millimeters: about the size of a medium grapefruit.

So: to start building our scale model, gather together a grapefruit and two ping-pong balls. Where do we put them?

This page reports that the distance between the oxygen nucleus and each hydrogen nucleus is 0.943 angstroms, or 9.43e-11 meters.

This scales up to 9.43e-11 * 2.38e13 = 2244 meters. Or about 1.4 miles. So:

  1. Put your grapefruit down;
  2. Walk 1.4 miles in a straight line;
  3. Drop one ping-pong ball;
  4. Return to the grapefruit;
  5. Turn approximately 106° from your original direction;
  6. March another 1.4 miles and drop your other ping-pong ball.

That completes placement of the nuclei. Now we have to consider the electrons (ten of them) that swarm around the nuclei. Where do they go, and how do we represent them?

Reader, the best thing I can think up, visualization-wise, is a fuzzy cloud. That Schrödinger equation thing I referred to above would (if we solved it) give us a probability of finding an electron within a certain hunk of space. That probability is relatively high close to the nuclei, and gets much smaller as you get further away. And, as an added complication, the electrons have a higher probability to flock around the oxygen nucleus than the hydrogen nuclei. Visualize that however you'd like. The page referenced above does it with color.

The page referenced above talks about the "Van der Waals" diameter of the water molecule, which is as good a size estimate as we are likely to get; that's about 2.75e-10 meters. Scaling that distance up gives 2.75e-10 * 2.38e13 = 6548 meters, or about 4.1 miles.

So, to summarize: our scale model water molecule is a fuzzy cloud over 4 miles in diameter, in which is hiding a grapefruit and two ping-pong balls. And I admit, this would be difficult to picture on the TV screen in a way that might appeal to viewers. Still, it would be better than what they did.

Another fun fact: the electrons make up only 0.03% of the mass of the water molecule. For your typical Poland Spring 500 milliliter (16.9 oz) water bottle, that means nearly all the volume is those fuzzy electron clouds, but their total mass is a mere 150 milligrams or so; the remaining 499.85 grams resides in those tiny nuclei.

Here's one thing that does not scale well. How fast do water molecules typically move? Much faster than the "atoms" you see on The Big Bang Theory. Googling will tell you that they exhibit a range of speeds (a Maxwell-Boltzmann distribution, approximately) and for water molecules at (roughly) room temperature, the average speed works out to 590 meters/sec (≈1300 mph).

So be glad that the water molecules in that Poland Spring bottle don't suddenly ("at random") decide to start moving in the same direction.

Yeah, that's impossible. Conservation of momentum saves us there.

But scaling that to our model, we get 1.4e16 meters/sec.

Which is (um) 47 million times the speed of light.

Very difficult to visualize!
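For the record, the arithmetic above is easy to reproduce in a few lines of Perl (constants as given in the text; outputs rounded):

```perl
use strict;
use warnings;

# Reproduce the scale-model arithmetic: a ping-pong ball stands in for a proton.
my $scale = 2.0e-2 / 8.4e-16;    # ball radius / proton charge radius

printf "scale factor:   %.3g\n",         $scale;                   # ~2.38e13
printf "oxygen nucleus: %.0f mm\n",      2.8e-15  * $scale * 1e3;  # ~67 mm
printf "O-H separation: %.0f m\n",       9.43e-11 * $scale;        # ~2245 m
printf "cloud diameter: %.0f m\n",       2.75e-10 * $scale;        # ~6548 m
printf "scaled speed:   %.3g m/s\n",     590 * $scale;             # ~1.4e16 m/s
printf "multiple of c:  %.0f million\n", 590 * $scale / 2.998e8 / 1e6;  # ~47
```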


Last Modified 2024-07-15 5:20 PM EST

My Book Picker (and Lister)

2022 Version


Introduction/Rationale

This blogpost describes the most recent version of my "book picking" system. It assumes a Linux operating system, and uses Perl, plus a few modules that aren't in the core Perl distribution: Const::Fast, version, and HTML::Template. (My current distribution, Fedora, makes installing these modules pretty easy. Your mileage… etc.) Files are available at GitHub.

I've used this system for a number of years, and have tweaked it significantly over that period.

But one thing hasn't changed at all: it's another example of the mental aberration that causes me to write Perl scripts to solve life's little everyday irritants. In this case two little irritants:

  1. I noticed that I had a lot of books on my shelves, acquired long past, that I never got around to reading. Either because (a) they were dauntingly long and dense (I'm thinking about Infinite Jest by David Foster Wallace); or because (b) they just fell through the cracks. Both poor excuses, but there you are.

  2. I sometimes want to methodically read (or reread) a series of books in a particular order.

In other words, I needed a way to bring diligence and organization to my previous chaotic and sloppy reading habits.

I think of what I came up with as the "To-Be-Read" (hereafter TBR) database. That's a slightly lofty title, but anyway:

High-Level View

All the TBR books are in zero or more stacks, each stack containing zero or more titles. Each stack contains a list of books maintained in the order I want to read them. (This goes back to the issue mentioned above: sometimes a series really "should" be read in publishing order, for example C.J. Box's novels featuring protagonist Joe Pickett.)

So picking a book to read involves (a) choosing an "eligible" stack; and (b) "popping" the top book from the chosen stack. Very computer science-y.

The interesting part is the "choosing an eligible stack" step. There are a number of possible ways to do it. But first, more details on "eligibility".

  • "Obviously" you can't pick a book off an empty stack. So a stack with no books in it is ineligible. (Why are there empty stacks? Because I might want to add one or more books to them someday. Like if Steve Hamilton ever writes another book.)
  • The stacks also contain books I don't have yet. I want to read them someday. But I'm waiting. Maybe a book has been announced but not released yet. (Example below.) Or I'm waiting for the price to come down, either via the Barnes & Noble remainder table or the Amazon used market. I'm RetiredOnAFixedIncome, after all. So: If the top book on a stack is unowned, there's no point in picking it. Hence, that stack is ineligible.
  • One final tweak: I found that I didn't want to read a book "too soon" after just reading a previous book in the stack. So each stack has an "age": the time that's elapsed since I previously picked a book from that stack. And a "minimum age", the amount of time that must elapse after a pick before that stack becomes eligible again.

Executive summary: an eligible stack is one that:

  • is non-empty;
  • has an owned book on top;
  • is older than its specified minimum age.

OK, so how do we choose among eligible stacks? Possibilities:
  1. Pick the "oldest" stack; the one for which it's been the longest time since a book from it was previously picked.
  2. Pick the highest stack, the one with the most titles therein. (Because it needs the most work, I guess.)
  3. Just pick a stack at random.
  4. Pick a random stack weighted by stack height. That is, any stack can be picked, but one with eight titles in it is twice as likely to be picked as one with four titles. (This was the algorithm used in the previous version.)
  5. Pick a random stack, weighted by age. That is, a stack that's 90 days old is twice as likely to be picked as a 45-day old one.
  6. But what I'm doing is a combination of the last two: the stack-weighting function is the stack height times the stack age. So (for example) a 120-day-old stack with 5 titles is twice as likely to be picked as a 50-day-old stack with 6 titles. Because 120 * 5 = 600 and 50 * 6 = 300. This is totally arbitrary, but it seems to work for me so far.

Now, on to the gory details.

The data file ~/var/bookstacks.pl

Previous versions of the system used CSV files to store all this data. I've switched over to a single file (~/var/bookstacks.pl), containing executable Perl code that is used to initialize an array of hashes named @STACKS. At a high level, it looks like:

@STACKS = (
     {
         hash elements for stack 0
     },
     {
         hash elements for stack 1
     },
     …
     {
         hash elements for stack N-1
     },
);

(It Is No Coincidence that this resembles output from the standard Perl Data::Dumper module. See below.)

Each @STACKS array element is a hash. Here's the (actual, as I type) entry for my Michael Connelly stack:

    …
    {
      'name' => 'Michael Connelly',
      'minage' => 30,
      'lastpicked' => '2021-08-21',
      'books' => [
		   {
		     'title' => 'The Dark Hours',
		     'author' => 'Michael Connelly',
		     'ASIN' => 'B08WLRG1L2',
		     'owned' => 1
		   },
		   {
		     'title' => 'Desert Star',
		     'author' => 'Michael Connelly',
		     'ASIN' => 'B09QKSLPN9',
		     'owned' => 0
		   }
		 ]
    },
    …

In words: this @STACKS element contains the stack's name ("Michael Connelly"); the stack's minimum age before becoming eligible (30 days); the date the stack was previously picked (August 21, 2021); and the books currently in the stack. (There are two, The Dark Hours, which I own on Kindle, and Desert Star, not out until November 2022, hence unowned.)

(Yes, that's a subarray of hashes inside the outer array of hashes. Why are you looking at me like that?)

(And no, I haven't memorized the rules about when/whether to use […], {…}, or (…). After decades of Perl coding, I still crack open the Perl Data Structures Cookbook or peruse my existing code where I see if I've done something similar that worked in the past.)

A complete file (my actual version as of April 2022) is here. No comments from the peanut gallery about my lack of literary taste, please.

I named it with a .pl extension because some editors will use that as a hint to do Perl syntax highlighting. It can be read into a script with Perl's do command. For example…
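Something along these lines (a self-contained sketch: it writes a tiny one-stack sample file first so it can actually run, standing in for ~/var/bookstacks.pl):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a one-stack sample data file (stand-in for ~/var/bookstacks.pl).
my ($fh, $datafile) = tempfile(SUFFIX => '.pl');
print $fh <<'EOF';
@STACKS = (
    {
        'name'       => 'Michael Connelly',
        'minage'     => 30,
        'lastpicked' => '2021-08-21',
        'books'      => [ { 'title' => 'The Dark Hours', 'owned' => 1 } ],
    },
);
EOF
close $fh;

# Load it with do; the file's code populates @STACKS in the main package.
our @STACKS;
do $datafile or die "can't load $datafile: " . ($@ || $!);
printf "%d stack(s), first is '%s'\n", scalar @STACKS, $STACKS[0]{name};
```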

booklister

The booklister script (here) is the simplest script in the system. It reads the data file described above and displays its contents in a (slightly) more verbose and readable form. It also prints, for each eligible stack, its weight and pick-probability.

Sample booklister output is here.

booklister_html

The booklister_html script (here) is slightly more complicated. It uses an HTML::Template template to generate an HTML page of the book stacks. It uses text formatting to display stack eligibility/ineligibility, and whether a book is owned or not. Finally, it generates a nice pie chart to display each eligible stack's pick-probabilities, using Google Charts code. It saves the generated page in /tmp/booklist.html; example here.

bookpicker

The bookpicker script (here) simply ("simply") sucks in the bookstacks data, filters out the eligible stacks, then picks one of the eligible stacks at (weighted) random. It "pops" the book at the top of the stack (actually uses a Perl shift, because…). And finally, it writes the modified stack data back out to ~/var/bookstacks.pl, saving the previous version with a .old appended to its name.

(Perl's Data::Dumper module is used for that last part. Some tweaks are used to get it usable for initialization and to get the hash keys to print in a non-random order.)
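The write-back idea looks roughly like this (a sketch, not the actual bookpicker code; the Sortkeys/Indent settings are my guess at the kind of tweaks involved):

```perl
use strict;
use warnings;
use Data::Dumper;

# Dump @STACKS back out as Perl initialization code suitable for a later do().
$Data::Dumper::Sortkeys = 1;   # print hash keys in a predictable order
$Data::Dumper::Indent   = 1;   # tidier indentation

my @STACKS = ( { name => 'Michael Connelly', minage => 30 } );

# The '*STACKS' glob-name makes Dump emit "@STACKS = (...);" rather than "$VAR1 = ...".
my $code = Data::Dumper->Dump([ \@STACKS ], ['*STACKS']);
print $code;
```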

If you'd like a little more detail on the picking process, you can add the -v (verbose) flag. Speaking of that, a small digression on…

Picking a random stack according to a weighting function

It's not hard. Just imagine throwing a dart at that pie chart mentioned above. Your probability of hitting any one segment is proportional to its area. So…

I'd pseudocode the algorithm like this:

Given: N eligible stacks (indexed 0..N-1), with W[i] being the calculated weight of the i-th stack (assumed integer) …

Let T be the total weight, W[0] + W[1] + ⋯ + W[N-1]

Pick a random number r between 0 and T-1.

p = 0
while (r >= W[p])
     r -= W[p]
     p++

… and on loop exit p will index the stack picked.

To anticipate CS pedants: I know this is O(N) and using a binary search instead could make it O(log N). In practice, it's plenty fast enough. And other steps in the process are O(N) anyway.
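Here's a runnable Perl version of that pseudocode (a sketch; `weighted_pick` is my name for it, not the script's):

```perl
use strict;
use warnings;

# Weighted random pick: takes a list of integer weights, returns the picked index.
sub weighted_pick {
    my @w = @_;
    my $total = 0;
    $total += $_ for @w;
    my $r = int rand $total;     # 0 .. total-1
    my $p = 0;
    while ($r >= $w[$p]) {
        $r -= $w[$p];
        $p++;
    }
    return $p;
}

# Example: weights as in the post, 600 vs 300 -- index 0 should win
# about twice as often as index 1.
my @weights = (600, 300);
my %count;
$count{ weighted_pick(@weights) }++ for 1 .. 9000;
printf "picked 0: %d times, 1: %d times\n", $count{0}, $count{1};
```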

Editing the stacks, an apology

But what if you want to add a new stack? Or add books to a stack? Or delete something? Or (just generally) change something?

I don't have any handy scripts for that. It's a hard problem. I, personally, just edit ~/var/bookstacks.pl using My Favorite Editor (that would be vim).

I have some ideas for a user-friendlier script. Maybe someday. Now that I've said it, maybe someday soon.

Whew!

I feel better getting this off my chest.


Last Modified 2022-08-20 6:23 AM EST

Listmaker

For Efficient Traversal of the Grocery Store

[2021-07-13 Update: Didn't take long after my initial post to make some major changes. Embarrassing that I didn't get it right the first time.]

Another Perl/Linux-based salve to the mental aberration I've mentioned in the past: using my meager coding skills as a hammer to whack down life's occasional slightly-annoying nails. Specifically, grocery shopping. It's partially my job. Mrs. Salad provides me with a handwritten list. A recent example:

Example List

Nice handwriting, right? Yes, she really calls those cookie dough packages "plunk & bake". And this example is neater than average, shorter than average, and pretty well organized. Still… Often I'll be finishing up shopping in Aisle 13 of the Dover, New Hampshire Hannaford … and suddenly realize that I missed getting something back in Aisle 2.

(Or sometimes not realizing I missed items until I get home.)

What I wanted was a list organized in the order in which I actually go through the store, separated into aisles (or departments) to make it easy to check that I've gotten (for example) all the Aisle 2 items before I move on to Aisles 3, 4, …

Something like this, an HTML table:

Loc/Aisle   Qty  Item                  Notes
4                Brownie Mix
                 Jambalaya Mix         Large
5                Raisin Bran Crunch
                 Pineapple Tidbits
                 Rice Krispies
6           2    V8
8                Incredibites          Dry, Chicken
Back Wall        Milk
                 Oatly
11               Bread                 Artesano
13               Cookie Dough
                 Yogurt
                 Pie Crust
                 Ice Cream             Cherry Vanilla
                 Outshine Coffee Bars

In fact, exactly like that. You might notice I've added a couple items of my own; I'm in charge of keeping track of pet food, Raisin Bran Crunch, V-8, and a few other things.

Hence this script, listmaker; it produces a suitable-for-printing HTML list organizing the listed items into the order I traverse the store. Typically that's in ascending-aisle order, but including the departments (Deli, Meat, Seafood, Bakery,…) on the store's periphery. Before I leave (say) Aisle 2, it's easy to verify that I've picked up everything I was supposed to get in Aisle 2.

The workflow is simple. First, I transcribe the handwritten list into a text file:

2 V8
Raisin Bran Crunch
# Orange Juice
# Eggs
# Coffee
Cookie Dough
Milk
Yogurt
Oatly
Pie Crust
Brownie Mix
Pineapple Tidbits
Jambalaya Mix | Large
Rice Krispies
Bread|Artesano
Ice Cream|Cherry Vanilla
Outshine Coffee Bars
Incredibites|Dry, Chicken

The syntax is simple, informal, and flexible:

  • One "item" per line.
  • Lines starting with a pound sign (#) are comments, and are ignored. Used for commonly-bought items; just remove the pound sign to include them, add one to exclude them.
  • A leading digit string designating quantity is optional. Of course, a missing number implies quantity 1.
  • An optional "Notes" field is text following a vertical bar (|). This can be used in many ways: specifying a brand, size, flavor,… Notes go in a separate column in the HTML table.
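A sketch of how one line of that syntax might be parsed (`parse_line` is a hypothetical helper; the real listmaker may differ):

```perl
use strict;
use warnings;

# Parse one line of the list syntax: optional leading quantity,
# optional "| notes" suffix, '#' comments and blank lines ignored.
sub parse_line {
    my ($line) = @_;
    return if $line =~ /^\s*#/ || $line =~ /^\s*$/;   # comment / blank
    my ($item, $note) = split /\|/, $line, 2;
    my $qty = 1;
    $qty = $1 if $item =~ s/^\s*(\d+)\s+//;           # leading quantity digits
    s/^\s+|\s+$//g for grep { defined } $item, $note; # trim both fields
    return { qty => $qty, item => $item, note => $note // '' };
}

my $r = parse_line('2 V8');
print "$r->{qty} x $r->{item}\n";          # 2 x V8
$r = parse_line('Jambalaya Mix | Large');
print "$r->{item} ($r->{note})\n";         # Jambalaya Mix (Large)
```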

Once the list is transcribed, the script can be run. Example, assuming the transcribed list above is in the file $HOME/Documents/mylist:

$ listmaker ~/Documents/mylist
[HTML list saved at file:///home/pas/Documents/mylist.html]

As a somewhat arbitrary design choice, the HTML output file is written to the same directory containing the list, with the .html extension tacked on.

I use a "store configuration file" for store-specific details. It contains Perl initialization code for two hashes:

  • %ORDER which specifies the order in which I visit aisles/departments:

    %ORDER = (
        '10'        => 14,
        '11'        => 16,
        '12'        => 17,
        '13'        => 18,
        '1'         => 4,
        '2'         => 5,
        '3'         => 6,
        '4'         => 7,
        '5'         => 8,
        '6'         => 9,
        '7'         => 10,
        '8'         => 12,
        '9'         => 13,
        'Back Wall' => 15,
        'Bakery'    => 1,
        'Deli'      => 2,
        'Front End' => 21,
        'Hbc 4l'    => 20,
        'Meat'      => 11,
        'Pharm'     => 19,
        'Produce'   => 0,
        'Seafood'   => 3,
    );
    

    In this case, the visitation order is: Produce, Bakery, Deli, Seafood, Aisle 1, 2, … (I've used perltidy to prettify the actual file.)

  • %HMAP which maps item names to aisles/locations. It contains many lines, here's a sample:

    
    %HMAP = (
        'v8'                 => '6',
        'raisin bran crunch' => '5',
        'brownie mix'        => '4',
        'pastrami'           => 'Deli',
        'ice cream'          => '13',
        […]
    );
    

    This hash can get messy and possibly redundant, especially if you (like me) are not consistent or careful in how you specify items. It's easy enough (if somewhat tedious) to clean up with a text editor.

The configuration file is loaded into the script with a Perl do command. Some basic sanity checks are performed.
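Once both hashes are loaded, sorting the items into store-traversal order is a one-liner; here's a sketch using a few entries from the samples above:

```perl
use strict;
use warnings;

# Sample entries from %ORDER (location -> visit order) and
# %HMAP (item -> location), as shown in the post.
my %ORDER = ( '4' => 7, '5' => 8, '6' => 9, 'Deli' => 2 );
my %HMAP  = (
    'v8'                 => '6',
    'raisin bran crunch' => '5',
    'brownie mix'        => '4',
    'pastrami'           => 'Deli',
);

# Composing the two hashes gives each item's slot in the traversal.
my @items  = ('v8', 'pastrami', 'brownie mix', 'raisin bran crunch');
my @sorted = sort { $ORDER{ $HMAP{$a} } <=> $ORDER{ $HMAP{$b} } } @items;
print join(', ', @sorted), "\n";   # pastrami, brownie mix, raisin bran crunch, v8
```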

The default location for the configuration file is $HOME/etc/listmaker.cf. The idea here is that if you want to use this script for more than one store, you use different configuration files. A non-default configuration file is specified to the script with the -s option, for example:

$ listmaker -s ~/Walmart.cf mylist

This is a project in my Github repository; the script is here. Notes:

  • The script uses the HTML::Template CPAN module to produce its HTML output. The template is pretty straightforward and it is here.

  • If there's an item on the list that's not found in the configuration file, the script will ask you to provide an aisle/location for it. Good news: your response will be used to update the configuration file, so you won't need to do that again in the future. (Specifically, the script uses the Data::Dumper Perl module to produce new initialization code for the hashes described above, written back out to the file.)


Last Modified 2021-07-13 6:38 AM EST

Easy-Peasy Link Generator

[Update 2020-07-16: added some new logic to allow link target text to be provided on standard input. Prettified (slightly) the site-string chopping regex.]

First, a bit of background on my environment:

  • I use Google Chrome for my browser in Linux.

  • I use plain old vim in a terminal window to compose HTML for this blog.

  • And what I want to do all the time when composing HTML is to generate a link to the page displayed in the active tab in Chrome's current window.

For example, if I'm looking at this page in Chrome, I might say "Ooh, cool!" and want to insert the following into my HTML:

<a href="https://science.slashdot.org/story/20/07/01/1816253/a-massive-star-has-seemingly-vanished-from-space-with-no-explanation">A Massive Star Has Seemingly Vanished from Space With No Explanation</a>

That's not hard to do by hand: copy the link from Chrome into the terminal window, add in the surrounding code for the a tag, add the target text, don't forget the end tag (</a>), and we're done!

Yeah, it's not hard, but it can be tedious.

I'm sure people—much smarter people—have come up with good solutions for this. But I'm a DIY kind of guy. So eventually (it only took years), I wrote this small (originally 43, now 68 lines) Perl script to do that for me. For historical reasons (by which I mean: arbitrary and silly reasons), I named it ttb. Which stands for "Thing To Blog", and it's installed in my $HOME/bin directory.

My usual use is in vim command mode, bang-bang-ttb:

!!ttb

… which will replace the current line with the HTML link:

<a href="URL">target</a>

where URL is (duh) the URL of the active tab of Chrome's current window.

The target link text is determined by the following logic:

  • If the current line contains any (non-whitespace) text, use that for the target text. (After trimming any leading or trailing whitespace.)
  • Otherwise, if any command-line arguments are specified, join them together with spaces, using the result as the target text.
  • Otherwise, use the HTML title of the displayed page as the target text.

That might look a bit convoluted, but… well, it is. But it works OK for me.
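That three-way fallback can be sketched like so (`target_text` is a hypothetical helper name; the real ttb may differ):

```perl
use strict;
use warnings;

# Pick the link target text: current line if non-blank, else the
# command-line arguments, else the page's HTML title.
sub target_text {
    my ($current_line, $args, $page_title) = @_;
    if (defined $current_line && $current_line =~ /\S/) {
        $current_line =~ s/^\s+|\s+$//g;   # trim leading/trailing whitespace
        return $current_line;
    }
    return join ' ', @$args if @$args;
    return $page_title;
}

print target_text('  A Massive Star  ', [], 'ignored'), "\n";  # A Massive Star
print target_text('', ['from', 'args'], 'ignored'), "\n";      # from args
print target_text('', [], 'Page Title'), "\n";                 # Page Title
```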

Notes:

  • The script assumes you have installed the chromix-too Chrome extension package. Which is easy enough to get. In Fedora, I install the npm package first:

    # dnf install npm

    or equivalent sudo if you prefer that. Then:

    # npm install -g chromix-too

    This package contains a client-server pair: chromix-too and chromix-too-server. The server can be run after Chrome itself starts up. (I run both Chrome and chromix-too-server as startup commands.)

  • The script executes the client via the shell command:

    $ chromix-too raw chrome.tabs.query '{ "active":true, "currentWindow":true }'

    which produces JSON output about the active tab in the current window. The JSON Perl module (I think it's installed by default in Fedora) is required to decode that into a Perl structure. The decode function returns an "array of hashes", but I think the array should always have just one element, so we just pop that.

  • Ugly things probably happen if you run this without the browser or the chromix server running. I should probably provide a clean exit in that case.

  • I noticed a lot of sites (mostly blogs) have HTML page titles that append a uniform site string. There's an ugly ad hoc regex in the code to chop those off. (Or should that be ad hack?)
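For illustration, here's that decoding step with a canned JSON sample standing in for the live chromix-too output (an assumption on my part; real tab objects carry more fields), using the core JSON::PP module:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# In the real script this JSON comes from running:
#   chromix-too raw chrome.tabs.query '{ "active":true, "currentWindow":true }'
# Here a canned sample stands in so the decoding step runs on its own.
my $json = '[{"url":"https://example.com/story","title":"A Story"}]';

my $tabs = decode_json($json);   # an array of tab hashes...
my $tab  = pop @$tabs;           # ...which should hold exactly one element
printf qq{<a href="%s">%s</a>\n}, $tab->{url}, $tab->{title};
```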

That's a dreadful lot of verbiage about such a short script. As usual, this is not earth-shattering code, but I hope someone finds it useful, if only for tutorial purposes.

And if you know of a better way to do this… don't tell me, OK?

The source may be found at GitHub.


Last Modified 2020-07-16 6:02 AM EST

Toss Your Cookies


Some sites (like the Boston Globe) are pretty nasty about letting you have access to a severely limited number of "free" pages. They do this by leaving web cookies on your computer so they can recognize your browser when it returns for more content.

You can try opening such sites in Incognito mode (or whatever the equivalent is in non-Chrome browsers), but they can detect that and give you a nasty page instead of the desired content.

For the same reason, extensions that allow you to reject cookies from selected sites also produce chiding messages: you must accept cookies to see our stuff!

You can probably search out and destroy these sites' cookies once they're on your computer by following the instructions for your browser. Here's what you do in Chrome, for example. Works, but there's a lot of tedious pointy-clicky. (Note added 2019-12-27: you can make things less tedious by setting up a bookmark to chrome://settings/siteData.)

My past workaround (see the update below) worked pretty well for me, a one-click solution: a Chrome extension called RemoveCookiesForSite. It displays a broken-cookie icon, probably to the right of the title bar. When you're viewing a site that insists on dropping cookies on you, just click on that. Voila, cookies gone without any fuss, and the site is none the wiser.

This may screw up the revenue models of some sites. Sorry! I'm RetiredOnAFixedIncome!

Update 2019-10-17: I should have added this update long before now. Chrome has a simpler (zero-click) solution. I won't go through all the pointy-clicky, but navigate through Settings → Advanced → Privacy and Security → Site Settings → Cookies and Site Data. (Or just navigate to chrome://settings/content/cookies. [added 2019-12-27]) One of the options is to 'Clear on Exit'. Add the problematic domain using a wildcard, e.g. '[*.]nytimes.com'.

I'm not sure how long this option has been in Chrome, but it makes me forgive Google for a lot of sins.

Caveat: Sometimes this fails if you have a long-running browser session and you have the bad luck to visit a cookied site enough times to hit their limit during the session. Sigh. In this case, I do the old 'Delete cookies for site' method linked above in a new tab. Then refresh the browser tab for the site.


Last Modified 2024-01-24 6:52 AM EST

My Book Picker (and Lister)

2018 Version

[Update 2019-11-11: Sources moved to GitHub; verbose flag added to picking script; HTML listing script includes stack weights and probabilities, and indicates whether book is owned on Kindle.]

This is an updated version of a "geekery" post from last year. I've made substantial changes to the script it describes since then. I'm leaving the former (and much simpler) version in place, but also wanted to show off my new version.

But one thing hasn't changed at all: it's another example of the mental aberration that causes me to write Perl scripts to solve life's little everyday irritants. In this case two little irritants:

  1. I noticed that I had a lot of books on my shelves, acquired long past, that I never got around to reading. Either because (a) they were dauntingly long and dense (I'm thinking about Infinite Jest by David Foster Wallace); or because (b) they just fell through the cracks. Both poor excuses, but there you are.

  2. I sometimes want to methodically read a series of books in a particular order.

In other words, I needed a way to bring diligence and organization to my previous chaotic and sloppy reading habits.

I think of what I came up with as the "To-Be-Read" (hereafter TBR) database. That's a slightly lofty title, but anyway:

The high-level view: all the TBR books are in zero or more stacks, each stack containing zero or more titles. Each stack is maintained in the order I want to read the books therein. (This goes back to the issue mentioned above: sometimes a series really "should" be read in publishing order, for example C.J. Box's novels featuring protagonist Joe Pickett.)

So picking a book to read involves (a) choosing an "eligible" stack; and (b) "popping" the top book from the chosen stack. Very computer science-y.

The interesting part is the "choosing an eligible stack" step. There are a number of possible ways to do it. But first, more details on "eligibility".

The major problem with the previous version of this script was that too often it would pick a book "too soon" after I'd read something off the same stack. (An issue mentioned in last year's post.) As it turns out, I wanted to let some time go by between picks from the same stack. (For example, at least 30 days between books by Heinlein. Too much of a good thing, too soon…)

So: in this version, each stack has an "age": the time that's elapsed since I previously picked a book from that stack. And a "minimum age", the amount of time that must elapse after a pick before that stack becomes eligible again.

Another minor difference: I don't actually own some of the books in some of the stacks yet. I want to read them someday. But I'm waiting, typically for the price to come down, either via the Barnes & Noble remainder table or the Amazon used market. I'm RetiredOnAFixedIncome, after all.

So an eligible stack is one that:

  • is non-empty;
  • has an owned book on top;
  • is older than its specified minimum age.

OK, so how do we choose among eligible stacks? Possibilities:
  1. Pick the "oldest" stack; the one for which it's been the longest time since a book from it was previously picked.
  2. Pick the highest stack, the one with the most titles therein. (Because it needs the most work, I guess.)
  3. Just pick a stack at random.
  4. Pick a random stack weighted by stack height. That is, any stack can be picked, but one with eight titles in it is twice as likely to be picked as one with four titles. (This was the algorithm used in the previous version.)
  5. Pick a random stack, weighted by age. That is, a stack that's 90 days old is twice as likely to be picked as a 45-day old one.
  6. But what I'm doing is a combination of the last two: the stack-weighting function is the stack height times the stack age. So (for example) a 120-day-old stack with 5 titles is twice as likely to be picked as a 50-day-old stack with 6 titles. Because 120 * 5 = 600 and 50 * 6 = 300. This is totally arbitrary, but it seems to work for me so far.

Here's my current take on scripting that.

Each stack is implemented as a comma-separated values (CSV) file, headerless, one line per book, each line containing two fields:

  1. The book title;
  2. Whether I own the book yet (1/0 = yes/no).

For example, here's the current content of moore.csv, containing the to-be-read books of Christopher Moore:

"The Serpent of Venice",1
"Secondhand Souls",1
Noir,0

I.e., three books, the first two owned, the third one, Noir, unpurchased as yet. (I'll get it someday, and edit the file to change the 0 to 1.)

[Added: in addition to 0/1, 'K' indicates that the book's Kindle version is owned. This is just a convenience in case I go looking for it long after actually buying it.]

There is a "master" CSV file, stacks.csv. It has a header (for some reason that I forget). Each non-header line contains data for a single stack:

  1. The (nice human-readable) stack name;
  2. The stack ID (corresponding to the name of the stack file);
  3. The minimum time, in days, that should elapse between consecutive picks from that stack;
  4. The date when a book was most recently picked from the stack.

As I type, here's what it looks like:

name,id,minage,lastpicked
"Chronicles of Amber",amber,42,2018-04-15
"C.J. Box",box,30,2018-06-16
"Michael Connelly",connelly,30,2018-06-22
"Continental Op",continental_op,30,2018-06-09
"Conservative Lit 101",conservative_lit_101,60,2017-09-07
"Elmore Leonard",elmore,30,2018-06-28
"Dick Francis",francis,30,2018-04-20
"General Fiction",genfic,30,2018-06-13
"Steve Hamilton",hamilton,30,2018-04-29
"Robert A. Heinlein",heinlein,30,2018-06-19
Monkeewrench,monkeewrench,30,2018-05-28
"Christopher Moore",moore,30,2018-04-23
Mystery,mystery,30,2018-01-04
Non-Fiction,nonfic,30,2018-07-01
"Lee Child",reacher,30,2017-12-29
"Science Fiction",sci-fi,30,2018-05-30
Spenser,spenser,30,2017-05-01
"Don Winslow",winslow,30,2018-03-02

No comments from the peanut gallery about my lack of literary taste, please.

Picking a random stack according to a weighting function isn't hard. I'd pseudocode the algorithm like this:

Given: N eligible stacks (indexed 0..N-1), with W[i] being the calculated weight of the i-th stack (assumed integer) …

Let T be the total weight, W[0] + W[1] + ⋯ + W[N-1]

Pick a random number r between 0 and T-1.

p = 0
while (r >= W[p])
     r -= W[p]
     p++

… and on loop exit p will index the stack picked.

To anticipate CS pedants: I know this is O(N) and using a binary search instead could make it O(log N). In practice, it's plenty fast enough. And other steps in the process are O(N) anyway.
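The pseudocode above maps almost directly to code. Here's a Python sketch (the actual script is Perl; `pick_weighted` is just an illustrative name):

```python
import random

def pick_weighted(weights):
    """Pick an index with probability proportional to its (integer) weight."""
    total = sum(weights)
    r = random.randrange(total)   # uniform in 0..total-1
    p = 0
    while r >= weights[p]:        # walk past each stack's slice of 0..total-1
        r -= weights[p]
        p += 1
    return p
```

For example, with weights [13, 9, 2], index 0 comes back with probability 13/24.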

Enough foreplay! The "picking" script, bookpicker, is here. Notes:

  • Specifying the -v "verbose" flag will output a list of each stack's pick-probabilities.

  • The Text::CSV Perl module is used for reading/writing CSV files. The Time::Piece and Time::Seconds modules are invaluable for doing the simple age calculations and comparisons.

  • You just run the script with no arguments or options; output is the title and the name of the picked stack.

  • The user is responsible for maintaining the CSV files; no blank/duplicate lines, etc. I use My Favorite Editor (vim), but CSVs are also editable with Your Favorite Spreadsheet.

  • For the "picked" stack, the script writes a smaller file with the picked title missing. The old stack is saved with a .old appended to the name. The stacks.csv file is also updated appropriately with today's date for the last-picked field for the picked stack.

  • The weighting function and random number generation are constrained to integer values; I think it would work without that, but who wants to worry about rounding errors? Not I.
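The minage/lastpicked eligibility test reduces to a date comparison. Here's a hedged Python sketch of that check (the real script uses Perl's Time::Piece; `eligible_stacks` is an illustrative name, and the field names follow the stacks.csv header):

```python
import csv
from datetime import date, timedelta
from io import StringIO

def eligible_stacks(csv_text, today):
    """Return names of stacks whose minage days have elapsed since lastpicked."""
    out = []
    for row in csv.DictReader(StringIO(csv_text)):
        last = date.fromisoformat(row["lastpicked"])
        if today - last >= timedelta(days=int(row["minage"])):
            out.append(row["name"])
    return out
```

So with the data above and a "today" of 2018-07-01, Spenser (last picked 2017-05-01) is eligible, while stacks picked in late June aren't.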

I also have a couple scripts to list out the contents of the to-be-read database.

  1. A script that produces plain text output (on stdout) is here.

  2. A script that produces an HTML page and displays it in my browser (Google Chrome) is here. It uses text color to signify eligible/ineligible stacks and owned/unowned books. Sample output (again, comments on my literary taste, or lack thereof, are welcome) is here.

    The HTML::Template module is used to make output generation easier, and the template used for that is here.

    Getting it to show up in my browser is accomplished via chromix-too server/client/extension; if you don't have it, it's pretty easy to do something else instead.

Whew! I feel better getting this off my chest.


Last Modified 2024-06-03 9:36 AM EST

Replacing TPGoogleReader

Futurama quote pattern

Note: No actual code here.

Back in July 2013, Google discontinued its "Reader" RSS/Atom feed aggregation service. Basically: you subscribed to a number of websites via their syndication feeds. Google would periodically query the feeds for new content. It would also keep track of what articles you had "read". (More accurately: marked as read. You didn't actually have to read them.) There are a number of services that do that sort of thing. I used Reader because of the independently-developed TPGoogleReader Chrome extension. Specifically, for one lousy feature of TPGoogleReader. You could get it to:

  1. Query Google Reader for your unread articles;

  2. Automatically open up a number of browser tabs showing unread articles, up to a specified maximum;

  3. And this is the critical part: when you closed an auto-opened tab, TPGoogleReader would open up the next unread article in a new tab in the background.

This made browsing a large number of sites an efficient breeze. When I finished reading one article, a tab-closing control-W all by itself would bring up a new background tab with the next unread article in my feed. No mouse-messing. Concentrate on reading content. Bliss.

It took a few years, and numerous false starts, but I'm back at that point again. Here's how:

  • I moved to a free Inoreader account to take over the RSS feed monitoring. They are reliable, active, and seem to be hanging around.

  • I wrote a "fetch" Perl script that uses the WebService::Google::Reader module to log into Inoreader and download unread article data. As you might guess from the name, the module author originally developed it for Google Reader, but graciously made the necessary changes to make it work with Inoreader.

    I run this script periodically via anacron.

  • The final bit of the puzzle was the Chromix-Too extension for the Google Chrome web browser. This consists of a JavaScript client/server pair that communicate over a Unix-domain socket. The client bit supports a number of simple commands, and I only use two of them:

    1. Tell me how many tabs the browser has open:

      chromix-too raw chrome.tabs.query '{}'

      The output is a mass of detailed JSON, but that's pretty easy to parse.

    2. Open a new tab in the background with a specified URL:

      chromix-too raw chrome.tabs.create '{"active":false,"url":"URL"}'

I'm leaving out a lot of details, but they are pretty straightforward (and of very little general interest): storing a local list of unread articles, figuring out whether it's appropriate to open one (and if so which one), time delays, etc. I wrap all this logic in a "reader" Perl script which I run whenever I have the browser running.
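The tab-management decision at the heart of such a script can be sketched in Python (an assumption-laden illustration: the real script is Perl, `next_url_to_open` is my name for it, and I'm assuming the tab query yields a JSON array of tab objects):

```python
import json

def next_url_to_open(tabs_json, unread_urls, max_tabs=8):
    """Given the tab-query output (assumed: a JSON array of tab objects),
    return the next unread URL to open, or None if enough tabs are open."""
    open_tabs = json.loads(tabs_json)
    if len(open_tabs) >= max_tabs or not unread_urls:
        return None
    return unread_urls[0]
```

In use, the wrapper script would call this every time a tab closes, and feed any returned URL to `chromix-too raw chrome.tabs.create`.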

But I'm back to web-surfing Nirvana again, so that's good. The only downside (sort of) is that all this happens on a single (Linux) host. That's OK for me.


Last Modified 2018-12-28 4:45 AM EST

An HTML Calendar Generator


[November 2019: sources moved to GitHub]

A while back I replaced the (increasingly unwieldy) monthly archive section over there in the right-hand column with a yearly archive section: one link per year that Pun Salad has been in existence. Each link takes you to a yearly calendar, which, in turn, contains links to the monthly archives (when you click on a month name) or daily posts (when you click on a day). Example output here for 2017.

The code to generate those calendars is embedded in the (very) special purpose CGI script that powers Pun Salad, but I thought the calendar generation code might be of interest to people.

Notes:

  • The script is run with a single year argument, and produces HTML on standard output.

  • The Perl module Time::Piece does almost all of the heavy lifting for the necessary date calculations. It probably breaks down for years far in the past or future; I haven't messed with that too much. I tested that it gives the same calendar for 1901 as the Linux cal command does, so that's good.

  • The HTML::Template module is used to specify the HTML framework for the calendar. Obviously, that's where you might want to customize the appearance. The code assumes the template resides in your top-level ~/Templates directory.

  • The calendar is a table of months; each month is a table of days. This means, of course, that the generator is essentially a four-deep nested loop. Eek! A voice from my old structured programming days said: "you really shouldn't nest loops that deeply". So I broke out the month-generation into a Perl subroutine, and now I feel better about myself.
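The break-the-month-out-into-a-subroutine structure looks something like this Python sketch (the real generator is Perl with HTML::Template; this only shows the shape, using the standard calendar module, and the function names are mine):

```python
import calendar

def month_html(year, month):
    """Render one month as an HTML table; real day cells would be links."""
    cal = calendar.Calendar(firstweekday=6)   # weeks start on Sunday
    rows = []
    for week in cal.monthdayscalendar(year, month):
        cells = "".join(f"<td>{d or ''}</td>" for d in week)  # 0 = padding day
        rows.append(f"<tr>{cells}</tr>")
    name = calendar.month_name[month]
    return f"<table><caption>{name}</caption>{''.join(rows)}</table>"

def year_html(year):
    """The year is just twelve month tables; nesting stays shallow here."""
    return "\n".join(month_html(year, m) for m in range(1, 13))
```

Breaking the month out keeps each function at two loop levels instead of four.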

As usual, this is not earth-shattering code, but I hope someone finds it useful, if only for tutorial purposes.


Last Modified 2024-06-03 9:59 AM EST

Bing Desktop Background Picture Downloading

For Fun and (No) Profit

[Update 2019-11-08: sources moved to GitHub.]

[Update 2019-03-27: The Bing People (the Crosbys?) changed the format of their JSON. That's their perfect right, but it required some slight changes to the pic-getting script.]

For a few years now, I've made the Important Life Choice about my computer's desktop backgrounds (aka "wallpaper"): downloaded photos of spectacular vistas, amazing animals, breathtaking architecture, … I'm not particular. Rotate them every so often to avoid boredom. This is often called a "slideshow".

This, even though my open windows usually obscure the background. I know it's there though, and it makes me happy. (And the Start-D key combo works to minimize all windows if I really want to peruse it.)

The OS environments I use (Windows 10, Fedora Linux/Cinnamon) make it easy to configure a slideshow: just find the configuration page, point it to a directory containing the pictures you like, choose a switching interval, and that's it. (If your environment doesn't let you do this easily, maybe you should find a better environment.)

That leaves only one issue: setting up the picture directory. My personal choice is to have my Windows "Pictures" directory shared via VirtualBox's shared folders feature to the Linux guest. (Detail: to allow me to write to this directory from Linux, my account must be added to the vboxsf group. It's on my "things to do" list when creating a new Linux guest.) I keep 400 pictures in this directory; when more new pictures are added, the same number—the oldest ones—are removed.

I used to download daily pictures from the National Geographic site, but they made that difficult awhile back; I don't remember the details, and I haven't checked recently to see if they relented. Instead I grab Bing's home page picture; there's a new one every day, and downloading, while not exactly a breeze, is not too difficult.

The Perl script I use to download is get_bingpics. Notes:

  • There's a magic URL at Bing that can be queried (with proper parameters) to divulge the recent Bing pictures and their names. Specifically, the page will contain (at most) the eight most recent pictures. The query I use asks for 16.

  • For some reason, I request the JSON version of the picture data. This is decoded (naturally enough) into a Perl data structure with the decode_json function from the JSON::PP module.

  • For the available images, the script checks each to see if it has already been downloaded. For each image not previously downloaded, it uses the LWP::Simple function getstore to download to the shared directory.

    Although I typically run this script daily, this design allows me to skip up to eight days without missing any pictures. (For example, if I'm on vacation.)

  • I run this script out of anacron daily, details left as an exercise for the reader.
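The query-then-skip-what-you-have logic might look like this in Python. The endpoint path and parameters (HPImageArchive.aspx, format=js, n=…) are my recollection of the commonly used Bing JSON interface, not copied from the script, so treat them as assumptions; the function names are illustrative too:

```python
import json
from urllib.parse import urlencode

BING = "https://www.bing.com"

def archive_url(count=8):
    """Build the (assumed) Bing image-archive query URL."""
    return f"{BING}/HPImageArchive.aspx?" + urlencode(
        {"format": "js", "idx": 0, "n": count})

def new_image_urls(archive_json, already_have):
    """Extract full image URLs from the JSON payload, skipping known names."""
    data = json.loads(archive_json)
    urls = []
    for img in data.get("images", []):
        name = img["url"].rsplit("/", 1)[-1]   # filename part of the URL
        if name not in already_have:
            urls.append(BING + img["url"])
    return urls
```

The actual download of each new URL (getstore in the Perl version) is then a one-liner.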

The other part of this equation is getting rid of older pictures. That's accomplished by the remove_old_pics script. Notes:

  • It's pretty simple.

  • Its claim to geekery is using the Schwartzian Transform to obtain a list of JPEG files in the picture directory in order by modification time. Sweet!

  • The code can be easily tweaked to change directories, the types of files examined, and how many "new" ones to keep.

  • This too is run daily via anacron.
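In Python the Schwartzian Transform collapses into a sort key, but the keep-the-N-newest logic is the same. A sketch (`prune_old_pics` and its defaults are illustrative, not the script's actual values):

```python
from pathlib import Path

def prune_old_pics(picdir, keep=400, pattern="*.jpg"):
    """Delete all but the `keep` most recently modified files matching pattern."""
    pics = sorted(Path(picdir).glob(pattern),
                  key=lambda p: p.stat().st_mtime, reverse=True)
    for old in pics[keep:]:   # everything past the newest `keep` files
        old.unlink()
    return len(pics[keep:])   # how many were removed
```

Tweaking the directory, file pattern, or retention count is just a matter of changing the arguments.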

OK, so how many of you out there are shaking your heads at this and saying: "Doesn't this boy realize he needs professional help?" Let's see a show of hands…


Last Modified 2019-12-03 11:03 AM EST

My Book Picker (and Unpicker)

[2019/11/11 Update: sources moved to GitHub: https://github.com/punsalad/projects/tree/master/bookpicker_simple]

[2018/07/03 Update: A newer version is described here. I'm leaving this description, and the scripts it describes, in place, though, because it's simpler.]

Another example of the mental aberration that causes me to write Perl scripts to solve life's little everyday irritants. In this case two little irritants:

  1. I noticed that I had a lot of books on my shelves, acquired long past, that I never got around to reading. Either because (a) they were dauntingly long and dense (I'm thinking about Infinite Jest by David Foster Wallace); or because (b) they just fell through the cracks. Both poor excuses, but there you are.

  2. I sometimes want to methodically read a series of books in a particular order.

In other words, I needed a way to bring diligence and organization to my previously chaotic and sloppy reading habits.

Here's how I went about scripting that:

I conceptualized my "to be read" books as a collection of book stacks, like the picture at (your) right (except more of them). Each stack is a list of books:

  1. either organized around a specific theme (usually an author) or is a catchall (e.g. "non-fiction"); and

  2. maintained in the order I want to read them. (This goes back to the issue mentioned above: sometimes a series really "should" be read in publishing order, for example C.J. Box's novels featuring protagonist Joe Pickett.)

The implementation of this concept: each stack is a .list file in my Linux directory ~/var/reading_lists. As I type, sixteen of them:


(pas@oakland) ~/var/reading_lists: ls -l *.list
-rw------- 1 pas pas 183 Oct 20 17:47 amber.list
-rw------- 1 pas pas  41 May 17 18:05 asimov.list
-rw------- 1 pas pas 242 Jul 25 06:09 box.list
-rw------- 1 pas pas  93 Oct  9 12:27 connelly.list
-rw------- 1 pas pas  43 Sep  7 10:28 conservative_lit_101.list
-rw------- 1 pas pas  75 Sep 17 13:32 docford.list
-rw------- 1 pas pas  46 Jun 30 11:12 elmore.list
-rw------- 1 pas pas  83 Mar 29  2016 francis.list
-rw------- 1 pas pas 266 Oct 28 06:52 genfic.list
-rw------- 1 pas pas  65 Apr 13  2017 monkeewrench.list
-rw------- 1 pas pas 144 Oct 16 17:11 moore.list
-rw------- 1 pas pas 199 Oct 25 13:47 mystery.list
-rw------- 1 pas pas 523 Oct 16 13:12 nonfic.list
-rw------- 1 pas pas  56 Jul 18 15:04 reacher.list
-rw------- 1 pas pas 333 Aug 30 15:37 sci-fi.list
-rw------- 1 pas pas  45 Jun 11 15:50 winslow.list

Each list has one or more lines:


(pas@oakland) ~/var/reading_lists: wc -l *.list
   6 amber.list
   1 asimov.list
  11 box.list
   3 connelly.list
   1 conservative_lit_101.list
   5 docford.list
   4 elmore.list
   2 francis.list
   8 genfic.list
   4 monkeewrench.list
   5 moore.list
   6 mystery.list
  13 nonfic.list
   2 reacher.list
   9 sci-fi.list
   2 winslow.list
  82 total

… and each line in each file contains a different book title. An example: elmore.list, a list I created in lieu of watching the six seasons of Justified on Amazon Prime for the fourth time.


(pas@oakland) ~/var/reading_lists: cat elmore.list
Pronto
Riding the Rap
Fire in the Hole
Raylan

I.e., four books written by the late Elmore Leonard where Raylan Givens appears as a character.

The picking algorithm is simple and "works for me". When it's time to choose the next book to be read from this agglomeration, I pick a pile "at random" and take the book from the "top of the pile" (i.e., the one named in the first line of the file).

There is one more little tweak: the "random" pick is weighted by the length of the list. So (for example) since there are 82 books total in all lists above, and the nonfic.list has 13 lines, a book from that list would be picked with probability 13/82. (Note that the probabilities calculated this way add up to 1, the probability that some book from some pile will be picked.)

That's not as hard as it might sound. I'd pseudocode the algorithm like this:

Given: N lists (indexed 0..N-1) with B[i] books in the i-th list…

Let T be the total number of books in the lists, B[0] + B[1] + … + B[N-1]

Pick a random number r between 0 and T-1.

i = 0
while (r >= B[i])
     r -= B[i]
     i++

… and on loop exit i will index the list picked.

So: the "picking" script, bookpicker, is here. Notes:

  • You just run the script with no arguments or options.

  • I left "debugging" print statements in.

  • You're responsible for maintaining the lists; no blank/duplicate lines, etc.

  • For the "picked" list, the script writes a smaller file with the picked title missing. The old list is saved with a .old appended to the name. That's important, because next…

One last little gotcha: the randomization is sometimes a little too random. Specifically, sometimes after reading a book by a certain author, the picking script picks… the next book in the list by the same author. I don't want that. Variety is better.

So there's also a script to "undo" a previous pick, bookpicker_unpick. If you run it before any other changes are made to the list files, it will find the most-recently-modified .list file, and "restore" the corresponding .list.old file. The script is here.
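The unpick operation can be sketched in Python: find the most recently modified .list file and put its saved .old backup back in place (the real script is Perl; `unpick` is an illustrative name):

```python
import os
from pathlib import Path

def unpick(listdir):
    """Undo the last pick: restore the .old backup of the newest .list file."""
    lists = list(Path(listdir).glob("*.list"))
    newest = max(lists, key=lambda p: p.stat().st_mtime)
    backup = Path(str(newest) + ".old")     # saved pre-pick version
    if not backup.exists():
        raise FileNotFoundError(f"no backup for {newest.name}")
    os.replace(backup, newest)              # atomic rename back into place
    return newest.name
```

Because the backup is renamed (not copied), running it twice in a row won't silently "restore" the same pick again.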


Last Modified 2024-06-03 6:04 PM EST