{ "version": "https://jsonfeed.org/version/1", "title": "Tom Shafer", "home_page_url": "https://tshafer.com/blog/", "feed_url": "https://tshafer.com/blog/feed.json", "description": "Tom Shafer's personal blog.", "author": { "name": "Tom Shafer", "url": "https://tshafer.com" }, "items": [{ "id": "https://tshafer.com/blog/2023/11/override-rstudio-desktop-font", "url": "https://tshafer.com/blog/2023/11/override-rstudio-desktop-font", "title": "Overriding RStudio Desktop's Font Picker", "content_html": "
I’ve been doing a decent amount of R programming lately and I’ve wanted to experiment with GitHub’s new Monaspace type family. Once installed, though, only the variable-font variants register as fixed-width fonts on macOS. This is a problem because RStudio Desktop (at least as of version 2023.06.1+524) only allows users to select fixed-width fonts in the interface.
RStudio manages most preferences in flat JSON files these days, so I figured I could pick whatever font I wanted in those files. After some searching, though, it turns out RStudio does not store font configuration in the usual places: ~/.local/share/rstudio/ or ~/.config/rstudio/.
Instead, the font setting is stored (on macOS Sonoma, at least) in ~/Library/Application Support/RStudio/config.json. I swapped in font["fixedWidthFont"] = "MonaspaceNeon-Regular", and everything works after an RStudio UI reload.
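\nFor reference, the relevant bit of config.json looks something like this (a sketch; I’ve kept only the font key, and the surrounding keys are omitted):
\n{\n  "font": {\n    "fixedWidthFont": "MonaspaceNeon-Regular"\n  }\n}\n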
Last year I benchmarked a few ways of shuffling columns\nin a data.table, but what about pandas? I didn’t\nknow, so let’s revisit those tests and add a few more operations!\npandas winds up being much more competitive than I expected.
\nFirst, dplyr is by far the slowest. Second, pandas is\n(more than) competitive with the R options. In the small-size\nregime (vector sizes up to about 1,000), the pandas option is\nsimilar to, but faster than, most of the slower R options, and\nthe numpy-backed solution is nearly as fast as base R\nassignment and data.table’s in-place option. I expected\npandas to be a lot slower.
\nMore surprising, in the large-vector regime both Python solutions\noutperform all R options, and the in-place Python option is\nmuch faster than everything else, starting with vector sizes of\nabout 10,000. I’m not sure how representative this benchmark is,\nbut it’s an interesting data point.
\nMore than most frameworks, pandas feels sensitive to the way we do something: calling .apply() isn’t just a little slower than .transform()—it’s miles slower. So the simple transformations we’re doing here are pretty easy to optimize, and numpy-backed operations should be fast anyway.
There also might be systematic differences between the R and Python tests: the R functions were timed with microbenchmark and the Python tests were run with timeit. The new code is below.
\nfrom timeit import Timer\nimport pandas as pd\n\n# Assumed setup template (not shown in the original): build a fresh\n# DataFrame with 2**n rows before each timing run\nsetup = "df = pd.DataFrame({'x': range(2 ** %d)})"\n\ndef scramble_naive(df: pd.DataFrame, colname: str) -> pd.DataFrame:\n    df[colname] = df[colname].sample(frac=1, ignore_index=True)\n    return df\n\ntest_naive = "scramble_naive(df, colname='x')"\n\nresults_naive = {\n    n: Timer(test_naive, setup % n, globals=globals()).repeat(repeat=100, number=1)\n    for n in range(21)\n}\n
import numpy as np\nimport pandas as pd\nfrom timeit import Timer\n\ndef scramble_inplace(df: pd.DataFrame, colname: str) -> pd.DataFrame:\n    # Shuffle the column's backing array in place\n    np.random.shuffle(df[colname].to_numpy())\n    return df\n\ntest_inplace = "scramble_inplace(df, colname='x')"\n\nresults_inplace = {\n    n: Timer(test_inplace, setup % n, globals=globals()).repeat(repeat=100, number=1)\n    for n in range(21)\n}\n
\n\n
scramble_base <- function(input_df, colname) {\n input_df[[colname]] <- sample(input_df[[colname]])\n input_df\n}\n
library(dplyr)\n\nscramble_dplyr <- function(input_tbl, colname) {\n input_tbl %>% mutate({{colname}} := sample(.data[[colname]]))\n}\n
I’ve been getting a lot of use recently from the Posit (n\u00e9e\nRStudio) Package Manager (PPM), because it offers freely\navailable R package binaries for quite a few Linux\ndistributions—including common ones I tend to see in\nDocker containers (rocker) and ‘the cloud’ (Amazon Linux\n2). Recent versions of renv seem to take advantage\nof these binaries, too.
\nBinary packages can cut package install times by an order of\nmagnitude, since they come precompiled. Many popular packages,\nincluding data.table and dplyr, are more or less\nR wrappers around C or C++ code at this point in an effort to\nmake things fast, and so installing/compiling a package like\ndplyr from source can take minutes. That’s fine once or\ntwice, but it’s not fine when I regularly rebuild containers or\nrun package checks with GitHub Actions.
\nIt’s pretty easy to get running with PPM once you know where to\nlook in the documentation, and it basically comes down to two\nsteps.
\nPosit offers different endpoints for the various Linux\ndistributions. To find yours:
\nNavigate to the PPM setup page.
\nChoose your Linux distribution. At the right-hand side of the\n header, click “Source” and choose from the drop-down menu.
\nMake sure the “Repository URL” setting is “Latest”. This is\n the default.
\nCopy the URL for your distribution. E.g., the URL for Ubuntu\n 22.04 is https://packagemanager.posit.co/cran/__linux__/jammy/latest.
\nIt’s easy to configure R to use the PPM endpoint by setting or changing a couple of values in .Rprofile:
Set a header to tell Posit your R configuration. The header passes along your version of R and a few other platform details.
\nSet the PPM URL as your CRAN source. CRAN is generally the default package source and is configured as part of the “repos” option.
Put together, you get an addition to .Rprofile like this:
options(HTTPUserAgent = sprintf(\n "R/%s R (%s)", \n getRversion(), \n paste(\n getRversion(), \n R.version["platform"], \n R.version["arch"], \n R.version["os"]\n )\n))\n\n.ppm <- "https://packagemanager.posit.co/cran/__linux__/jammy/latest"\noptions(repos = c(CRAN = .ppm))\n
Set these values in .Rprofile and they will generally propagate to all your R sessions; if the stuff I’m doing is any indication (Docker, GitHub Actions, Databricks,1 etc.), you might save a good bit of time.
Ugh. ↩︎
\nAt work, I recently had the opportunity to spend time\nthinking about the rise of generative AI from the perspective of\nour clients: businesses who hear the hype and wonder how to sort\nthrough all of this stuff. What are the early benefits? What are\nsome of the obvious risks?
\nOur marketing folks took some of that thinking and turned it\ninto a blog post. I’m hardly an expert in generative AI,\nbut that’s kind of the fun of it. These large language models\nwe’ve been watching and working with over the last few years are,\nat least for now, opening up a bunch of new applications. Maybe\nthis fire will keep burning or maybe it’ll cool, but it’s an\ninteresting time.
", "date_published": "2023-07-21T22:45:00-04:00"}, { "id": "https://tshafer.com/blog/2023/03/more-apple-music-usb", "url": "https://tshafer.com/blog/2023/03/more-apple-music-usb", "title": "More on Apple Music and USB Sticks", "content_html": "I have a script that cleans up a USB stick for playing music\nin my car, but that’s not quite enough. What I really need is a\nway to synchronize Apple Music with the volume, like you would do\nwith an iPhone\u2014or iPod, back in the day.
\nI started by dragging tracks from the Music app to the USB drive icon, but the app doesn’t offer a dialog asking whether to overwrite existing files; I wound up with multiple copies of every file rather than even one-way synchronization. So apparently I had to do this myself, and the result is a tiny little Python package named musicsync.
\nI tried a few different things, including rsync, but a\ncombination of AppleScript (to gather files from the Music app),\nPython (for comparing directory trees between selected songs and\nthe destination, and for handling the synchronization looping),\nand OS-level file operations works really nicely.
\n$ musicsync --playlist "Selected for Car" /Volumes/UNTITLED\n\nCollecting songs from playlist "Selected for Car"\n Collected 2514 songs\nPlaylist root directory is "~/Music/Music/Media.localized/Music/"\nBuilding playlist song tree\n Found 365 dirs and 2514 files\nRemoving extra files from "/Volumes/UNTITLED/"\nCopying new files from "Selected for Car" to "/Volumes/UNTITLED/"\n Aaron Keyes/In the Living Room: 17%|\u2588\u2588 | 2/12 [00:06<00:25, 2.60s/it]\n
A couple of points I found interesting while building this:
\nditto --nocache --noextattr --noqtn --norsrc was great for working with FAT32 and ignoring resource forks and other extended attributes.
\nSometimes when I’m running errands or whatever I just don’t want to take my phone out. It’s nice to have at least a few minutes without it, and if I’m headed to the grocery store it isn’t like I actually need it to get there. Sometimes quiet and stillness are really nice, but my car also has a couple of USB slots and purportedly supports MP3 and AAC audio.
\nBut it’s 2023 and apparently there are still pieces of tech that absolutely do not work out of the box with macOS-related stuff. I’m a reasonably technical person, but after all the Google searching and the approximately ten million times this didn’t work, here are the steps I needed to follow to play MP3s on my 2019 Accord from a USB stick plugged into a Mac laptop. These directions basically mirror a set I found, eventually, on a forum for Honda Ridgeline owners. Maybe this post can amplify those directions and offer a little more Google-fu.
\nRemove the hidden metadata directories: rm -r .Trashes .Spotlight-V100, etc. This requires the Terminal, iTerm 2, etc. to have Full Disk Access.
Run /usr/sbin/dot_clean, which is installed with macOS, to merge or delete the ._* AppleDouble files.
\nAnd then we probably need to do this every time we add new files to the drive. Sigh. I wrote a little Python script to do this, in case it’s helpful: https://github.com/tomshafer/cleanusb. Once installed, it offers a command named cleanusb that follows these steps.
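\nFor reference, the manual version of those steps looks something like this (a sketch; the volume name and the exact set of metadata directories will vary):
\ncd /Volumes/UNTITLED\n\n# Remove macOS metadata directories (the terminal app needs Full Disk Access)\nrm -rf .Trashes .Spotlight-V100 .fseventsd\n\n# Delete ._* AppleDouble files rather than merging them\n/usr/sbin/dot_clean -m .\n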
Here’s a trick I find useful during exploratory analysis and feature engineering; really, whenever I’m querying against database servers I don’t control: Log the connection PID at query time.
\nIt happens pretty often to me during exploratory analysis that I launch a query and then, either right away or after the query begins to drag on longer than expected, wish I could cancel the query to edit it and try again.\nMaybe I forgot to put a filter into the query, and it’s about to return way too much data, or maybe the database is underpowered and I’d rather extract a smaller result for analysis.
\nOften there isn’t a good way to stop a query from within RStudio/VS Code when the query is directed to a remote database server.\nUnless we want to wait for the query to finish and return control of the IDE to us, maybe the best we can do is to restart the R session or Jupyter kernel and start another query.\nIf we log the connection PID, though, we get another option: We can open another session, instead of killing this one, and ask the database server to cancel any running queries on that PID.
\nThis is really easy to do in both Python and R, and most drivers offer a way to get the PID. I’ve been using Postgres/Redshift a lot recently, so I’ll use it as an example. With Python and psycopg2, we can call get_backend_pid() to log the PID:
log.info(f"Query PID = {conn.get_backend_pid()}")\n
#> Query PID = 948\n
Or, with R and DBI, we can call dbGetInfo():
message("Query PID = ", DBI::dbGetInfo(conn)$pid)\n
#> Query PID = 950\n
Once we have our PID, it’s really easy to open up a new session and ask the database to cancel the running query:
\nSELECT pg_cancel_backend(<PID>)\n
Because this is just SQL, we can run this query anywhere and recover control of our IDE session.\nSimple and useful.
", "date_published": "2022-09-13T07:05:00-04:00"}, { "id": "https://tshafer.com/blog/2022/06/experimenting-with-quarto", "url": "https://tshafer.com/blog/2022/06/experimenting-with-quarto", "title": "Experimenting with Quarto", "content_html": "Quarto is the up-and-coming “next generation version of R Markdown” being developed by RStudio.\nIt’s more or less a superset of R Markdown/knitr that’s suited to programming languages besides R.\nQuarto’s heading towards a 1.0, and I’ve started experimenting for a few client projects.
\nSo far I like the system a lot, and at this point I really think Quarto’s worth a try, especially since it ships with recent versions of RStudio.
\nThis post lists a few of my favorite elements after a couple of weeks of using the tool off and on.
\nBecause Quarto uses knitr to execute R code, my usual workflows don’t change unless I want them to. It’s just about all upside so far, and I’ve been able to use the old-style knitr::opts_chunk$set() syntax in places where I haven’t yet been able to configure Quarto directly.
Quarto takes the YAML metadata styling used by R Markdown (and pandoc and many other tools) and extends it. In particular, Quarto introduces some special syntax (#| key: value) to specify chunk-level options. This is supposed to replace the old style, where the options are crammed into the language identifier ({r title, fig.asp=0.618, ...}), and it’s particularly nice when specifying a good number of options—like I might when building a figure:
```{r}\n #| label: my-figure\n #| fig-asp: 0.618\n #| fig-cap: |\n #| This is a caption for my figure, \n #| using YAML formatting, etc.\n\n ggplot() + ...\n ```\n
One of the nicest new features is the upgraded styling. First, the default theming feels much nicer than the old R Markdown style (even if the default font size is a little big for my taste). Second, the new styling makes margin notes easily possible and subfigures much easier to compose.
\nMargin notes are a big deal and can be enabled by setting reference-location: margin in the YAML front matter.
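\nE.g., at the top of the document (a minimal sketch; the title is just a placeholder):
\n---\ntitle: "My document"\nreference-location: margin\n---\n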
In the past I’ve relied on R4DS’s figures guidance for composing multiple figures in one chunk. But now Quarto can do this composition immediately using layout options, and, with the additional column: options, we can tell Quarto to expand the content width as we like. E.g., for two subfigures stretching across the page and outside of the normal body content block:
```{r}\n #| label: my-figure\n #| layout-ncol: 2\n #| column: page\n\n ggplot() + ...\n ggplot() + ...\n ```\n
It’s just so nice.
\nQuarto can be downloaded as a standalone application; it isn’t dependent on RStudio or any IDE. Using the command-line tool, we can call quarto render to compile a document or quarto preview to render a live preview that automatically updates when the source files are saved. The latter is exposed in RStudio via “Render on Save,” but it’s available in any editor using the command-line tool.
Finally, there are the IDE integrations.\nWhen I’m working with R I’m almost always using RStudio,\nand RStudio has the expected set of built-in niceties. \nMostly, these are all the usual knitr-powered switches, buttons, and keyboard shortcuts,\nbut there is also editor completion for the various YAML configurations at the cell and editor level.
\nJust about all the work I do outside of R is handled with VS Code and, often, its Remote SSH and Remote Containers extensions. I’ve so far used Quarto less with VS Code than with RStudio, but I’ve already experimented with one nice feature: Quarto’s ability to render .ipynb notebooks (for which VS Code has a very nice integrated experience) as Quarto documents. This seems to work just like one would expect: Simply put the relevant YAML metadata in the top notebook cell and add #| key: value comments to blocks as necessary. VS Code renders and executes the notebook just as expected, and Quarto takes the notebook—and any output that’s been generated—and renders it like I’d expect if I’d written the code in R rather than Python.
In a post last week I offered a couple of simple techniques for randomly shuffling a data.table column in place and benchmarked them as well. A comment on the original question, though, argued those timings aren’t useful because the benchmarked data set contains only five rows (the size of the table in the original post).
\nThat seemed plausible, so I’ve carried the test further. Often\nwe’re interested in vectors with hundreds, thousands, or millions\nof elements, not a handful. Do the timings change as the vector\nsize grows?
\nTo find out, I simply extended my computation from last time using microbenchmark and plotted the results below. I’m surprised to see just how much set() continues to outperform the other options even at fairly large vector sizes.
library(data.table)\nlibrary(microbenchmark)\n\nscramble_orig <- function(input_dt, colname) {\n new_col <- sample(input_dt[[colname]])\n input_dt[, c(colname) := new_col]\n}\n\nscramble_set <- function(input_dt, colname) {\n set(input_dt, j = colname, value = sample(input_dt[[colname]]))\n}\n\nscramble_sd <- function(input_dt, colname) {\n input_dt[, c(colname) := .SD[sample(.I, .N)], .SDcols = colname]\n}\n\ntimes <- rbindlist(\n lapply(\n setNames(nm = 2 ** seq(0, 20)),\n function(n) {\n message("n = ", n)\n setDT(microbenchmark(\n orig = scramble_orig(input_dt, "x"),\n set = scramble_set(input_dt, "x"),\n sd = scramble_sd(input_dt, "x"),\n setup = {\n input_dt <- data.table(x = seq_len(n))\n set.seed(1)\n },\n check = "identical"\n ))\n }\n ),\n idcol = "vector_size"\n)\n
Reading the chart from left to right, small vectors to large ones, the first regime is one where set() dominates the other methods, with a much shorter runtime. This is followed by a transition to a regime where the time required for sample() to shuffle large vectors dominates the total run time. (Notice both axes are on logarithmic scales.)
Does this matter? The differences here are so small that we\ncan’t even use profvis to benchmark a single run. But, what\nif we were calling this functionality repeatedly in a loop? The\ndifferences add up.
\nThis is a good example of where it’s nice to know the options available to us in the languages and packages being used: The data.table authors built set() for these kinds of reasons, as a way to programmatically assign to data.tables in place within loops.
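\nFor example, a minimal sketch of set() inside a loop (hypothetical data, not from the original post):
\nlibrary(data.table)\n\ndt <- data.table(a = 1:5, b = 6:10)\n\n# set() skips the overhead of `[.data.table`, so per-call cost stays tiny inside loops\nfor (j in names(dt)) {\n  set(dt, j = j, value = dt[[j]] * 2L)\n}\n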
In a one-off analysis, maybe it’s not worth the trouble to care too much about speed, and it’s likely not a good use of time to benchmark everything. But when writing packaged code, for example, we give up the ability to know how and where our code will be used. It pays to be aware of things like the difference between using .SD and set() and which is the better option; it makes our code more easily used in places we never thought about and can’t think about at the time.
Yesterday, in a post syndicated to R-bloggers, kjytay asked how to programmatically shuffle a data.table column in place, as the straightforward way didn’t work well.
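\nThe pitfall looks roughly like this (my sketch, not kjytay’s exact code): the unquoted name on the left-hand side is taken literally, so you end up writing a column called “colname”:
\nscramble_broken <- function(input_dt, colname) {\n  # Writes a column literally named "colname" instead of the column the caller asked for\n  input_dt[, colname := sample(colname)]\n}\n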
\nHere are two other ways to solve the same problem, one using data.table::set() and the other .SDcols:
scramble_set <- function(input_dt, colname) {\n set(input_dt, j = colname, value = sample(input_dt[[colname]]))\n}\n\nscramble_sd <- function(input_dt, colname) {\n input_dt[, c(colname) := .SD[sample(.I, .N)], .SDcols = colname]\n}\n
Each approach returns the correct result and avoids the strange evaluation problem of ending up with a column literally named “colname”.
\nIt’s good to check performance with these kinds of things, too, especially when .SD is involved. Here, set() handily outperforms the other two solutions (I named kjytay’s original solution “orig”):
microbenchmark(\n orig = scramble_orig(input_dt, "x"),\n set = scramble_set(input_dt, "x"),\n sd = scramble_sd(input_dt, "x"), \n setup = {\n input_dt <- data.table(x = 1:5)\n set.seed(1)\n }, \n check = "identical"\n)\n
Unit: microseconds\n expr min lq mean median uq max neval\n orig 291.970 315.4400 351.52132 319.474 327.5635 3248.663 100\n set 33.196 36.0965 61.62936 37.262 39.5380 2419.880 100\n sd 557.834 591.2370 636.88657 597.579 616.2675 3821.737 100\n
I wrote a thing! Well, I edited someone else’s thing, and then I\nadded a lot of words, and then someone else (multiple someones?)\nedited my words. And then they added fancy graphics and stuff.
\nBut here’s a very business-y blog post that talks about our\ncompany’s Chief Scientist Committee, of which I’m a member: Why\nDoes Elder Research Need a Chief Scientist Committee?
", "date_published": "2022-05-02T21:57:00-04:00"}, { "id": "https://tshafer.com/blog/2022/04/vscode-devcontainer-uid-gid", "url": "https://tshafer.com/blog/2022/04/vscode-devcontainer-uid-gid", "title": "Update VS Code Remote Container UID and GID", "content_html": "I’m increasingly relying on VS Code’s Remote Container\nfeatures for remote development in clients’ cloud computing\nsystems. It’s a little fiddly (I wouldn’t say I’m a Docker\nexpert, either) but it mostly works out of the box, and the\nability to encapsulate my environment makes a lot of other things\neasier.
\nI ran into a new problem recently, though, on a remote compute\ninstance with multiple Unix user accounts and VS Code’s default\nPython 3 image. VS Code’s containers are set up with a non-root\nuser named vscode, linked to the user and group IDs 1000:1000.\nMostly that’s fine, but this time (because I’d set up a few user\naccounts) my UID was, e.g., 1005, not 1000, and my primary GID\nwas totally different. The container needs to get this wiring\nright for permissions to be consistent inside and outside of the\ncontainer.
\nIt wasn’t super clear to me initially, but the answer seems to be\nmanually updating the container’s user (vscode) directly in the\ncontainer’s Dockerfile. Lifting directives from “Change the\nUID/GID of an existing container user” works like a charm,\nand I also appended another group:
\nARG USERNAME=vscode\nARG USER_UID=1005\nARG USER_GID=10\n\nRUN groupmod --gid $USER_GID $USERNAME \\\n && usermod --uid $USER_UID --gid $USER_GID $USERNAME \\\n && usermod -aG $USER_UID $USERNAME \\\n && chown -R $USER_UID:$USER_GID /home/$USERNAME\n\nUSER $USERNAME\n
The only thing left to do, I think, is figure out how to automate this for the others on my team so that at creation time the container picks up id -u and id -g and populates the relevant fields automatically.
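\nOne possible answer, assuming your extension version supports it: devcontainer.json has an updateRemoteUserUID option that remaps the container user’s UID/GID to the local user automatically. A sketch:
\n{\n  "remoteUser": "vscode",\n  // If supported, VS Code rewrites the container user's UID/GID to match the local user on Linux\n  "updateRemoteUserUID": true\n}\n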
I’ve been playing music since around sixth grade, mostly with\nschool and church groups and bands, and I’ve had amazing\nopportunities to play fun venues, but last week took the cake. I\nhad the opportunity to play with our church at the Walnut\nCreek Amphitheatre in Raleigh and it was legendary: nearly\nthree hours long, 16,000 in attendance, and 200 baptized.
\nGod’s doing some amazing stuff in us, our friends, and in\nRaleigh, and what a moment this was to remember. Many notes were\nplayed and the gospel was preached. I’ve included the stream\nbelow\u2014check it out!
\nFor work, I’ll be in Las Vegas in June to present on Bayesian\nmodeling and workflow to Predictive Analytics World’s\nHealthcare track. Over the last year I’ve worked on a pretty\nsimple problem that nicely illustrates the need for\u2014and\napplication of\u2014a modeling workflow along the lines of\nBetancourt, Gelman, and others, especially to guard\nagainst overly complex models.
\nI’m supposed to present for 45 minutes on June 21.
", "date_published": "2022-04-09T09:17:00-04:00"}, { "id": "https://tshafer.com/blog/2022/03/programming-with-lapply", "url": "https://tshafer.com/blog/2022/03/programming-with-lapply", "title": "Programming with lapply", "content_html": "TIL that lapply
accepts both functions and function names (as\ncharacter vectors). From right there in the documentation\n(emph. mine):
\n“FUN is found by a call to match.fun and typically is specified as a function or a symbol (e.g., a backquoted name) or a character string specifying a function to be searched for from the environment of the call to lapply.”
So, lapply
can use match.fun
to find and apply our functions\ndirectly; no need to hack around when we need custom function\napplication. That’s way simpler in cases like we’ve just had\nrecently at work, where we need to apply one of a variety of\nfunctions depending on the program’s state.
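\nFor instance (a small sketch; the state flag is made up for illustration):
\nuse_robust <- TRUE  # hypothetical bit of program state\nsummarizer <- if (use_robust) "median" else "mean"\n\n# lapply() resolves the character string via match.fun()\nlapply(list(a = 1:10, b = rnorm(5)), summarizer)\n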
Bonus picks:
\nTo this point, our family photo collection has grown to a little\nless than 28,000 photos and videos: not totally unwieldy, but\nlarge enough that tagging and facial recognition are important\ntools for finding specific photos or groups of photos. Facial\nrecognition also seems to drive a lot of Apple’s “Memories”\nfeatures, which seem to have improved a lot in iOS 15.
\nOur library is also big enough that it contains plenty of\nunlabeled “People”\u2014folks Photos can detect reliably enough to\nlabel, but with enough uncertainty that they aren’t given a\npositive identification. Sorting through this sub-collection ends\nup being really useful because about 10% of our total library\nfalls into this category, where faces are detected but not\nidentified.
\nI used to scroll through sections of our library and label photos as I found them, but it turns out that it’s possible to create an album that automatically collects all of these together: Just create a Smart Album where the “Person” is set to the empty string:1
\nBecause this approach relies on Smart Albums it’s only available on the Mac, but it makes library tagging much easier—especially when combined with another hard-to-discover Photos feature. I have thousands of photos with unlabeled people, but many of these photos are correlated: they picture the same person, sometimes in the same setting. Photos doesn’t make it easy to determine when this is the case, but, in some cases, you can double-click on a detected-but-not-identified face in the Info panel (⌘+I) and view a page collecting all the times this person appears. By renaming this unnamed individual, all of their photos are then merged into the correct identity. Super useful.
\nI just moved from a Windows 7 virtual machine to Windows 10 at work but, as soon as I moved over to the new VM, the Control key stopped registering in Pulse Secure remote desktop sessions. I’m working in RStudio, so Ctrl+Enter and Ctrl+Shift+Enter are massively important key commands.
After a lot of searching around, I found that this is a Parallels issue, not a\nWindows 10 or Pulse Secure bug. For one reason or another, instructing Parallels\nto optimize its keyboard handling for games fixes the issue. (See this forum\npost and KB article.)
\nMaybe this post will save someone else an hour of Googling.
", "date_published": "2021-01-18T06:31:00-05:00"}, { "id": "https://tshafer.com/blog/2020/10/russell-wilson-mvp", "url": "https://tshafer.com/blog/2020/10/russell-wilson-mvp", "title": "Russell Wilson and MVP Voting", "content_html": "I was listening to one of Bill Simmons’s podcasts recently while he was talking about Russell Wilson, a superb NFL quarterback who has apparently has never received an MVP vote despite performing excellently for many years. In the discussion, it was decided that arguing for Wilson receiving vote(s) at some point in the past is pointless because what should someone have done, voted for the wrong guy? Surely not!
\nThat idea stuck with me for a while, and finally it occurred to me that we probably shouldn’t be asking whether Wilson ought to have ‘taken’ votes from an MVP winner. Most years don’t have unanimous winners, so looking over the list of non-MVP players receiving votes is a better measure. I did some ‘research’ (OK, I googled) and came up with this figure, which tabulates NFL MVP voting over the past five seasons:
\nEach bar segment counts the number of MVP votes a player received in a year. I’ve highlighted in blue the votes received by players who did not win MVP, and I’ve broken those numbers out into the following table:
\nPlayer | ↓ Votes | Years | Notes |
---|---|---|---|
J.J. Watt | 13 | 1 | |
Tom Brady | 12 | 3 | Won NFL MVP in 2017 |
Drew Brees | 9 | 1 | |
Todd Gurley | 8 | 1 | |
Derek Carr | 6 | 1 | |
Ezekiel Elliott | 6 | 1 | |
DeMarco Murray | 2 | 1 | |
Aaron Rodgers | 2 | 1 | |
Tony Romo | 2 | 1 | |
Carson Wentz | 2 | 1 | |
Carson Palmer | 1 | 1 | |
Dak Prescott | 1 | 1 | |
Bobby Wagner | 1 | 1 |
If NFL MVP voting isn’t typically unanimous then, instead of asking whether Russell should have gotten votes destined for Patrick Mahomes or Lamar Jackson, maybe we could ask whether he should have received as many votes as, say, Carson Palmer. Yikes.
\nThe data and code used to write this post are available on GitHub.
", "date_published": "2020-10-03T09:48:00-04:00"}, { "id": "https://tshafer.com/blog/2020/09/data-science-covid19", "url": "https://tshafer.com/blog/2020/09/data-science-covid19", "title": "Some Recent COVID-19 Work", "content_html": "We don’t usually talk much about what we’re up to at work,\nbut in the last few weeks I’ve had the opportunity to share some\nresearch work from earlier this year. Back in June, I teamed with\na group to produce an analysis of how (or whether) U.S.\ngovernment policy had affected COVID-19 infection rates. We’ve\nwritten up the work in a post on the Elder Research\nblog, and I presented the same at the Data\nScience Conference on COVID-19 (DSCC19).
\nI’ll defer most details to the linked post, but we analyzed\nmonths of data at the U.S. county level to test for policy\nimpacts on the growth of COVID-19 cases. We also adjusted\nfor many other potential explanatory inputs including\ntesting, population size and density, and key demographic factors\nincluding income and minority representation. Even after these\nadjustments, stay-at-home orders were associated with a slight\ndecrease in cases on average and were linked to a continuing\nreduction in cases over time (i.e., these orders seemed to\ndecelerate the disease progression).
\nWe’ve published the source code for both the original\nproject and our article. Having code\navailable made the project a good fit for DSCC19. The article\nsource, particularly, contains data and an R Markdown document\nthat, when compiled, should produce the same model fits we show\nin the article.
", "date_published": "2020-09-26T13:15:00-04:00"}, { "id": "https://tshafer.com/blog/2020/08/influence-maximization", "url": "https://tshafer.com/blog/2020/08/influence-maximization", "title": "New Influence Maximization Article", "content_html": "Last week, Social Network Analysis and Mining published a new research article that I coauthored with a few colleagues\u2014my first published data science paper. We did the work itself quite a while ago as part of a high-performance computing project, and my gratitude goes to Hautahi for steering the paper to completion.
\nIn the article we studied the problem of Influence Maximization, a subfield of social network analytics and network science that searches for ways of identifying the most influential entities in a network (think Instagram ‘influencers’ \ud83d\ude44). This kind of problem turns out to be a computational nightmare, and so most approaches are either heuristic\u2014seeming to work well in practice but not mathematically guaranteed to return good solutions\u2014or are only proven to provide OK solutions. This second group of so-called “provable” algorithms typically guarantee solutions within % of the best possible answer. It’s still possible that they give a perfect solution in a given case, but it’s only assured that their solutions will be close to optimal.
\nIn the paper we took a really fast algorithm, tweaked it to provide brute-force calculations of the exact1 solution, and implemented it using HPC and GPUs. Then we compared various approximate approaches to the exact solution: do they only give 63%-optimal solutions, or do they perform better in practice? It turns out they perform extremely well on our test networks, representing common cases. There are likely pathological networks out there, but the techniques perform well with common graph constructions.
\nThe paper is available through SNAM, and Hautahi has provided source code, a workshop presentation, and a preprint through his website.
\nAs exact as you can get using a Monte Carlo approach. ↩︎
\nI’ve recently spent some time at work updating a few R packages we’ve built and deployed over the last several years, and during these updates I’ve run up against an old foe:
\nmy_package::run_model(...)\n#> Error in UseMethod("predict") : \n#> no applicable method for 'predict' applied to an object of class "ranger"\n
In case you’re unfamiliar, this error stems from R’s S3 method-dispatch system: model
is a “ranger” object, so R goes looking for a “ranger” version of the ‘predict’ method, but can’t find it\u2014even though I have the ranger package installed and Import
ed. I’ve seen this occasionally ever since I stated working with R in 2016, but until now I’ve treated the symptoms. I usually patch around this error by forcing R to use the (private) function ranger:::predict.ranger()
even though this is poor form, shouldn’t need to be done, and raises NOTE
s during the R CMD check
process.
I recently made some progress, though, by realizing that things ‘magically’ worked if I called, e.g., library(ranger)
at the top of whatever script uses my package to get things done. And then, finally, I realized something else. One embedded call would work A-OK:
result <- ranger::predictions(predict(model, data))\n
while a separate call in another function would not:
\nresult <- predict(model, data)\n
These two clues unlocked the solution.1 It turns out that, while Importing a package is necessary, it is not sufficient for its S3 methods to be made available at runtime. This includes methods like predict.<class>(). In fact, S3 methods aren’t registered at all unless you tell R to use some bit of the imported package earlier in your program.2
You can test this by calling methods(predict), which, in my example, will not list predict.ranger(). After calling any ranger function, however—say, ranger::predictions()—a subsequent call to methods(predict) does indeed list predict.ranger() as an available S3 method. This is why the ‘predict’ call wrapped in ranger::predictions worked for years, and I didn’t even notice; the :: call causes R to immediately load ranger along with its various S3 methods, so predict() dispatches just fine.
If the S3 method were exported from a package, I suppose one could simply import the predict.<class>() method directly, e.g., in roxygen2 documentation syntax:
#' @importFrom <package name> <method name>\n
But this doesn’t work if the S3 method isn’t exported\u2014and a ‘predict’ method shouldn’t really need to be.
\nWith the problem better understood, my solution is to do one of two things:
\nranger::predictions()
). This alone should cause the package to attach its \n namespace and register S3 methods when my package is loaded.requireNamespace(<package name>, quietly = TRUE)
call to the \n top of the function of interest, or to my package’s .onLoad()
function.\n Unlike library()
, this causes R to register the appropriate S3 methods, etc., but prevents the package from “attaching”, from adding itself to the search path so that all its functions are globally available.3 You can confirm this again by checking methods(predict)
before and after calling ‘requireNamespace’, including for non-exported S3 methods like predict.ranger()
.Live and learn, I guess. \ud83e\udd37\ud83c\udffb\u200d\u2642\ufe0f
\nI’ve also written up an answer on StackOverflow. ↩︎
\nThere’s probably a good reason for this, but I cant’t say that I like it. ↩︎
\nNote that once the S3 methods are registered there seems to be \n no good way to deregister them. ↩︎
\nUpdate (October 4, 2021): This trick seemed to work at the\ntime, but, when I returned to this work, multi-GPU training began\nto fail again. As always, your mileage may vary.
\nI’ve been working with Detectron 2 a lot recently, building\nobject-detection models for work using PyTorch and\nFaster R-CNN. Detectron 2 is a delightful and extensible\nframework for computer-vision tasks1 but it turns out not to\noffer a baked-in method for tracking evaluation losses during\ntraining—kind of a basic thing in machine-learning world. In\nML, evaluation losses and other tests against out-of-sample data\nare critical for estimating overfit and finding a suitable\nresting point on the bias-variance tradeoff curve, but I\nwonder if this isn’t a big concern for most computer-vision\nresearchers, who are trying to learn from millions of images and\nbillions of pixels.
\nI wasn’t the first to realize that this was missing, of course.\nDetectron 2’s GitHub repo contains a few issues like this\none discussing how to implement evaluation loss\ntracking, and there’s also a related Medium\npost that uses Detectron 2’s hook system to\nsolve the problem. In a nutshell, Detectron 2’s hook system\nworks like so:
\nwith EventStorage(start_iter) as self.storage:\n try:\n self.before_train()\n for self.iter in range(start_iter, max_iter):\n self.before_step()\n self.run_step()\n self.after_step()\n except Exception:\n logger.exception("Exception during training:")\n raise\n finally:\n self.after_train()\n
(Source code from Detectron 2 on GitHub.)
\nCustom training code code can cleanly register for\nthese hook methods, and this approach works\nwell for single-GPU training. But I figured out pretty quickly\nthat the hook-based system falls over when training with multiple\nGPUs (I’m often training this particular model with 4 V100s on\nAWS), probably from communication errors among the GPUs.\nI saw a post suggesting that different GPUs might be getting\nstuck in different parts of the code, since the hook system is\nimplemented across multiple functions, and this tracks with my\nexperience.
\nOne way around this multi-GPU issue is to bypass the hook system\nentirely, directly subclassing SimpleTrainer
’s run_step()
\nmethod since we use a custom trainer descended\nfrom SimpleTrainer
:
class OurFancyTrainer(DefaultTrainer):\n\n def run_step(self) -> None:\n super().run_step()\n\n # At a given number of iterations...\n self.calculate_test_losses()\n
This approach is similar in spirit to the hook-based system\u2014we’ve\nonly moved some code from, e.g., after_step()
into the end of\nrun_step()
\u2014but by subclassing the trainer we’re now able to\ndeliver code of similar complexity that works just fine for both\nsingle and multiple GPUs.
Detectron 2 is a great Python code base. It’s well\norganized, extensible, and uses type hints in many places. With\nonly a few thousand lines of code, I’ve been able to write data\nloaders, evaluators, etc., without writing any models from\nscratch. Detectron 2 also uses the YACS config system for\nspecifying and tracking experiments, which I really like. ↩︎
\nTwo good friends of ours were married this past weekend, tucked into the far corner of a backyard under a few trees, warm and breezy with light clouds and singing birds. It was lovely. They moved up their wedding day by several months because of the pandemic, hosting a small gathering for family instead of waiting for a larger celebration in the fall.1 That later gathering will hopefully happen, too, but it’s still another “normal” thing hurriedly rearranged.
\nIt’s natural to feel sad about all the big and small inconveniences that COVID-19 is causing. And to recognize that, for many people, “inconvenient” drastically undersells the situation: it’s life or death, whether because of the disease itself or from the economic consequences. But in view of everything moved around or put on hold over the last few months, this wedding brought some particular kind of joy. For the Christian, a wedding isn’t just a big party (though it is that), it’s a group celebration of two people choosing to team up for the rest of their lives. And to continue choosing, day after day. In these times, “choosing to choose” now seems more tangible than ever.
\n‘By extreme coincidence’ Martha and I happened to join the next-door neighbors for dinner around wedding time, too. After consulting with the bride and groom ahead of time, of course. \ud83d\ude01 ↩︎
\nDuring a recent visit to my in-laws, we hilariously realized that they were receiving 100 Mbps downlink internet service…but piping that through a super old Netgear 802.11b router I had installed a while back. Whoops.
\nWe remedied that situation via a Netgear R6900 (similar to the Wirecutter’s recommended R7000), and I wanted to add en emoji to the new SSID. Because that’s just indisputably better than not having an emoji. But, when I tried to enter a fun emoji-based SSID, I was greeted with a JavaScript alert dialog saying ‘nope’.
\nA quick search yielded two super helpful solutions for this problem: one for a TP-Link Archer C1900 and another for a Netgear R6300, which I figured should be very similar to my situation. After some fiddling, I was successful in adapting their work to the R6900 with just a bit more effort. The Netgear R6300 solution works by replacing the SSID validation JavaScript code with a function that always returns true
:
checkData = function() { return true; }\n
The R6900 is a little more complicated. The router supports both 2.4 GHz and 5 GHz broadcast, with two associated SSIDs. As a result, for the R6900 we have to update two checkData
functions: one associated with the 2.4 GHz band and one with the 5 GHz band. It turns out that you can update these functions in the same way as described in the above posts, being sure to choose the appropriate page to update in the lower right-hand corner of the Safari Developer Console. For the R6900, we have to replace checkData
in both the wl2gsetting and wl5gsetting pages.
The R6900 checkData functions are also a bit complicated, so I opted to just copy the function definition, comment out the SSID regular expression check, and reassign the function:
\ncheckData = function(save_only) {\n ...\n //if (cf.ssid.value.match(/[^(\\x20-\\x7E\\xA0)]/)) {\n // return alertR(getErrorMsgByVar("gsm_msg_inv_ssid"));\n //}\n ...\n}\n
After updating checkData
in both pages, setting the SSID, and hitting “Apply”, we’re good to go. \ud83d\ude04\ud83d\udc4d\ud83c\udffb\ud83d\ude4c\ud83c\udffb
I’ve recently had the opportunity to do a bit of remote advising in Physics\u2014debugging modern Fortran code, mostly\u2014and have been reminded:
\nI’m still pretty proud of the software Mika and I wrote while I was in school, and I’m happy the codes are still being used. And that they still work!
\nThis year’s NCAA tournament finally brought the mother of all upsets: 16-seeded UMBC (the Retrievers!) beat the Virginia Cavaliers by 21 points!1 Just how unlikely was UMBC’s upset?
\nTo get a more precise answer without delving deeply into basketball metrics, we can use Empirical Bayes as a framework for estimating the likelihood of a 16-over-1 upset. Our NCAA problem turns out to be especially suited for this approach because we’re asking a question about probabilities (this involves a Beta distribution) using win-loss data (“Bernoulli trials”). Under this setup, the probability of a 16-over-1 upset is described by a probability distribution $p(\theta)$. The function gives us the “probability of a probability”: it tells us what the probability $\theta$ of a 16-over-1 upset could be. Maybe it’s 1%, maybe 10%, etc.
\nWe need two pieces of information to apply Empirical Bayes: an initial guess at the probability of an upset (a “prior” belief) and some hard data. The data is easy to gather; in the 33 years since the NCAA tournament format expanded to include 64 teams, 16 seeds had won zero games in 132 attempts.
\nOur “initial guess” is subjective, but it won’t turn out to matter much. For fun, let’s pick two extreme guesses and see how they affect our estimates.
\nOur two guesses look like this:
\n\nThe first guess is completely flat; before seeing hard data, we do not claim to know if 16 seeds win 0% of the time, 55%, or 99%, etc. The second guess, though, is much more opinionated: we’re pretty sure that the 16 seed is going to lose.
\nGuess | \n99% CI Lower | \n99% CI Upper | \n
---|---|---|
Flat Prior | \n0 | \n1.000 | \n
Opinionated Prior | \n0 | \n0.197 | \n
With the “flat” prior, we don’t claim to know the probability of an upset: The 99% credible interval for the upset probability ranges from 0% to 100%. On the other hand, our 20-to-1 guess really does make an impact: The 99% CI ranges between 0% and 20%. That seems more reasonable.
\nNow, let’s add some data and answer the question: Going into this year, how unlikely was it that any individual 16 seed would beat a 1 seed?3 Since 1985 we’d observed 132 failures and zero successes, so we can update our initial guesses very easily: $\alpha \to \alpha + 0$ and $\beta \to \beta + 132$. This is possible because of the Beta distribution/Bernoulli trial problem type—they are conjugate priors, so the posterior is simply $\mathrm{Beta}(\alpha + \text{wins},\ \beta + \text{losses})$:
\nThe resulting probability distributions are below—we have to really zoom in to even see where the distribution is nonzero.
\n\nTwo takeaways:
\nJust how small is that probability?
\nGuess | \nMost Likely | \n99% CI Lower | \n99% CI Upper | \n
---|---|---|---|
Flat Prior | \n0 | \n0 | \n0.0340 | \n
Opinionated Prior | \n0 | \n0 | \n0.0298 | \n
Even for the flat prior, our 99% CI only ranges up to a 3.4% chance of upset. But, as promised, the opinionated prior hasn’t made much of a difference: incorporating our 20-to-1 guess only shrinks the probability to a 3% chance. In both cases, the most likely probability of an upset (the mode) is 0%!
\nSo—we could not have ruled out an upset being impossible, given these results. That’s good, given what happened next! But, also happily, the estimated probability of an upset is very small and matches our intuition. Pretty cool!
\nSo…what about after this year? To find out, we just follow the same procedure as before, adding three non-upsets and our one shiny new upset; the posterior distributions become $\mathrm{Beta}(\alpha + 1,\ \beta + 135)$:
\nAnd as a result, our updated probabilities are:
\nGuess | \nMost Likely | \n99% CI Lower | \n99% CI Upper | \n
---|---|---|---|
Flat Prior | \n0.0074 | \n1e-04 | \n0.0474 | \n
Opinionated Prior | \n0.0065 | \n1e-04 | \n0.0417 | \n
Still small, with the 99% CI ranging from about 0.01% to 4.7% and the most likely probability somewhere around 0.7%. But notice—now that we have finally observed an upset—that our posterior distribution no longer allows for a 0% chance of upset. As it shouldn’t!
\n\nEven worse: Virginia was the overall number one seed, the highest-ranked team in the nation. ↩︎
\nThis is the only constraint given for a two-parameter problem; it’s only enough to guarantee that $\alpha$ and $\beta$ are related through the prior mean. In keeping with the flat prior, we set $\alpha = 1$ so that the 20-to-1 guess pins down $\beta$. ↩︎
\nThe probability that any of the four 16 seeds would win is slightly different: $1 - (1 - \theta)^4$. ↩︎
\nIn an experiment with Approximate Bayesian Computation and R packages, I uploaded a new R package of my own to GitHub a few days ago named bcf for Bayesian Coin Flip. It simulates $n$-person games of skill, approximating these games as multiple players flipping coins with different “fairness parameters” $\theta_i$. The first player to obtain a “Heads” result wins, with ties handled in a sensible way.1
\nThe ABC concept is well explained in a pair of articles. First, Rasmus B\u00e5\u00e5th introduces ABC through an exercise involving mismatched socks in the laundry (thanks for pointing me to this, Kenny). And Darren Wilkinson also does a nice job explaining how ABC works.
\nAs far as bcf is concerned, the probability of a coin coming up Heads is picked from a distribution assigned to each player, $\theta_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$. The package then simulates a set of games using these parameters and provides samples from the joint distribution of parameters and outcomes. Finally, by keeping only the draws that match an observed result, we end up with a distribution proportional to the posterior $p(\theta \mid \text{result})$.2
\nI built the package to better understand ABC and to humorously model our office’s dart-playing abilities. To keep things simple, the bcf package only provides a few functions: We can initialize a player, run a game, and use the results of the game to update a player’s statistics.
\nA basic game might go like so: We assume a pretty weak prior (Beta(1.2, 1.2)) for each player before running a few games. After each game, we update the involved players.
\nIn practice, I think it’s pretty easy to use. First, instantiate a few players:
\nlibrary(bcf)\n\ntom <- new_player("Tom", alpha = 1.2, beta = 1.2)\ndavid <- new_player("David", alpha = 1.2, beta = 1.2)\nkevin <- new_player("Kevin", alpha = 1.2, beta = 1.2)\n\nprint(tom)\n
## UUID: d8cfe17e-c81d-11e7-9f88-f45c899c4b7b\n## Name: Tom\n##\n## Games: 0\n## Wins: 0\n## Losses: 0\n##\n## Est. Distribution: Beta(1.200, 1.200)\n## MAP Win Percentage: 50.000\n
Then simulate three games for which we already have results:
\n# Tom wins, David places second, Kevin finishes third\ngame_1 <- abc_coin_flip_game(\n players = list(tom, david, kevin),\n result = c(1, 2, 3),\n iters = 5000, cores = 5L)\n
## No. players: 3\n## Assign result: 1, 2, 3\n## Iters: 5000\n## CPU cores: 5\n## Workloads: 1000, 1000, 1000, 1000, 1000\n
tom <- update_player(tom, game_1)\ndavid <- update_player(david, game_1)\nkevin <- update_player(kevin, game_1)\n\n# Tom wins, Kevin places second, David finishes third\ngame_2 <- abc_coin_flip_game(\n players = list(tom, david, kevin),\n result = c(1, 3, 2),\n iters = 5000, cores = 5L)\n\n\ntom <- update_player(tom, game_2)\ndavid <- update_player(david, game_2)\nkevin <- update_player(kevin, game_2)\n\n# Tom finishes second, David wins, Kevin finishes third\ngame_3 <- abc_coin_flip_game(\n players = list(tom, david, kevin),\n result = c(2, 1, 3),\n iters = 5000, cores = 5L)\n\n\ntom <- update_player(tom, game_3)\ndavid <- update_player(david, game_3)\nkevin <- update_player(kevin, game_3)\n
bcf then provides methods for examining both players and games:
\nprint(game_3)\n
## # A tibble: 6 x 6\n## Tom David Kevin outcome n pct\n## <dbl> <dbl> <dbl> <chr> <int> <dbl>\n## 1 1 2 3 1627 32.5\n## 2 1 3 2 1917 38.3\n## 3 2 1 3 *** 439 8.8\n## 4 2 3 1 590 11.8\n## 5 3 1 2 222 4.4\n## 6 3 2 1 205 4.1\n
plot(game_3)\n
plot(tom)\n
This has been a pretty fun experiment in ABC and in R packaging. I’ll update this post if I ever return to the project.
\nIf more than one player obtains the same result on a given flip, these players play one or more sub-games to break the tie. ↩︎
\nOne gotcha\u2014for now\u2014is that bcf imposes a beta distribution for each player’s win probability. After working out the likelihood on paper, I don’t think the posterior is actually a beta…just almost a beta. The possibility of ties and additional rounds adds complication. ↩︎
\nOn a recent podcast, Bill Simmons wondered aloud if the NFL as a whole is especially mediocre this year. I haven’t been watching all that much NFL football this season, but from what I have seen this observation rings true—the teams do seem pretty bad.
\nFortunately, the question “Are NFL teams exceptionally mediocre this year?” is pretty easy to answer precisely thanks to football-reference.com. First, I pulled week-by-week results for nine recent seasons,1 being mindful to be a good citizen and download the data very slowly:
\nwget -w 3 https://www.pro-football-reference.com/years/2016/week_{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17}.htm\n
Then, I parsed out the weekly winners and losers using rvest and purrr:
\n# Packages used below\nlibrary(rvest)\nlibrary(stringr)\nlibrary(purrr)\nlibrary(dplyr)\nlibrary(tibble)\n\n# Team name is the last word\n. %>% str_split(" ") %>% `[[`(1) %>% tail(1) -> get_team_name\n\n# Extract teams\n. %>%\n html_text() %>%\n Filter(function(.x) .x != "Final", .) %>%\n map_df(~ tibble(team = get_team_name(.x))) -> extract_teams\n\nget_winners_losers <- function(html_document, week) {\n doc <- read_html(html_document)\n\n bind_rows(\n html_nodes(doc, "tr.winner td a") %>%\n extract_teams() %>%\n mutate(week = week, win = 1, loss = 0),\n html_nodes(doc, "tr.loser td a") %>%\n extract_teams() %>%\n mutate(week = week, win = 0, loss = 1))\n}\n\n# 2008 wk 9 is bad\nyears <- setdiff(2007:2017, 2008)\n\nwon_loss_df <- map_df(years, ~ {\n year <- .x\n message(paste("Year", year))\n\n map_df(1:17, ~ {\n week <- .x\n message(paste("Week", week))\n\n path <- paste0("./pages/", year, "/week_", week, ".htm.xz")\n if (!file.exists(path)) {\n message(paste0("File does not exist: ", path))\n } else {\n paste0("./pages/", year, "/week_", week, ".htm.xz") %>%\n get_winners_losers(week) %>%\n mutate(year = year)\n }\n })\n})\n
The resulting data frame, won_loss_df, marks whether teams won, lost, or had a bye. From this point it isn’t hard to aggregate the results and take a look at historical results through Week Six, and I was a bit surprised by the following figure:
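\nFor reference, the aggregation behind the figure is only a few lines. A sketch (not necessarily the exact code used for the figure), counting teams with exactly three wins through Week Six:
\nwon_loss_df %>%\n  filter(week <= 6) %>%\n  group_by(year, team) %>%\n  summarize(wins = sum(win), .groups = "drop") %>%\n  count(year, wins) %>%\n  filter(wins == 3)\n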
2017 is the most mediocre season in the sample! (Even though it seemed possible, I wasn’t really expecting the data to back me up.) But…2017 doesn’t take the crown by all that much. The 2017 season brings 12 teams with three wins after Week Six, but the 2012 season had 11 teams and 2013 had 10. It just feels anomalous—maybe because so many other teams have 2 or 4 wins.2
\nSo, by this simple metric the NFL isn’t extraordinarily mediocre this season—just ho-hum mediocre.
\nCongratulations.
\nIt’s a little early for Christmas (okay, it’s absurdly early for Christmas), but I never linked the 2016 Christmas at DPAC show here, and Summit recently re-posted it with a nicely remixed audio track.
\nThis is one of the best things I’ve been able to do for the last four years, and I’m super grateful for the opportunity to play again this year. If you’ll be around the Triangle near Christmas Eve, pick up some tickets in December and come join us!
\nAlready putting 2024 on our calendar.
\nWe have a dart board at the office and have a good time lofting darts in nice, looping arcs. A recent project pushed me back into physics and led me to consider just how sensitive the dart-throwing motion is to small imperfections in angle; how precise do we need to be? Calculating those angular perturbations ($\delta\theta$ in the coordinates I’ll set up next) requires the kinematics of the problem and provides an opportunity to solve the equations numerically with R.
\nOur office dart board is fixed at a location $(x, y) = (L, 0)$, and a thrower (darter? player?) stands with their throwing elbow at a location $(0, -r)$. Here $r$ is the player’s forearm length and things are arranged so a “perfect” 90° release starts from a height $y = 0$. Recall that the kinematics derive from Newton’s second law. Assuming a constant gravitational acceleration $g$ near the earth’s surface, we want to solve the equations\n\n$$\ddot{x}(t) = 0, \qquad \ddot{y}(t) = -g.$$\n\nBecause there are no forces acting left to right (except for the neglected air drag), the $x$ equation has no acceleration term.
\nWith the above equations come initial conditions: $(x_0, y_0) = (r\cos\theta,\ r\sin\theta - r)$ and $(\dot{x}_0, \dot{y}_0)$, with the release angle $\theta$ – the angle of the player’s forearm at the moment of the throw. The angle is measured conventionally (and unintuitively here) from the forward direction, so a throw vertically upwards would have $\theta = 180°$. Computing the release velocity is a little trickier but possible with a bit of trigonometry and calculus; the magnitude is just $r\omega$, and one can show that the velocity points perpendicular to the forearm:\n\n$$(\dot{x}_0, \dot{y}_0) = r\omega\,(\sin\theta,\ -\cos\theta).$$\n
\nTo find a solution wherein we hit the bullseye, we fix $x(t^*) = L$ and $y(t^*) = 0$, with $t^*$ the time to target. These definitions specify the solution after some tedious algebra. First, the $x$ equation yields time as a simple function of release angle and angular velocity (or linear velocity $v = r\omega$):\n\n$$t^* = \frac{L - r\cos\theta}{r\omega\sin\theta}.$$\n\nIntuitively, the time of flight is the ratio of the distance traveled in the $x$ direction to $\dot{x}$. The $y$ equation generates a more complicated expression for the angular velocity:\n\n$$\omega^2 = \frac{g\,(L - r\cos\theta)^2}{2 r^2 \sin\theta \left[\, r\sin\theta\,(\sin\theta - 1) - (L - r\cos\theta)\cos\theta \,\right]}.$$\n\nThis one’s harder to interpret, but it has the correct units (s⁻²) and exhibits interesting divergences: $\omega^2 \to \infty$ as $\theta \to 180°$, as $\theta \to 90°$, etc. Note, too, that the $\theta \to 180°$ divergence differs from its $\theta \to 90°$ counterpart in that different parts of the denominator go to zero ($\sin\theta$ vs. the term in square brackets).
\nWith the solution in hand we can also find the maximum height of the dart during its flight. The path $y(t)$ is a function of $\theta$ and $\omega$, and the maximum height is an extremum of $y$: $\dot{y}(t_{\max}) = 0$. A check of the second derivative confirms this is, in fact, a maximum, and\n\n$$y_{\max} = r(\sin\theta - 1) + \frac{r^2\omega^2\cos^2\theta}{2g}.$$\n\nThe maximum depends on the starting height and also varies with the distance to be crossed by the dart.
\nFinally, we can work out an answer to my original question: how sensitive is the solution to perturbations in angle, $\delta\theta$? The answer comes from the accurate solution by computing the derivative $dy/d\theta$ at fixed $\omega$ – the angular frequency of the accurate solution. The derivative is\n\n$$\left.\frac{dy}{d\theta}\right|_{\omega} = \frac{L - r\cos\theta}{\sin^2\theta}\left[1 - \frac{g\left(r\sin^2\theta - (L - r\cos\theta)\cos\theta\right)}{r^2\omega^2\sin\theta}\right]$$\n\nand this quantity can be interpreted as the vertical distance by which the dart would miss for a small (e.g., one-degree) imperfection in the release angle. We could go on to ask similar questions about imperfections in velocity.
\nNow that we have the general solutions, let’s take a look at the results numerically. Start by making a few assignments:
\nG_EARTH <- 9.8         # m/s^2\n\nL <- 10*12 * 2.54/100  # 10 feet to the board, in meters\nr <- 14 * 2.54/100     # 14-inch forearm, in meters\nC <- 6*12 * 2.54/100   # 6 feet of ceiling clearance, in meters\n
The constant $C$ measures the ceiling height relative to the bullseye, $r$ is the forearm length, and $L$ is the horizontal distance between the player and the board. Here I’ve estimated $L = 10$ feet, $r = 14$ inches, and $C = 6$ feet of clearance. The gravitational constant is, as always, 9.8 meters per second squared. With the constants fixed, I’ve implemented the kinematic equations as R functions (sketched above) and computed them on a radian grid spanning $[0, \pi]$:
\nlibrary(tibble)\n\ncalc <- tibble(\n  theta = seq(pi, 0, -pi*1e-4),  # release angles from pi down to 0\n  theta_deg = theta * 180/pi,\n  omega = omega(L = L, r = r, theta = theta),\n  vel = r*omega,                 # linear release speed (m/s)\n  vel_mph = vel * 3600 * 100 / 2.54 / 12 / 5280,\n  time = tt(L = L, r = r, theta = theta),\n  y0 = r*(sin(theta) - 1),       # release height relative to the bullseye\n  ymax = ymax(L = L, r = r, theta = theta),\n  yratio = ymax/y0,\n  ymax_ft = ymax * 100 / 2.54 / 12,\n  hit_ceil = ymax >= C,          # does the dart graze the ceiling?\n  dy_dtheta = dydtheta(L = L, r = r, theta = theta)\n)\n
The results divide naturally into two main categories: “standard” vs. “extreme” solutions.
\nI’ve defined standard solutions as those for which (i) the dart doesn’t hit the ceiling and (ii) the required release velocity stays modest, to separate out the slightly crazier results.
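\nThe post doesn’t show how it splits the grid; here’s a sketch, where the 40 mph velocity cutoff is purely my assumption:\nlibrary(dplyr)\n\n# Split the solutions; rows with no physical solution (NaN) drop from both.\nstandard <- calc %>% filter(!hit_ceil, vel_mph <= 40)  # cutoff assumed\nextreme  <- calc %>% filter(hit_ceil | vel_mph > 40)\n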
\nWe find, intuitively, that the throw velocity must increase as the release angle approaches 90 degrees. The maximum height and air travel time always decrease as the trajectory flattens, but interestingly the required throw velocity has a minimum value. (In the figures I’m plotting vs. $180° - \theta$; $0°$ corresponds to a vertical upwards throw and $180°$ to a vertical downwards throw.)
\n\nFor more vertical throws (small $180° - \theta$) we have to throw the dart harder because much of its velocity is “wasted” traveling vertically. On the other hand, for flatter throws (larger $180° - \theta$) we need more velocity to make it to the target before gravity can pull the dart too far. We could take a derivative to find the minimum analytically, but since we’re already here we can just find it numerically:
\n# Minimize the squared angular velocity over the physical range (pi/2, pi).\nopt_value <- optimize(function(.x) omega(L, r, .x)^2, c(pi/2, pi))\n180 - opt_value$minimum * 180/pi  # report as 180 - theta, in degrees\n
## [1] 46.30904\n
The minimum is not $45°$, but slightly larger (a flatter trajectory).
\nThe first set of results not yet considered covers solutions approaching a perfectly horizontal throw. The travel time and maximum height again decrease (the maximum height decreases roughly linearly), but the throw velocity diverges as $180° - \theta \to 90°$: the dart needs to hit the bullseye before gravity pulls it off line.
\n\nFinally, there’s a slice of the solution space for which we need more vertical headroom:
\n\nOnce again we get crazy behavior with velocity, but now the time and maximum height are both extremely large – these plots are on a logarithmic scale. These solutions basically correspond to throwing a dart upwards and still managing to hit the bullseye. For some of these solutions our constant-$g$ approximation would be in big trouble!
\nFinally, what is our margin of error on dart throws? We can check by plotting $dy/d\theta$ and letting $\delta\theta$ be small, say one degree.
\n\nThis plot suggests two things: first, the throw is most forgiving – the miss distance per degree of error is smallest – near the same angle that minimizes the required velocity; and second, the sensitivity grows quickly as the release moves away from that sweet spot.
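\nTo put a number on that sweet spot (my own check, built on the sketched functions above):\n# Vertical miss, in cm, per degree of release-angle error, evaluated at\n# the minimum-velocity release angle found earlier.\ntheta_best <- opt_value$minimum\n100 * dydtheta(L, r, theta_best) * (pi/180)\n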
\nFor the setup we considered, it turns out that there is an angle near $180° - \theta \approx 46°$ for which the required velocity is a minimum. The dart throw is also most forgiving near that angle. There is also an interesting class of solutions that require fast dart throws, some of which would put the dart into orbit!
\nThis was a fun exercise – I was able to work the problem and do the computation in an evening.
\nThis week I’ve re-added an RSS feed and also created a new JSON feed. These changes are mostly motivated by a desire to try out the new JSON feed format – it’s pretty clever and was easy to implement.
", "date_published": "2017-06-25T23:30:00-04:00"}, { "id": "https://tshafer.com/blog/2017/06/installing-r-on-ec2", "url": "https://tshafer.com/blog/2017/06/installing-r-on-ec2", "title": "Installing R on EC2 with RHEL 7", "content_html": "I’ve been tasked a few times recently to stand up AWS EC2 instances as shared data science/development platforms, including the R and RStudio Server stack. (I prefer RHEL 7 for familiarity.) R depends on EPEL for installation on top of RHEL, and adding EPEL to yum
is pretty straightforward:
$ sudo yum install -y epel-release\n$ sudo yum update -y\n
Trying to sudo yum install R, however, still fails because yum cannot find the dependency texinfo-tex. It was surprisingly difficult to track down a clean solution, but a buried, not-accepted StackExchange answer has it right: texinfo-tex is listed in a disabled-by-default set of packages. Enable rhel-server-optional and we’re in business:
$ sudo yum-config-manager --enable rhui-REGION-rhel-server-optional\n$ sudo yum install -y texinfo-tex\n$ sudo yum install -y R\n
Hello – I’ve finally put in the work to resurrect this blog! I’ve been working at Elder Research for about 15 months, touching a variety of technologies, and I need a place to put odds and ends.
", "date_published": "2017-06-14T23:20:00-04:00"}, { "id": "https://tshafer.com/blog/2016/01/summit-dpac-2015", "url": "https://tshafer.com/blog/2016/01/summit-dpac-2015", "title": "Christmas at DPAC 2015", "content_html": "This was tremendously fun.\nSo fun that it feels unfair to be a part — the video is fun to watch and listen to, but it can’t capture the feeling of waiting in the wings for my cue while Branden and Molly break people’s hearts on Emmanuel.
\nIt’s also one of the precious few times each year we get to bring everyone together, playing music with friends from other campuses and making new ones, too.\nThis year was especially fun, being the only time we have played the four songs on the Carols EP live.\nSome musical highlights:
\nChristmas at DPAC 2015 from The Summit Church Sermons on Vimeo.
", "date_published": "2016-01-07T19:38:00-05:00"}, { "id": "https://tshafer.com/blog/2015/05/ten-thousand-fathers-album", "url": "https://tshafer.com/blog/2015/05/ten-thousand-fathers-album", "title": "10,000 Fathers – Invitation, Volume One", "content_html": "The 10,000 Fathers Worship School run by Aaron Keyes released a new record today, and I really like it (you could purchase it from iTunes or Amazon, listen on Spotify, or stream the first few tracks for free at their website). I think I found out about the worship school from my good friend Duane Mixon, who has a track on the record (and a good one at that), and another friend from Wilmington has been a part of the school as well.
\nKnowing a little of the heart behind the school and hearing the first track, Invitation Song, I was excited enough to preorder the record and as a result received it a week ahead of the release. For me, highlights (or at least probably my most-listened tracks) are Invitation Song (also see the accompanying video), Rend the Heavens, Love Lifted Me, and Never Ending Love. A couple of these have already racked up some impressive numbers on my iTunes play counter — they’re really good.
\nBut (surprisingly, at least to me) that’s all a bit secondary. I enjoy the record, and the music and melodies certainly do it for me, but this record has already made a mark where relatively few do. On their website announcing the launch is the line “May the deep places in your heart be awakened to His reality all around and within you.” This record has already gone some distance in making that a reality. As the first track says…
\n\n", "date_published": "2015-05-05T07:49:00-04:00"}, { "id": "https://tshafer.com/blog/2015/02/christmas-dpac", "url": "https://tshafer.com/blog/2015/02/christmas-dpac", "title": "Christmas at DPAC 2014", "content_html": "Open up our eyes to see You in the ordinary,
\n
\nWe don’t want to miss You anymore
\nOpen every eye to see every day
\nEverything is burning with the glory of the Lord.
The Summit Church, our church here in Raleigh, has held Christmas Eve services at the Durham Performing Arts Center each of the last three years, and I’ve had the great privilege of being a part of the most recent two with Summit Worship. I’ve posted below the recording from the 2014 services — if you want a good time but aren’t looking to invest an hour and a half, check out music director Branden’s piano piece (30:55), Hank Murphy and campus pastor Chris Green rapping (37:50), or pastor J.D.’s message (44:16).
\nIf the Christmas Eve program piques your interest, a few good places to poke around are Summit’s Sermons Vimeo channel, messages page, and podcast on iTunes.
\nWelcome to my blog! I really didn’t think I’d have one again, but I’ve been getting the itch recently and it was a fun programming exercise.
\nI was particularly inspired to write again by Brent Simmons and his blog Inessential — I love his short, undecorated style. It takes the pressure off. I’m not an expert in very much, and I hope the tone here reflects that. I probably won’t write often (I feel like I don’t have all that much to say), but I wanted a place that was my own on the internet again. If I do write anything here, expect posts about programming, physics, and music with some family stuff thrown in.
\nPart of the fun of blogging is coming up with a Rube Goldberg contraption for posting. To make this blog go, I wrote a blogging engine in Python. It is mostly a port of Marco Arment’s Second Crack static engine (hosted here). It uses Dropbox and inotify-tools in pretty much the same way as Second Crack to automate posting. I chose Python because I’m much more familiar with it these days (second only to Fortran) than PHP.
\nPutting the blog together was a fun weekend-long break from physics programming, and the engine is still rough around the edges. The styling needs some work, and I don’t have an RSS feed yet (Update 11/9/14: I now have an RSS feed here!). I don’t have archives yet or any kind of tagging, but they’ll come eventually. I don’t have any kind of commenting, either, and I doubt I will. These days Twitter (I’m @tomshafer) is mostly the rage.
This represents a (mostly) clean break for me. I’ve written before, but many of those posts represent the kind of writing I want to avoid. Martha and I have a lot of fun things coming in the next year or two and this seems like a good time to at least have the option of blogging. I might sneak a few old posts in at some point, but the future is the fun part.
", "date_published": "2014-09-14T23:20:00-04:00"}] }