Author: Adam Jon Richardson

  • Simple Linux (Ubuntu) Security

    Simple Linux (Ubuntu) Security

    I’m setting up a Bitcoin full node to perform some analyses on the full blockchain. Here are quick steps I followed to improve the security of a basic Ubuntu box for these purposes [1]. This is not meant to be an exhaustive list! Rather, this should serve as a general approach that helps you think about how you can continue to improve and monitor the security of the systems you’re using. And, while I’m using Ubuntu, these general principles apply to other Linux distros, too.

    1. Step 1: Use What You Know
    2. Step 2: Run The Bare Necessities
    3. Step 3: Filter Communications
    4. Steps 4 – ∞: Update & Monitor
    5. Final Step: I Give You… THE INTERNET!

    Step 1: Use What You Know

    When I was younger, I remember someone suggesting to my father that he could use a simpler tool to test a TV component instead of the oscilloscope he was already using. However, he promptly replied, “I like to use what I know best,” and he just kept using the oscilloscope. This wisdom applies to the security realm, too.

    My first step towards configuring a full node involves ditching Windows. This shouldn’t be interpreted as an indictment against the possible security of Windows. Rather, I don’t know Windows security as well as I know Linux security, and I should stick with what I know best. And, I chose Ubuntu because I’ve used this distro the most over the past year. [Note to future self: remember that for some reason burning DVDs at any of the faster speeds on my iMac leads to images that can’t be mounted 🙁 ]

    Step 2: Run The Bare Necessities

    Sure, with Ubuntu, you can have it all… but just know ALL includes possible vulnerabilities, too. When I install Ubuntu for this type of purpose, I first start by choosing the “Minimal Install,” an option that some other distros offer, too. Once installed, I follow the steps below to remove more unneeded stuff (“stuff” is a technical term that is often under-appreciated, but not here, my friends, not here.)

    1. Run the command below to view the running processes:
      # sudo lsof -i
    2. Identify a process that seems unneeded (e.g., cups, avahi-daemon, etc.); if you’re not sure which package a process belongs to, see the note after this list.
    3. Remove the package using the command below [2]:
      # sudo apt-get autoremove cups
    4. Restart the computer and make sure there are no errors.
    5. Rinse and repeat until the song above plays like an anthem when I view the processes.
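
    If you’re unsure which package owns a given process, dpkg can tell you. A quick sketch, assuming the binary is on your PATH (otherwise pass its full path, e.g., /usr/sbin/cupsd on my box):
    # dpkg -S "$(command -v cupsd)"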

    Step 3: Filter Communications


    I’ve learned to filter communications on social media with some people so I can avoid drama… not you, of course [note to self…]. In the same way, helping computers filter communications can help limit your system administration drama.

    I’ve learned to love iptables, and so can you. As in any relationship, it first starts by listening… figuratively and literally. Often, when you run the command below, you’ll see that iptables is overly ACCEPTing in terms of communications (oh, Ubuntu, how quaint):
    # sudo iptables -S

    This is one situation in which it’s preferable NOT to be accepting. In fact, in order to prioritize security, we start by blocking everything, and then we explicitly accept what we absolutely need (e.g., port 80 for apt-get, port 53 for DNS, etc.) There are plenty of iptables tutorials out there, but hopefully these principles point you in the general direction so you can get started [3].
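
    For example, a minimal default-deny sketch might look something like the commands below (the ports and rules here are illustrative; tailor them to what your box actually needs):
    # sudo iptables -P INPUT DROP
    # sudo iptables -P FORWARD DROP
    # sudo iptables -P OUTPUT DROP
    # sudo iptables -A INPUT -i lo -j ACCEPT
    # sudo iptables -A OUTPUT -o lo -j ACCEPT
    # sudo iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # sudo iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    # sudo iptables -A OUTPUT -p udp --dport 53 -j ACCEPT
    # sudo iptables -A OUTPUT -p tcp --dport 80 -j ACCEPT
    # sudo iptables -A OUTPUT -p tcp --dport 443 -j ACCEPT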

    Steps 4 – ∞: Update & Monitor


    New parents know the joy of bringing home a new baby to the nursery that they’ve carefully prepared. Every little detail in the room has been painstakingly considered. New parents also quickly realize that the easy work was preparing the room. Every new day going forward requires constant updates and monitoring (e.g., new diapers, new clothes, new furniture, new diap… whoops, sorry, I was dreaming of sleep and forgot I already said… zzzzzzzzzz…)

    In the same way, our precious new Ubuntu box will require constant updates and monitoring. If you study pen-testing techniques, you’ll quickly realize a common theme: exploits are old! Unless you’re up against an incredibly brilliant hacker or powerful nation state, the hackers attacking your system are searching for vulnerabilities in software that have already been patched. If you diligently update your system, you can avoid the vast majority of exploits that could be used against your baby. It’s simple:
    # sudo apt-get update
    # sudo apt-get upgrade
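
    If you’d like security updates applied even on the days you forget, Ubuntu’s unattended-upgrades package is one option worth a look (a quick pointer, not a full walkthrough):
    # sudo apt-get install unattended-upgrades
    # sudo dpkg-reconfigure --priority=low unattended-upgrades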

    Finally, because you know your little baby better than anyone else, because you’ve limited your system to the bare necessities, because you’ve filtered your communications, and because you’ve diligently updated your system, you can sometimes identify zero-day exploits (and mitigate them) before there’s even a patch (all zero-days have to transition to one-days somehow, and you could be the one!) Monitoring the logs and noticing suspicious activity on a well-secured system is much easier because there is less to see: nefarious behavior that is new will stand out. Stay vigilant, my friends!
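
    For example, two quick places I like to glance at on Ubuntu (starting points only, not a complete monitoring strategy) are the authentication log and the recent-login history:
    # sudo tail -n 50 /var/log/auth.log
    # last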

    Final Step: I Give You… THE INTERNET!

    You’re ready to let your little baby go out into the world [4]. Fare thee well!


    Footnotes

    1. You will often hear of hardening a system to reduce the attack surface for a specific use. I hate this term, because a “hardened” system soon becomes soft and squishy if it’s not actively updated and monitored and improved. Remember, security is a process, not a product.
    2. There are various ways to remove packages using apt-get. I use the autoremove option because it automatically removes associated dependencies, too, if they are otherwise unused.
    3. Remember, rules are ephemeral unless you take steps to make them persistent. New users are often surprised to see their configuration work disappear when they next reboot. I like to use a combination of the iptables-persistent package and the iptables-save command to make my rules persistent.
    4. There are certainly other things one can (and should) do, such as contain the possible damage apps/processes can do to the rest of the system, but I’ll cover these approaches in a later post.

  • Mnemonic For OSI Model Layers

    Mnemonic For OSI Model Layers

    There are a million mnemonics out there to help students remember the layers of the Open Systems Interconnection (OSI) model. Why offer another? Well, this one has worked best for me.

    As a reminder, the OSI Model Layers are as follows:

    • Layer 7: Application
    • Layer 6: Presentation
    • Layer 5: Session
    • Layer 4: Transport
    • Layer 3: Network
    • Layer 2: Data Link
    • Layer 1: Physical

    AP(p)S Transport Network Data Physical(ly)

    I like this because it helps me remember several important pieces of information in one tidy little phrase.

    1. The Transport, Network, Data (link), and Physical layers are spelled right out.
    2. Grouping Application, Presentation, and Session together into the acronym AP(p)S helps group the upper layers together, and it conveys that these layers are of primary concern for the application-level data.
    3. The sequence of the operations in terms of wrapping the layers starts with the top (layer 7: App.) All too often, I hear people say they are confused about the layers because layer one seems like it should be wrapped by the other layers. This mnemonic helps avoid that issue. Additionally, the layers read naturally from top to bottom when I see them represented, and this order makes writing them down more natural.
    4. Finally, the phrase reminds me that these abstractions all rest on a physical communication layer (i.e., when troubleshooting, start there first!)

    If this helps you, great, and if not, I hope you find a mnemonic that works well for you 🙂

  • Finding Big M: Iteratively Estimating The Mean

    Finding Big M: Iteratively Estimating The Mean

    Several months ago I wanted to estimate the mean of a value for users of an online app as quickly as possible with as few samples as possible. Every data point required scraping a webpage, and this proved time-consuming AND costly in terms of system resources. Additionally, if I triggered too much traffic with my queries, the host would temporarily block the IP of my server.

    Having theorized that the distribution was normal, I considered several more formal approaches (e.g., estimate power and then choose the appropriate sample size; sequential sampling; etc.) However, I was curious if I could develop an iterative approach that would both satisfy my precision requirements and help me get the data I needed as efficiently as possible given the cost of each web-scraping request.

    Big M: A Generally Precise Estimate Of The Mean

    While I didn’t need to publish the means I was estimating in scholarly journals, I wanted to ensure that the estimates were provably reliable in the general sense. That is to say, I wanted confidence intervals “in the nineties” or “in the eighties.” I wasn’t trying to disprove any formal null hypotheses, but I did want good data. I was training machine learning models for a class, and I wanted the most predictive models possible given my resource limitations.

    I decided to play around with an iterative approach to generating a sample of large enough size to achieve the general degree of precision that I desired (even the thought of this would probably make my grad school stats professors throw up in their mouths.) This type of approach is a “no no” in statistics books, as you can generally grow a sample until you temporarily get what you want. You usually want to make informed a priori decisions about your research and then follow them to find your robust results.

    However, I’d played around with Monte Carlo simulations a couple decades ago (I’m so old), and I always found it interesting how well various methods generally held up even in the face of violations of assumptions. Additionally, the ability of machine learning models to consistently converge on valid findings even in the face of crude hyperparameters has taught me to put things to the test before discounting them.

    I set out to make a (relatively) simple algorithm for estimating the mean of the population to a general degree of precision (e.g., “around ninety out of one hundred samples will contain the mean of the population within the given error constraint.”) I called this estimate Big M because, well, you know, D. Knuth is awesome, and Big O conveys a form of general precision that I wanted to embrace. If I’m working with big data, I don’t need a scientifically chosen set of samples that should guarantee a CI of 95%; I just need to know that I’m generally in the 90s.

    The Big M Algorithm Overview

    After trying out various forms of the algorithm, I developed the following approach, and a quick example Jupyter notebook containing code and several random results is linked below. Essentially, the code implements the following algorithm (a rough sketch follows the steps and details below).

    1. Select initial sample.
    2. Compute the confidence interval (CI) and mean.
    3. Check if the acceptable error (e.g., off by no more than 2 inches) is outside the confidence interval.
      1. If the acceptable error is outside the CI, increment the count of valid findings.
      2. If the acceptable error is inside the CI, start the count over at zero.
    4. Check if the continuous count has reached its threshold.
      1. If the continuous count has reached its threshold, exit the function with the mean estimate.
      2. If the continuous count has not reached its threshold, add one more observation to the sample and return to step 2.

    There are some other details of note. The continuous count formulation is based on the ratio of the population size to the initial sample size and the confidence percentage chosen. Additionally, there is a maximum sample size that is computed automatically from the size of the population and the initial sample size, though this parameter can be set manually.
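
    To make the loop concrete, here’s a rough Python sketch of the idea. This is not the notebook code; the fixed count threshold, the 90% z-value, and the normal-theory confidence interval are simplifying assumptions on my part.

    import math
    import random
    import statistics

    def big_m(draw_sample, initial_size=30, acceptable_error=2.0,
              z=1.645, count_threshold=25, max_samples=5000):
        # Step 1: select an initial sample.
        sample = [draw_sample() for _ in range(initial_size)]
        count = 0
        while len(sample) < max_samples:
            # Step 2: compute the mean and the CI half-width.
            mean = statistics.mean(sample)
            half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
            # Step 3: is the acceptable error outside the CI?
            if half_width <= acceptable_error:
                count += 1               # another valid finding
            else:
                count = 0                # start the count over at zero
            # Step 4: exit once the continuous count reaches its threshold.
            if count >= count_threshold:
                return mean
            sample.append(draw_sample())  # otherwise, add one more observation
        return statistics.mean(sample)    # hit the maximum sample size

    # Example: a pretend “costly” sampler drawn from a normal population.
    print(big_m(lambda: random.gauss(70.0, 4.0), acceptable_error=1.0))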

    The Big M Jupyter Notebook Example Code (Embedded as HTML)

    I’ve linked to a demonstration of the code to implement the algorithm. Much of this is arbitrary, and I’m sure you could refactor this so it performs better. I’ve since developed a server-side implementation that’s quite different from this in a language other than Python, but this should (barring bugs, which obviously will be present) capture the general thrust of the algorithm and show general results (you can just rerun the Notebook to see new results.)

    Big M Jupyter Notebook at GitHub

    YMMV 🙂

  • Monitor Network Traffic on an iOS Device Using Remote Virtual Interface

    Monitor Network Traffic on an iOS Device Using Remote Virtual Interface

    There are several situations in which one may want to monitor network traffic on an iOS device (e.g., ensuring there is no unexpected network traffic, identifying the APIs utilized by various apps, etc.) Let’s look at one possible option to accomplish this. From iOS 5 on, we can use Remote Virtual Interface (RVI) to add a network interface to your macOS device that contains the packet stream from an iOS device.

    Install Xcode From The App Store

    First, ensure that you’ve installed Xcode from the App Store on the Mac you’ll be using. It’s free, and it’s a straightforward install.

    Install Xcode Command Line Tools

    Next, make sure you have the command line tools for Xcode installed on your system. You can type the following command to check if they are installed:

    $ xcode-select --version

    If you don’t see any version information and you get a “command not found” type of error, you can use the following command to install the tools:

    $ xcode-select --install

    Of note, don’t try to use the same command above to update your installation of the command line tools; just let Apple prompt you for an update (or, if you have automatic updates enabled, updates should happen without you needing to do anything.)

    Connect Your iOS Device To Your Mac Computer

    Then, connect your iOS device with your Mac computer using whatever wired connection is required (for my iPhone 8 and my iMac, I’m using a USB-to-Lightning cable.) Once connected, you just need to have both devices turned on so they can talk to each other (you may have to enter the passcode for your iOS device to unlock it.)

    Start Xcode And Find Your UDID

    Next, we have to locate the Unique Device Identifier (UDID) for your iOS device. The easiest way to do this (and have something you can copy into your command for the next step) is to use Xcode. After starting Xcode, you can navigate to the Window menu and then select Devices and Simulators. That will bring up a new window, then you can select the Devices tab, which should reveal detailed information about your iOS device. For our purposes, we need the value after the Identifier label (blurred out in my image below), which is the UDID for the device.

    Find The “rvictl” Command On Your Mac

    Now we need to open the terminal again. First, we have to find where the RVI command is located on your version of macOS. The find command can do this nicely, and we’ll enhance our command so we don’t see hundreds of permission denied messages.

    $ find / -name "rvictl" 2>/dev/null

    The output should reveal the location of the command. On my iMac running Catalina, the location is /Library/Apple/usr/bin, but make sure you check your system for the precise location.

    Next, change to the directory of the rvictl binary and then run the command.

    $ cd /Path/On/Your/System

    Run The “rvictl” Command To Add Your iOS Device As A Network Interface

    Finally, we can run the rvictl command and pass in the UDID we found earlier for our iOS device to start up a new network interface that will allow us to monitor the network traffic on the device using our Mac computer.

    $ rvictl -s the-udid-number-of-your-ios-device

    Test The Network Interface With tcpdump

    Now that the network interface has been configured on your Mac for your iOS device (usually called rvi0), let’s test it to ensure that it’s working. Try using tcpdump to view HTTP activity on your iOS device and then visit a webpage on your phone that is using HTTP (not HTTPS.)

    $ tcpdump -i rvi0 port http
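
    When you’re finished capturing, you can remove the virtual interface by passing the -x flag to rvictl with the same UDID. And if you’d rather analyze the traffic later (or in a tool like Wireshark), tcpdump can write the capture to a file. Depending on your account permissions, you may need to prefix the tcpdump commands with sudo.

    $ tcpdump -i rvi0 -w capture.pcap
    $ rvictl -x the-udid-number-of-your-ios-device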

    Takeaways

    You should now have the ability to configure your Mac computer to monitor network traffic on your iOS device. There are pros and cons to this particular approach. On the positive side, it is relatively easy if you’re using a Mac, unencrypted traffic is easily viewed, and the required applications/tools are few. However, if you don’t own a Mac computer, or if you need to view encrypted traffic (e.g., HTTPS), there are better approaches. I’ll cover other monitoring options in the future that address these issues.

  • Xubri Educational Resources

    Xubri

    One of my companies, StartingStrong.com, started a line of educational resources to help students finish good practice fast: Xubri. Now that the Xubri trademark has been registered, I’m going to start creating more educational resources under the Xubri name.

    You might wonder why we waited for the trademark to be finalized before investing the time to create more resources. Well, I had a bad experience where I’d worked hard to build up the presence of an app in the Apple App Store. Then, someone created an app with the same name… except that they added “HD.” Seriously, they just called their app “name-of-my-app HD.” The similarly-named “HD” product seriously undermined my brand and advertising. Lesson learned!

    So far, the brand includes several basic math facts apps in the Apple App Store and an audio single. However, there are several new apps being actively developed, and we’re really excited for what the future holds!

     

  • Dusted-Off Version of 1944 “Chart of Electromagnetic Radiations”

    Dusted-Off Version of 1944 “Chart of Electromagnetic Radiations”

    Charts are fantastic! And, on the rare occasion that you find an old chart that possesses the charm of a previous era whilst maintaining valuable insights for today’s learners, you’ve found a true treasure.

    A few years back, I’d looked for a chart covering electromagnetic radiation for my children, and I’d found some really nice options. One stood out, and while I’m not the only one who liked it, I never pulled the trigger. Again, it’s a very nice poster with nice reviews on Amazon, but it didn’t compel me to spend my money immediately.

    Advancing to the current year, I realized my daughters were getting to the age where putting off getting the poster was no longer an option (at least in dad’s eyes), so I figured I’d take another look to see what I could find. Eventually, I came across an old chart posted to Lawrence Livermore National Laboratory’s (LLNL) flickr account. Quoting the image description:

    If you’re into scientific antiques, you have to examine the details in this 1944 poster from the W.M Welch Scientific Company: “Chart of Electromagnetic Radiations.” It was found tucked away in the back of an unused office years ago, but now hangs framed in a high-traffic hallway populated by Lawrence Livermore engineers.

    Small image of the Chart of Electromagnetic Radiations from the W.M. Welch Scientific Company in 1944

    What a marvelous poster. Certainly, this was the poster for which I’d been waiting. Beyond the beautiful presentation, someone had also worked up a nice writeup of the chart’s provenance, which only added to its allure. Sure, Edward Tufte may not have installed this particular chart in his house, but even he would have to concede the impressive level of information density achieved.

    Really, it looked like all systems were go. It’s licensed under Creative Commons 2.0, so getting this beautiful chart printed as a large-size poster would be a snap. All I had to do was go to FedEx Office to get some pricing and then quickly talk to my beautiful wife. Easy peasy.

    Although FedEx had some reasonable pricing, apparently the discussion with the wife posed a greater stumbling block than I had anticipated. Couldn’t she see the beauty of this poster? Why couldn’t we put this baby up on one of our walls as big as it could be printed?

    After much bargaining, she agreed to let me put up a poster if I improved the appearance of the chart (it looked old, worn, and dusty to her), and we limited the largest side to 36 inches.

    So, after putting in some time in Photoshop, I have a “dusted-off” version of the chart ready for printing. I tried to limit the edits to repairs related to color fading, and some extreme rips, as I thoroughly appreciate the aged appearance of the chart.

    Chart of Electromagnetic Radiations after being "dusted off" with Photoshop.

    Following the license of Lawrence Livermore National Laboratory’s original image upload, this updated version is also licensed by the Creative Commons 2.0 license.

    Here’s the link to the full-size image as a JPEG (10000 x 6945 pixels, 50.2 MB.) Enjoy!

    Finally, if you make a better version of the chart, I’d love it if you linked to it in the comments so I can negotiate for a larger poster 😉

    Update January 19th, 2017:

    Ryan from the IEEE History Center commented with a link to another nice writeup on this chart in the January 2017 edition of the Journal of the Caxton Club (starts on page 10.)

    Update November 9th, 2021:

    I apologize for the link to the full-size image having permission issues. Google changed their permissions, but I believe I have updated the link and you should again be able to download the chart without having to request permission.

  • Words [Not] to Capitalize in AP-Styled Titles: Don’t Mess With a Mouse

    To capitalize or not to capitalize: that is the question for any writing that involves titles or headings. There are several different styles to consider, such as AP, MLA, sentence-styled, etc. And, there are many great resources that detail the pros and cons of the various capitalization approaches, such as Grammar Girl’s article on capitalizing titles. I’m not here to continue this analysis, as I’ve made my choice: AP-styled capitalization is my strong preference. No, I’m here to discuss how to remember the rules for this particular style choice.

    The rules aren’t that hard to apply, and essentially boil down to the following (a quick code sketch follows the list):

    1. Capitalize any word that starts a title.
    2. Capitalize any word that ends a title.
    3. Capitalize any other word in the title as long as it’s not in this list:
      • a
      • an
      • and
      • at
      • but
      • by
      • for
      • in
      • nor
      • of
      • on
      • or
      • so
      • the
      • to
      • up
      • yet
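
    In code form, the whole rule set is compact. Here’s a quick illustrative sketch in Python (my own toy helper, not something from an AP style guide):

    EXCEPTIONS = {"a", "an", "and", "at", "but", "by", "for", "in",
                  "nor", "of", "on", "or", "so", "the", "to", "up", "yet"}

    def ap_title_case(title):
        words = title.split()
        styled = []
        for i, word in enumerate(words):
            first_or_last = (i == 0 or i == len(words) - 1)
            if first_or_last or word.lower() not in EXCEPTIONS:
                styled.append(word[:1].upper() + word[1:])  # rules 1–3: capitalize
            else:
                styled.append(word.lower())                 # exception word: lowercase
        return " ".join(styled)

    print(ap_title_case("don't mess with a mouse"))  # Don't Mess With a Mouse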

    Rules 1 and 2 are quite easy to remember. However, rule 3 is difficult because it’s easy to forget the list of exception words.

    One can try to remember that the list includes all articles (a, an, the), conjunctions (and, but, for, nor, or, yet), and prepositions three or fewer characters in length (at, by, for, in, of, off, on, to, up.) Honestly, working through the sets and elimination rules is cumbersome, especially if you haven’t had to apply them in a while.

    So, I wrote a [clumsy] limerick to serve as a mnemonic device to help remember which words are NOT capitalized (exception words in the middle of a title) when writing AP-styled titles or headings. In this limerick, all words with three characters or fewer are the exception words. Some of the rhythms have to be finessed, but hey, there are 17 exception words to fit in, and we can’t all be Edward Lear.

    Don’t Mess With a Mouse

    So a mouse near an eagle and a snake,
    Keeps in watch on the shore of a lake.
    Yet at night, up to pounce,
    But don’t judge by might or ounce,
    For neither bird nor serpent would awake.

    I’ve posted it here as a reference for me and my school-aged daughters (and, more recently, my daughters produced a brief video to tell the tale.) That doesn’t mean they like it (my eldest daughter hates the fact that the bird dies, although she’s OK with the snake perishing), but, pragmatically speaking in terms of helping us remember AP-styled title capitalization, it works for us 🙂

  • Automatically Generating Content Inventories (Part 1)

    Introduction

    I’ll admit it, in my youth (say, a few days ago) I’d often generate a content inventory by hand. I’d simply open a new spreadsheet and start working my way through the site until I was done chronicling the content. I chose this path because of its simplicity and because many of the websites I work on are quite small.

    This month I’m working with a client on several sites, and the total number of pages is close to one thousand. Sure, I’ll likely still want to view each of the pages just in case the title and description fail to reflect the content (or it’s an asset that lacks this meta information), but automatically generating the URL, file type, title, and description should save a tremendous amount of time.

    To automatically generate a content inventory, we’ll break the work up into three steps:

    1. Create a local copy of the website (covered in this post.)
    2. Create a list of broken links (covered in this post.)
    3. Parse the local files to create a spreadsheet (covered in the next post.)

    Using Wget To Create A Local Copy Of Your Website

    The GNU wget package makes it very easy to generate a local copy of a website. You can use it to crawl your entire website and download all of the linked assets (HTML files, images, PDFs, etc.) While you can install wget on Windows and Macs, when I’m using one of these systems I just run a VM of my favorite Linux distro, which already has wget installed. I found a great tutorial that demonstrates how to create a mirror of a website with wget, and its most basic usage is illustrated by the command below.

    
    $ wget -m http://www.site.com/
    

    There are many more options, but the command above would create the directory “www.site.com” and put all of the linked files from your website in that directory.

    Using Wget To Find Broken Links (404)

    Next, let’s make sure we have a list of the broken links in the website. After all, a content inventory is supposed to guide future work, and all future work should take into account content that’s either missing or unfindable.

    Again, making use of wget greatly simplifies this task, and I found another great tutorial that outlines using wget to find broken links. The basic command structure is listed below.

    
    $ wget --spider -o file.log -r -p http://www.site.com
    

    Once completed, you have a file that you can grep / search for occurrences of 404 errors.
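
    For example, something like the command below (using the log file name from the spider command above) pulls out the 404s along with a couple lines of context:

    $ grep -n -B 2 '404 Not Found' file.log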

    A Bash Script To Automate And Simplify Things

    Of course, I’m old and I forget things easily. I can’t be expected to remember these commands for the next five minutes, let alone the next time I’m creating a content inventory a month from now. Additionally, instead of using multiple calls to wget, we can merge these operations into one roundtrip. Here’s a simple bash script that automates the creation of the local mirror of the website and the log file with broken link information.

    
    #!/bin/bash
    
    # remember to run chmod +x myFileNameWhateverItIs
    
    # store domain
    echo "Enter website domain (e.g., www.site.com):"
    read -r domain
    # store url
    url="http://$domain"
    # system status
    echo "Creating mirror..."
    # create local mirror
    wget -m -w 2 -o wget.log -p "$url"
    # system status
    echo "Creating broken link log..."
    # store broken link(s) info
    grep -n -B 2 '404 Not Found' wget.log > wget-404.log
    # system status
    echo "Process completed."
    

    If I store the code above in the file “local-site.sh” (and call chmod +x on it), I can call it directly to create a local copy of the website and a log file containing broken links:

    
    $ ./local-site.sh
    > Enter website domain (e.g., www.site.com):
    > www.example.com
    > Creating mirror...
    > Creating broken link log...
    > Process completed.

    I’ll cover parsing of the local files to create a content inventory spreadsheet in the next post.

  • Isolating Side Effects Using Isolation Sets

    A program or function is said to have side effects if it impacts the system state through a means other than its return value or reads the system state through a means other than its arguments. Every meaningful program eventually requires some form of side effect(s), such as writing output to the standard output file-stream or saving a record to a database. That said, working with pure functions, which lack side effects and are consistent, has many advantages. How can the practical necessity of side effects be reconciled with the benefits of avoiding them?

    Your Special Island

    If a program’s side effects are isolated in a small, known subset of the codebase, we can reap the benefits of working in their absence throughout large sections of the codebase whilst providing their practical application when needed. Indeed, functional programming languages like Haskell facilitate this approach by isolating side effects directly through language features / limitations. But what about the many languages that don’t directly facilitate side effect isolation? How can we achieve the same effects?

    We Will All Go Down Together

    Let’s begin with a typical example involving a non-isolated side effect. We’ll work through a small PHP function for sending email that resembles countless other examples online.* Because the side effect (the call to the mail function) is not isolated, the entire function is impure, making it all very difficult to test.

    
    <?php
    function sendSalesInquiry($from, $message)
    {
      // validate email
      if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
        return "<p>Email address invalid.</p>";
      }
      // init vars
      $to = "sales@company.com";
      $subject = "Sales Inquiry";
      $headers = "From: $from';
      // attempt to send
      if (mail($to, $subject, $message, $headers)) {
        return "<p>Email successfully sent.</p>";
      } else {
        return "<p>Email delivery failed.</p>"; 
      }
    }
    ?>
    

    And They Parted The Closest Of Friends

    To isolate the side effect, we’ll add some all-powerful indirection by refactoring the email function into multiple functions. Using a combination of a potentially-pure function with two fall-through functions allows us to easily, cleanly isolate the side effect in this example. When using this combination of function types specifically to isolate side effects, I refer to them collectively as an isolation set.

    <?php
    // potentially-pure function
    function sendSalesInquiry($from, $message, $mailer)
    {
      // validate email
      if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
        return "<p>Email address invalid.</p>";
      }
      // init vars
      $to = "sales@company.com";
      $subject = "Sales Inquiry";
      $headers = "From: $from';
      // attempt to send
      if ($mailer($to, $subject, $message, $headers)) {
        return "<p>Email successfully sent.</p>";
      } else {
        return "<p>Email delivery failed.</p>";
      }
    }
    // fall-through function provides implementation
    function sendSalesInquiryMail($from, $message)
    {
      // call potentially-pure function passing in mailer
      return sendSalesInquiry($from, $message, $mailer = function($to, $subject, $message, $headers) {
        return mail($to, $subject, $message, $headers);
      });
    }
    ?>
    

    The original example has been refactored into one potentially-pure function to handle the logic and initialization; and two fall-through functions, one to encapsulate the side effect, and one to provide the default behavior (in this case the mailer function) for production.**

    When testing the code, the sendSalesInquiry() function becomes the natural entry point, as it contains all of the important logic and initialization to be tested. Because the function is potentially-pure, passing in pure arguments causes the function to behave like a pure function, yielding better testing and clarity.
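
    For instance, a test along these lines (a rough sketch; the stub mailers and the assert() calls are just for illustration) exercises every branch without sending a single email:

    <?php
    // Pure stub mailers let us drive both outcomes deterministically.
    $alwaysSucceeds = function($to, $subject, $message, $headers) { return true; };
    $alwaysFails    = function($to, $subject, $message, $headers) { return false; };

    assert(sendSalesInquiry("user@example.com", "Hello", $alwaysSucceeds) === "<p>Email successfully sent.</p>");
    assert(sendSalesInquiry("user@example.com", "Hello", $alwaysFails) === "<p>Email delivery failed.</p>");
    assert(sendSalesInquiry("not-an-email", "Hello", $alwaysSucceeds) === "<p>Email address invalid.</p>");
    ?>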

    Music Left To Write

    Although the example only dealt with one side effect, an isolation set can be used to isolate any number of side effects. We could extend the example above and add a spam-checking algorithm. We’d just have to add another fall-through function for the side effect.

    <?php
    // potentially-pure function
    function sendSalesInquiry($from, $message, $mailer, $isSpam)
    {
      // validate email
      if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
        return "<p>Email address invalid.</p>";
      }
      // check for spam
      if ($isSpam($from, $message)) {
        return "<p>Don't call us, we'll call you.</p>";
      }
      // init vars
      $to = "sales@company.com";
      $subject = "Sales Inquiry";
      $headers = "From: $from';
      // attempt to send
      if ($mailer($to, $subject, $message, $headers)) {
        return "<p>Email successfully sent.</p>";
      } else {
        return "<p>Email delivery failed.</p>";
      }
    }
    
    function sendSalesInquiryMail($from, $message)
    {
      // call potentially-pure function passing in the mailer and spam checker
      return sendSalesInquiry(
        $from,
        $message,
        $mailer = function($to, $subject, $message, $headers) {
          return mail($to, $subject, $message, $headers);
        },
        $isSpam = function($from, $message) {
          $spamChecker = new SpamChecker();
          // this analysis could involve any number of database queries, networking requests, etc.
          return $spamChecker->isSpam($from, $message);
        }
      );
    }
    ?>
    

    It’s Nine O’Clock On A Saturday

    What? Doesn’t getting your side effects isolated put you in a mood for a melody?

    * I’m not enamored with returning HTML markup in this type of function, but it represents a common example I found online, and it’s for a programming language that people don’t typically associate with functional programming practices, so the example works well for the purposes of the current demonstration.

    ** You could reduce this example to two functions, as the potentially pure function could be used to contain default values for the fall-through function(s), which could then be overridden by passing in an argument for testing purposes. However, I like the clarity granted by implementing an isolation set with three functions, as I want to avoid marrying the potentially pure function to any default implementation. For example, I could easily provide a different mailing mechanism by merely creating a new function, like sendSalesInquirySMTP(), which provides a PHPMailer implementation.

  • Potentially-Pure Functions

    Overview: Potentially-pure functions are argument-based higher-order functions (i.e., functions that accept other functions as arguments) with pure function bodies (i.e., function bodies that are consistent and side-effect free), meaning their purity is dependent upon the arguments passed into the function.

    Higher-Order Functions

    All potentially-pure functions are higher-order functions, so let’s begin with a brief overview of what it means to be a higher-order function.

    Higher-order functions accept functions as arguments (we’ll call this specific form argument-based higher-order functions) or return functions as values (we’ll call this specific form return-based higher-order functions.) Higher-order functions enable tremendous power, flexibility, and parsimony; and they are leveraged heavily in functional programming.

    In order to implement argument-based higher-order functions, a programming language must allow you to pass functionality into functions through their arguments. While not all languages provide first-class functions, which can be passed around and stored like other data, you can effectively emulate first-class functions in most languages. In low-level languages like C, you can pass in function pointers; in OOP languages like Java, you can pass in interfaces; and in dynamic languages like PHP which used to lack anonymous functions (prior to version 5.3), you can pass in the string name of an existing function. No matter what language you’re using for your development, you should be able to fake it quite convincingly.

    Potentially-Pure Functions

    Pure functions are side-effect free and consistent. Higher-order functions provide a special situation when evaluating purity. If an argument-based higher-order function’s body is pure, then its purity is unfixed. In other words, the purity of the function is dependent upon the purity of the functions passed in as arguments. If the functions passed into the higher-order function are pure, then the function is pure; and if the functions passed in are impure, then the higher-order function is impure*. Because the phrase “argument-based higher-order function with unfixed purity” might jeopardize your conscious state, let’s just call this function type a potentially-pure function.

    The natural duality of potentially-pure functions makes them especially helpful when it comes to isolating side effects. When testing a potentially-pure function, a pure form of the function can be passed in, allowing you to cleanly and easily test all possible states. When using a potentially-pure function in production, a fall-through function containing the side effect(s) can be passed in, allowing the code to perform its real-world requirements.
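
    A tiny sketch (the function names here are mine, purely for illustration) makes the duality concrete:

    <?php
    // The body of applyTwice() is pure, so its purity hinges entirely on $fn.
    function applyTwice($value, $fn)
    {
      return $fn($fn($value));
    }

    // Pure argument: the call behaves like a pure function.
    echo applyTwice(3, function($x) { return $x + 1; }); // 5

    // Impure argument: the call now performs a side effect (writing output).
    echo applyTwice(3, function($x) { echo "logging...\n"; return $x + 1; }); // logs twice, then 5
    ?>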

    * If an impure function is passed to an argument-based higher-order function with unfixed purity, it is possible for the function to remain pure if the impure function passed in as an argument is never called.