Isolating Side Effects Using Isolation Sets

A program or function is said to have side effects if it impacts the system state through a means other than its return value or reads the system state through a means other than its arguments. Every meaningful program eventually requires some form of side effect, such as writing to the standard output stream or saving a record to a database. That said, working with pure functions, which lack side effects and are consistent, has many advantages. How can the practical necessity of side effects be reconciled with the benefits of avoiding them?

Your Special Island

If a program’s side effects are isolated in a small, known subset of the codebase, we can reap the benefits of working in their absence throughout large sections of the codebase whilst providing their practical application when needed. Indeed, functional programming languages like Haskell facilitate this approach by isolating side effects directly through language features and limitations. But what about the many languages that don’t directly facilitate side effect isolation? How can we achieve the same result?

We Will All Go Down Together

Let’s begin with a typical example involving a non-isolated side effect. We’ll work through a small PHP function for sending email that resembles countless other examples online.* Because the side effect (the call to the mail function) is not isolated, the entire function is impure, making it all very difficult to test.


<?php
function sendSalesInquiry($from, $message)
{
  // validate email (filter_var returns false for an invalid address)
  if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
    return "<p>Email address invalid.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from";
  // attempt to send
  if (mail($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>"; 
  }
}
?>

And They Parted The Closest Of Friends

To isolate the side effect, we’ll add some all-powerful indirection by refactoring the email function into multiple functions. Using a combination of a potentially-pure function with two fall-through functions allows us to easily, cleanly isolate the side effect in this example. When using this combination of function types specifically to isolate side effects, I refer to them collectively as an isolation set.

<?php
// potentially-pure function
function sendSalesInquiry($from, $message, $mailer)
{
  // validate email (filter_var returns false for an invalid address)
  if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
    return "<p>Email address invalid.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from";
  // attempt to send
  if ($mailer($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>";
  }
}
// fall-through function provides implementation
function sendSalesInquiryMail($from, $message)
{
  // call potentially-pure function passing in mailer
  return sendSalesInquiry($from, $message, $mailer = function($to, $subject, $message, $headers) {
    return mail($to, $subject, $message, $headers);
  });
}
?>

The original example has been refactored into one potentially-pure function, which handles the logic and initialization, and two fall-through functions: one to encapsulate the side effect, and one to provide the default behavior (in this case the mailer function) for production.**

When testing the code, the sendSalesInquiry() function becomes the natural entry point, as it contains all of the important logic and initialization to be tested. Because the function is potentially-pure, passing in pure arguments causes the function to behave like a pure function, yielding better testing and clarity.
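
To make the testing claim concrete, here is a minimal JavaScript sketch that mirrors the refactored PHP function. It is an illustration under assumptions, not a port: the regular expression is a simplified stand-in for filter_var, and the stub mailers are hypothetical names. The point is only that pure stubs let us exercise every branch without sending any mail.

```javascript
// Simplified stand-in for PHP's FILTER_VALIDATE_EMAIL (an assumption, not a full validator).
const looksLikeEmail = (address) => /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(address);

// Potentially-pure function: all logic lives here, the side effect is injected.
function sendSalesInquiry(from, message, mailer) {
    if (!looksLikeEmail(from)) {
        return "<p>Email address invalid.</p>";
    }
    const to = "sales@company.com";
    const subject = "Sales Inquiry";
    const headers = "From: " + from;
    return mailer(to, subject, message, headers)
        ? "<p>Email successfully sent.</p>"
        : "<p>Email delivery failed.</p>";
}

// Pure stubs standing in for the real mailer during tests.
const mailerSucceeds = () => true;
const mailerFails = () => false;

// One call per branch: invalid address, successful send, failed send.
const results = [
    sendSalesInquiry("not-an-address", "Hi", mailerSucceeds),
    sendSalesInquiry("pat@example.com", "Hi", mailerSucceeds),
    sendSalesInquiry("pat@example.com", "Hi", mailerFails),
];
console.log(results);
```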

Music Left To Write

Although the example only dealt with one side effect, an isolation set can be used to isolate any number of side effects. We could extend the example above and add a spam-checking algorithm. We’d just have to add another fall-through function for the side effect.

<?php
// potentially-pure function
function sendSalesInquiry($from, $message, $mailer, $isSpam)
{
  // validate email (filter_var returns false for an invalid address)
  if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
    return "<p>Email address invalid.</p>";
  }
  // check for spam
  if ($isSpam($from, $message)) {
    return "<p>Don't call us, we'll call you.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from";
  // attempt to send
  if ($mailer($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>";
  }
}

function sendSalesInquiryMail($from, $message)
{
  // call potentially-pure function passing in the mailer and spam checker
  return sendSalesInquiry(
    $from,
    $message,
    $mailer = function($to, $subject, $message, $headers) {
      return mail($to, $subject, $message, $headers);
    },
    $isSpam = function($from, $message) {
      $spamChecker = new SpamChecker();
      // this analysis could involve any number of database queries, networking requests, etc.
      return $spamChecker->isSpam($from, $message);
    }
  );
}
?>

It’s Nine O’Clock On A Saturday

What? Doesn’t getting your side effects isolated put you in a mood for a melody?

* I’m not enamored with returning HTML markup in this type of function, but it represents a common example I found online, and it’s for a programming language that people don’t typically associate with functional programming practices, so the example works well for the purposes of the current demonstration.

** You could reduce this example to two functions, as the potentially pure function could be used to contain default values for the fall-through function(s), which could then be overridden by passing in an argument for testing purposes. However, I like the clarity granted by implementing an isolation set with three functions, as I want to avoid marrying the potentially pure function to any default implementation. For example, I could easily provide a different mailing mechanism by merely creating a new function, like sendSalesInquirySMTP(), which provides a PHPMailer implementation.

Potentially-Pure Functions

Overview: Potentially-pure functions are argument-based higher-order functions (i.e., functions that accept other functions as arguments) with pure function bodies (i.e., function bodies that are consistent and side-effect free), meaning their purity is dependent upon the arguments passed into the function.

Higher-Order Functions

All potentially-pure functions are higher-order functions, so let’s begin with a brief overview of what it means to be a higher-order function.

Higher-order functions accept functions as arguments (we’ll call this specific form argument-based higher-order functions) or return functions as values (we’ll call this specific form return-based higher-order functions). Higher-order functions enable tremendous power, flexibility, and parsimony, and they are leveraged heavily in functional programming.

In order to implement argument-based higher-order functions, a programming language must allow you to pass functionality into functions through their arguments. While not all languages provide first-class functions, which can be passed around and stored like other data, you can effectively emulate first-class functions in most languages. In low-level languages like C, you can pass in function pointers; in OOP languages like Java, you can pass in objects that implement an interface; and in PHP, which lacked anonymous functions prior to version 5.3, you can pass in the string name of an existing function. No matter what language you’re using for your development, you should be able to fake it quite convincingly.
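
The two forms described above can be sketched in JavaScript, which has first-class functions. The names applyTwice and multiplyBy are illustrative only.

```javascript
// Argument-based: accepts a function as an argument.
function applyTwice(fn, value) {
    return fn(fn(value));
}

// Return-based: returns a new function as its value.
function multiplyBy(factor) {
    return function (x) {
        return x * factor;
    };
}

const double = multiplyBy(2);
const quadrupled = applyTwice(double, 5); // double(double(5))
console.log(quadrupled); // 20
```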

Potentially-Pure Functions

Pure functions are side-effect free and consistent. Higher-order functions provide a special situation when evaluating purity. If an argument-based higher-order function’s body is pure, then its purity is unfixed. In other words, the purity of the function is dependent upon the purity of the functions passed in as arguments. If the functions passed into the higher-order function are pure, then the function is pure; and if the functions passed in are impure, then the higher-order function is impure*. Because the phrase “argument-based higher-order function with unfixed purity” might jeopardize your conscious state, let’s just call this function type a potentially-pure function.

The natural duality of potentially-pure functions makes them especially helpful when it comes to isolating side effects. When testing a potentially-pure function, pure functions can be passed in as arguments, allowing you to cleanly and easily test all possible states. When using a potentially-pure function in production, a fall-through function containing the side effect(s) can be passed in, allowing the code to perform its real-world requirements.
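
Here is a small JavaScript sketch of that duality. The function describeTotal and its log parameter are hypothetical. The same function is called once with a pure stand-in and once with an argument that writes to the console. The final call shows that an impure argument which is never invoked leaves the call pure.

```javascript
// Potentially-pure: the body is pure, so purity depends on `log`.
function describeTotal(prices, log) {
    const total = prices.reduce((sum, price) => sum + price, 0);
    if (total > 100) {
        log("large order: " + total);
    }
    return total;
}

// Test usage: a pure stand-in, so the call has no side effects.
const noop = () => undefined;
const testTotal = describeTotal([60, 50], noop);

// Production usage: the argument contains the side effect.
const writeToConsole = (line) => console.log(line);
const prodTotal = describeTotal([60, 50], writeToConsole);

// The total stays under the threshold, so the impure argument is never called.
const smallTotal = describeTotal([10, 20], writeToConsole);
```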

* If an impure function is passed to an argument-based higher-order function with unfixed purity, it is possible for the function to remain pure if the impure function passed in as an argument is never called.

Pure Functions

Overview: Striving to write pure functions (i.e., functions that are consistent and side-effect free) improves the testability, simplicity, and clarity of code.

What are Pure Functions?

Pure functions are consistent and side-effect free. A consistent function returns the same value every time for a particular set of arguments (this type of function is said to be referentially transparent, as calls to the function can be replaced by the return value without changing the program’s behavior). A side-effect-free function does not change state through any means beyond its return value, meaning the values that existed before the function call (e.g., global variables, disk contents, static instances, UI, etc.) were not directly altered by the function; and it does not read any state beyond its arguments (i.e., no reading of data from files, databases, etc.). Think of pure functions like Mr. Spock: given a set of inputs, you will always get the same straightforward, logical result (okay, okay, Spock showed an unpredictable, emotional response in “Amok Time”, but c’mon, he thought he had killed Captain Kirk).
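
As a rough illustration of consistency, the JavaScript sketch below contrasts a function that reads outside state with one that takes everything as arguments. All names here are hypothetical.

```javascript
// Inconsistent: the result depends on state outside the arguments.
let bonus = 5;
function scoreImpure(points) {
    return points + bonus;
}

// Consistent: everything the result depends on is an argument.
function score(points, bonusPoints) {
    return points + bonusPoints;
}

const before = scoreImpure(10);
bonus = 20;
const after = scoreImpure(10); // same argument, different result

const first = score(10, 5);
const second = score(10, 5); // same arguments, same result
```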

You don’t have to be using some fancy-pants functional programming language to benefit from pure functions. In languages that aren’t purely functional, you’ll have to work to avoid things like side effects and pay attention to whether the arguments you’ve received are copies or references, semantics that are language/context dependent. When dealing with references, you should treat them like you treat dad’s favorite belongings (like a special lamp, for instance): you can look (read the values), but don’t touch (edit the values)!
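
The “look, but don’t touch” rule can be sketched in JavaScript, where arrays are passed by reference. The function names are made up for illustration. The impure version sorts the caller’s array in place, while the pure version sorts a copy.

```javascript
// Impure: Array.prototype.sort reorders the caller's array in place.
function highestScoreImpure(scores) {
    scores.sort((a, b) => b - a);
    return scores[0];
}

// Pure: copy first, then sort the copy; the argument is only read.
function highestScore(scores) {
    const sorted = [...scores].sort((a, b) => b - a);
    return sorted[0];
}

const scores = [3, 9, 4];
const best = highestScore(scores);
// best is 9, and scores is still [3, 9, 4]
console.log(best, scores);
```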

Examples of Pure and Impure Functions

Let’s work through some example functions and determine if they’re pure (i.e., consistent and side-effect free) or impure.

Below is a trivial example of a JavaScript function that returns the square of a number.

function square(x){
    return x * x;
}

Given a particular number x, this square function will always return the same result, so it is consistent. Additionally, it makes no changes to the global state beyond its return value. Therefore, it’s a pure function.

Next, an example JavaScript function that checks out a book.

function checkOutBook(book, patron){
    if(book.isCheckedOut){
        return false;
    }
    // changes to the book object alter the object beyond the scope of this function
    book.isCheckedOut = true;
    book.checkedOutTo = patron;
    return true;
}

The function is consistent, as passing in a particular set of arguments will always return the same result. However, the function changes some of the properties of the book object, changes that will persist even after the function has returned, so this function has side effects. Therefore, it’s an impure function.
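
For comparison, one possible pure variant is sketched below. It is only one way to do it, and the name checkOutBookPure and the shape of the returned object are assumptions. Instead of editing the book it received, the function returns a new book object alongside the status.

```javascript
// Pure variant: returns a result instead of editing the book argument.
function checkOutBookPure(book, patron) {
    if (book.isCheckedOut) {
        return { success: false, book: book };
    }
    // Build a new object; the original book is left untouched.
    const updated = { ...book, isCheckedOut: true, checkedOutTo: patron };
    return { success: true, book: updated };
}

const original = { title: "Dune", isCheckedOut: false };
const outcome = checkOutBookPure(original, "Sam");
console.log(outcome.success, original.isCheckedOut);
```

The caller decides what to do with the updated book, so any persistence can live in a separate function.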

The Benefits of Pure Functions

Pure functions facilitate simplicity and clarity. Because pure functions lack side effects, the outside world is completely abstracted away and the programmer can focus entirely on the parameters and control flow constructs contained within the function. Additionally, when calling a pure function, the programmer can focus solely on the visible context of the call and the return value, as the function has no other impact on state.

Testing pure functions proves extremely straightforward. All possible paths/states of a pure function can be directly achieved by passing in different sets of arguments. The only things you’ll be mocking are Lions Fans (sure, we didn’t end the season well, but we really could have a great season next… oh, the abject sadness.)
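
A table of arguments and expected results is often all a pure function needs. The JavaScript sketch below uses a hypothetical shippingCost function to show every path being reached through arguments alone.

```javascript
// A pure function with several paths (hypothetical example).
function shippingCost(weightKg, isMember) {
    if (weightKg <= 0) {
        return null; // invalid weight
    }
    if (isMember) {
        return 0; // members ship free
    }
    return weightKg > 10 ? 15 : 5;
}

// Each row is [arguments, expected result]; no mocks or setup needed.
const cases = [
    [[0, false], null],
    [[3, true], 0],
    [[3, false], 5],
    [[12, false], 15],
];

// Collect any rows whose actual result differs from the expected one.
const failures = cases.filter(
    ([args, expected]) => shippingCost(...args) !== expected
);
console.log(failures.length === 0 ? "all paths pass" : failures);
```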

A Usage Strategy for Pure Functions

Because of the benefits of pure functions, I follow the simple rule, “Strive for purity.” That is to say, I work hard to write as many functions as I can in a pure form, and when needed, I write functions that have side effects or are inconsistent.

Side effects aren’t bad. Any meaningful program will have side effects, and it doesn’t bother me in the least when it’s time to write an impure function. However, I try to keep the side effects isolated in small fall-through functions, so as to improve the simplicity, clarity, and testability of the rest of the code base.

Fall-Through Functions

When embracing functional programming principles in languages that aren’t designed specifically for functional programming, dealing with side effects requires great care and discipline. For those who can’t remember what side effects are, side effects are attempts to modify the state of the world (at least in terms of the scope of your program’s environment) through a means other than the return value of a function (e.g., performing a SQL insert, printing text to the standard output device, sending an email, etc.) Hopefully you noticed the word “attempts.” The problem with trying to directly modify the state of the world is that you don’t know what state the world is in: sometimes we’re caught by surprise.

I try to compartmentalize side effects in functions that lack branching constructs (e.g., if/then, switch, etc.) I refer to this type of function as a fall-through function because the function proceeds line-by-line until it reaches the last line and returns the status or result. Here’s a simple example of a fall-through function in PHP that sends an email:

<?php

$mail = function($to, $subject, $message){
    // handle longer lines
    $message = wordwrap($message, 70, "\r\n");
    // send message, returning status
    return mail($to, $subject, $message);
};

?>

Fall-through functions provide clean separation of the logic we want to test from the world-dependent states that are unpredictable (i.e., code containing side effects), and as we know, clear boundaries are a good thing.
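
The same separation can be sketched in JavaScript, assuming a Node.js environment. The names writeLine and formatStatus are hypothetical. The fall-through function contains no branching and simply performs the write, while the decision about what to write stays in a pure function.

```javascript
// Fall-through function: no branching, just the side effect and its status.
const writeLine = function (stream, text) {
    const line = text.trim() + "\n";
    return stream.write(line);
};

// The decision about what to write stays in a pure function.
function formatStatus(count) {
    return count === 0 ? "No new messages" : count + " new message(s)";
}

// process.stdout is Node's standard output stream.
writeLine(process.stdout, formatStatus(2));
```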