Isolating Side Effects Using Isolation Sets

A program or function is said to have side effects if it affects the system state through a means other than its return value, or reads the system state through a means other than its arguments. Every meaningful program eventually requires some side effect, such as writing output to the standard output stream or saving a record to a database. That said, working with pure functions, which lack side effects and always return the same result for the same arguments, has many advantages. How can we reconcile the practical necessity of side effects with the benefits of avoiding them?
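To make the distinction concrete, here’s a minimal PHP sketch. The function names are hypothetical and chosen only for illustration: one pure function, one that writes to standard output, and one that reads the system clock instead of receiving the time as an argument. The last function shows the usual fix: pass the state in as an argument and the function becomes pure again.

```php
<?php
// Pure: the result depends only on the arguments, and nothing else is touched.
function addTax(float $price, float $rate): float
{
    return $price * (1 + $rate);
}

// Side effect: writes to standard output in addition to returning a value.
function addTaxAndLog(float $price, float $rate): float
{
    $total = $price * (1 + $rate);
    echo "Total: $total\n";
    return $total;
}

// Impure: reads the system clock, which isn't one of its arguments, so two
// calls with the same argument can return different results.
function isExpired(int $expiresAt): bool
{
    return time() > $expiresAt;
}

// Pure again: the current time is passed in as an argument.
function isExpiredAt(int $expiresAt, int $now): bool
{
    return $now > $expiresAt;
}

var_dump(addTax(100.0, 0.25));    // float(125)
var_dump(isExpiredAt(1000, 999)); // bool(false)
```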

Your Special Island

If a program’s side effects are isolated in a small, known subset of the codebase, we can reap the benefits of working without them throughout most of the codebase while still using them where they’re needed. Indeed, functional programming languages like Haskell facilitate this approach by isolating side effects directly through language features (and limitations). But what about the many languages that don’t directly support side-effect isolation? How can we achieve the same result in them?

We Will All Go Down Together

Let’s begin with a typical example involving a non-isolated side effect. We’ll work through a small PHP function for sending email that resembles countless other examples online.* Because the side effect (the call to the mail() function) is not isolated, the entire function is impure, which makes it difficult to test without actually sending email.


<?php
function sendSalesInquiry($from, $message)
{
  // validate email (filter_var() returns false for an invalid address)
  if (filter_var($from, FILTER_VALIDATE_EMAIL) === false) {
    return "<p>Email address invalid.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from";
  // attempt to send
  if (mail($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>";
  }
}
?>

And They Parted The Closest Of Friends

To isolate the side effect, we’ll add some all-powerful indirection by refactoring the email function into multiple functions. Combining a potentially-pure function (one that behaves as a pure function whenever its arguments, including any functions passed to it, are pure) with two fall-through functions (functions without branching logic) lets us isolate the side effect in this example easily and cleanly. When I use this combination of function types specifically to isolate side effects, I refer to them collectively as an isolation set.

<?php
// potentially-pure function
function sendSalesInquiry($from, $message, $mailer)
{
  // validate email (filter_var() returns false for an invalid address)
  if (filter_var($from, FILTER_VALIDATE_EMAIL) === false) {
    return "<p>Email address invalid.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from";
  // attempt to send
  if ($mailer($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>";
  }
}
// fall-through function provides implementation
function sendSalesInquiryMail($from, $message)
{
  // call potentially-pure function, passing in the mailer
  return sendSalesInquiry($from, $message, function($to, $subject, $message, $headers) {
    // fall-through function encapsulates the side effect
    return mail($to, $subject, $message, $headers);
  });
}
?>

The original example has been refactored into one potentially-pure function, which handles the logic and initialization, and two fall-through functions: one that encapsulates the side effect (the anonymous mailer function) and one that provides the default behavior for production (sendSalesInquiryMail(), which supplies that mailer).**

When testing the code, the sendSalesInquiry() function becomes the natural entry point, as it contains all of the important logic and initialization. Because the function is potentially-pure, passing in pure arguments (for example, a stub mailer that simply returns true or false) makes it behave like a pure function, which makes it easier to test and to reason about.
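Here’s a minimal sketch of what such a test can look like. The corrected potentially-pure function is repeated so the snippet runs on its own, and the stub mailers ($alwaysSucceeds, $alwaysFails) are hypothetical stand-ins for mail(); a test framework such as PHPUnit would wrap the same calls in its own assertions.

```php
<?php
// Potentially-pure function from the refactored example, repeated here so the
// sketch is self-contained.
function sendSalesInquiry($from, $message, $mailer)
{
    // validate email (filter_var() returns false for an invalid address)
    if (filter_var($from, FILTER_VALIDATE_EMAIL) === false) {
        return "<p>Email address invalid.</p>";
    }
    // init vars
    $to = "sales@company.com";
    $subject = "Sales Inquiry";
    $headers = "From: $from";
    // attempt to send
    if ($mailer($to, $subject, $message, $headers)) {
        return "<p>Email successfully sent.</p>";
    } else {
        return "<p>Email delivery failed.</p>";
    }
}

// Hypothetical stub mailers: pure stand-ins for the real mail() call.
$alwaysSucceeds = function ($to, $subject, $message, $headers) {
    return true;
};
$alwaysFails = function ($to, $subject, $message, $headers) {
    return false;
};

// Every path through the logic can be exercised without sending any mail.
echo sendSalesInquiry("not-an-email", "Hi", $alwaysSucceeds), "\n";     // <p>Email address invalid.</p>
echo sendSalesInquiry("jane@example.com", "Hi", $alwaysSucceeds), "\n"; // <p>Email successfully sent.</p>
echo sendSalesInquiry("jane@example.com", "Hi", $alwaysFails), "\n";    // <p>Email delivery failed.</p>
```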

Music Left To Write

Although the example only dealt with one side effect, an isolation set can be used to isolate any number of side effects. For instance, we could extend the example above with a spam check; we’d just have to add another fall-through function for the new side effect.

<?php
// potentially-pure function
function sendSalesInquiry($from, $message, $mailer, $isSpam)
{
  // validate email (filter_var() returns false for an invalid address)
  if (filter_var($from, FILTER_VALIDATE_EMAIL) === false) {
    return "<p>Email address invalid.</p>";
  }
  // check for spam
  if ($isSpam($from, $message)) {
    return "<p>Don't call us, we'll call you.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from";
  // attempt to send
  if ($mailer($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>";
  }
}

function sendSalesInquiryMail($from, $message)
{
  // call potentially-pure function, passing in the fall-through functions
  return sendSalesInquiry(
    $from,
    $message,
    // mailer
    function($to, $subject, $message, $headers) {
      return mail($to, $subject, $message, $headers);
    },
    // spam checker (SpamChecker is a placeholder for your own analysis class)
    function($from, $message) {
      $spamChecker = new SpamChecker();
      // this analysis could involve any number of database queries, networking requests, etc.
      return $spamChecker->isSpam($from, $message);
    }
  );
}
?>
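Because both side effects are now injected, we can test how they interact. In this minimal sketch the extended potentially-pure function is repeated so it runs on its own, and the stubs are hypothetical: a spam checker that always (or never) reports spam, and a mailer that throws if it’s ever called. That lets us confirm spam is rejected before any mail is attempted, without a SpamChecker class or a mail server.

```php
<?php
// Extended potentially-pure function, repeated so the sketch is self-contained.
function sendSalesInquiry($from, $message, $mailer, $isSpam)
{
    // validate email (filter_var() returns false for an invalid address)
    if (filter_var($from, FILTER_VALIDATE_EMAIL) === false) {
        return "<p>Email address invalid.</p>";
    }
    // check for spam
    if ($isSpam($from, $message)) {
        return "<p>Don't call us, we'll call you.</p>";
    }
    // init vars
    $to = "sales@company.com";
    $subject = "Sales Inquiry";
    $headers = "From: $from";
    // attempt to send
    if ($mailer($to, $subject, $message, $headers)) {
        return "<p>Email successfully sent.</p>";
    } else {
        return "<p>Email delivery failed.</p>";
    }
}

// Hypothetical stubs standing in for the real side effects.
$alwaysSpam = function ($from, $message) {
    return true;
};
$neverSpam = function ($from, $message) {
    return false;
};
$mustNotSend = function ($to, $subject, $message, $headers) {
    throw new LogicException("mailer called for a message flagged as spam");
};
$alwaysSends = function ($to, $subject, $message, $headers) {
    return true;
};

// Spam is rejected before the mailer is ever invoked.
echo sendSalesInquiry("jane@example.com", "Buy now!", $mustNotSend, $alwaysSpam), "\n";
// Legitimate messages still reach the mailer.
echo sendSalesInquiry("jane@example.com", "Pricing?", $alwaysSends, $neverSpam), "\n";
```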

It’s Nine O’Clock On A Saturday

What? Doesn’t getting your side effects isolated put you in the mood for a melody?

* I’m not enamored with returning HTML markup in this type of function, but it represents a common example I found online, and it’s for a programming language that people don’t typically associate with functional programming practices, so the example works well for the purposes of the current demonstration.

** You could reduce this example to two functions, as the potentially-pure function could contain default values for the fall-through function(s), which could then be overridden by passing in an argument for testing purposes. However, I like the clarity of implementing an isolation set with three functions, because I want to avoid marrying the potentially-pure function to any default implementation. For example, I could easily provide a different mailing mechanism by creating a new function, such as sendSalesInquirySMTP(), that provides a PHPMailer implementation.
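For comparison, here’s a minimal sketch of the two-function variant, assuming PHP 7.1+ for the nullable parameter type. The default mailer lives inside the potentially-pure function and is used only when no mailer is passed in. Note the trade-off: the function is now tied to mail() by default, which is exactly the coupling the three-function isolation set avoids.

```php
<?php
// Two-function variant: the production mailer is the default, and tests can
// override it by passing a different callable.
function sendSalesInquiry($from, $message, ?callable $mailer = null)
{
    // fall back to the real mail() call when no mailer is supplied
    $mailer = $mailer ?? function ($to, $subject, $message, $headers) {
        return mail($to, $subject, $message, $headers);
    };
    // validate email (filter_var() returns false for an invalid address)
    if (filter_var($from, FILTER_VALIDATE_EMAIL) === false) {
        return "<p>Email address invalid.</p>";
    }
    // init vars
    $to = "sales@company.com";
    $subject = "Sales Inquiry";
    $headers = "From: $from";
    // attempt to send
    if ($mailer($to, $subject, $message, $headers)) {
        return "<p>Email successfully sent.</p>";
    } else {
        return "<p>Email delivery failed.</p>";
    }
}

// Production call: sendSalesInquiry($from, $message);
// Test call with a stub that never sends anything:
echo sendSalesInquiry("jane@example.com", "Hi", function ($to, $subject, $message, $headers) {
    return false;
}), "\n"; // <p>Email delivery failed.</p>
```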

Making Concurrent cURL Requests Using PHP’s curl_multi* Functions

The cURL library is a valuable resource for developers who need to exchange data over common URL-based protocols (e.g., HTTP, FTP, etc.). PHP’s cURL extension provides a set of curl_* wrapper functions that nicely integrate cURL’s functionality.

When you have to make multiple requests in a script, it’s often more efficient to use the curl_multi* functions (e.g., curl_multi_init()), which make it possible to process requests concurrently. For example, if a script makes 2 web requests and each one takes 2 seconds to complete, making them one right after the other requires about 4 seconds. If you use the curl_multi* functions, however, the requests are made concurrently (i.e., we no longer have to wait for one request to finish before starting the next one), so the script requires only about 2 seconds. (In practice, the total is roughly the time of the slowest request plus some overhead, and it depends on factors such as available bandwidth and how quickly each server responds.)

Because the curl_multi* functions can be cumbersome to use directly, let’s take a look at a function that provides a simple interface to cURL’s concurrent capabilities and can be adapted to most situations.

/**
* Simple wrapper function for concurrent request processing with PHP's cURL functions (i.e., using curl_multi* functions.)
*
* @param array $requests Array of requests, each containing a url and, optionally, post_data and opts (request-specific curl options).
* @param array $opts Optional array containing general options for all requests.
* @return array Array containing keys from requests array and values of arrays each containing data (response, null if response empty or error), info (curl info, null if error), and error (error string if there was an error, otherwise null).
*/
function multi(array $requests, array $opts = [])
{
    // create array for curl handles
    $chs = [];
    // merge general curl options args with defaults
    $opts += [CURLOPT_CONNECTTIMEOUT => 3, CURLOPT_TIMEOUT => 3, CURLOPT_RETURNTRANSFER => 1];
    // create array for responses
    $responses = [];
    // init curl multi handle
    $mh = curl_multi_init();
    // create running flag
    $running = null;
    // cycle through requests and set up
    foreach ($requests as $key => $request) {
        // init individual curl handle
        $chs[$key] = curl_init();
        // set url
        curl_setopt($chs[$key], CURLOPT_URL, $request['url']);
        // check for post data and handle if present
        if (!empty($request['post_data'])) {
            curl_setopt($chs[$key], CURLOPT_POST, 1);
            curl_setopt($chs[$key], CURLOPT_POSTFIELDS, $request['post_data']);
        }
        // set opts (request-specific options take precedence over general ones)
        curl_setopt_array($chs[$key], (isset($request['opts']) ? $request['opts'] + $opts : $opts));
        curl_multi_add_handle($mh, $chs[$key]);
    }
    do {
        // execute curl requests
        curl_multi_exec($mh, $running);
        // block to avoid needless cycling until change in status
        curl_multi_select($mh);
    // check flag to see if we're done
    } while($running > 0);
    // cycle through requests
    foreach ($chs as $key => $ch) {
        // handle error
        if (curl_errno($ch)) {
            $responses[$key] = ['data' => null, 'info' => null, 'error' => curl_error($ch)];
        } else {
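            // save successful response (an empty body becomes null, as documented above)
            $data = curl_multi_getcontent($ch);
            $responses[$key] = ['data' => ($data === '' ? null : $data), 'info' => curl_getinfo($ch), 'error' => null];
        }
        // detach individual handle from the multi handle
        curl_multi_remove_handle($mh, $ch);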
    }
    // close multi handle
    curl_multi_close($mh);
    // return responses
    return $responses;
}

To use this function, you can call it like so:

$responses = multi([
    'google' => ['url' => 'http://google.com', 'opts' => [CURLOPT_TIMEOUT => 2]],
    'msu' => ['url'=> 'http://msu.edu']
]);

Then you can cycle through the responses:

foreach ($responses as $response) {
    if ($response['error']) {
        // handle error
        continue;
    }
    // check for empty response
    if ($response['data'] === null) {
        // examine $response['info']
        continue;
    }
    // handle data
    $data = $response['data'];
    // do something extraordinary
}

While the above function is helpful for a few requests, if you need to make a large number of requests (perhaps more than 5), you should instead have a look at the rolling curl library, which makes better use of resources by keeping only a limited number of requests in flight and starting a new one each time another finishes.
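To illustrate the rolling-window idea, here’s a minimal sketch. It is not the rolling curl library’s API: rollingMulti() is a hypothetical helper, it assumes PHP 8+ (where curl handles are CurlHandle objects that spl_object_id() can identify), and it trims the request format of multi() down to a plain list of URLs. At most $window transfers run at once, and curl_multi_info_read() reports which transfer finished so its slot can go to the next queued URL.

```php
<?php
/**
 * Hypothetical rolling-window helper (a sketch, not the rolling curl library).
 *
 * @param array $urls Array of key => url.
 * @param int $window Maximum number of concurrent transfers.
 * @param array $opts Optional curl options applied to every request.
 * @return array Array of key => ['data' => ?string, 'error' => ?string].
 */
function rollingMulti(array $urls, int $window = 5, array $opts = []): array
{
    // merge general curl options with defaults
    $opts += [CURLOPT_CONNECTTIMEOUT => 3, CURLOPT_TIMEOUT => 3, CURLOPT_RETURNTRANSFER => true];
    $mh = curl_multi_init();
    $queue = $urls;    // requests that haven't started yet (key => url)
    $active = [];      // spl_object_id(handle) => key of each in-flight request
    $responses = [];

    // move the next queued request into the multi handle
    $startNext = function () use (&$queue, &$active, $mh, $opts) {
        $key = array_key_first($queue);
        $ch = curl_init($queue[$key]);
        unset($queue[$key]);
        curl_setopt_array($ch, $opts);
        curl_multi_add_handle($mh, $ch);
        $active[spl_object_id($ch)] = $key;
    };

    // fill the initial window
    while ($queue && count($active) < $window) {
        $startNext();
    }

    do {
        curl_multi_exec($mh, $running);
        // collect every transfer that finished since the last call
        while (($info = curl_multi_info_read($mh)) !== false) {
            $ch = $info['handle'];
            $id = spl_object_id($ch);
            $key = $active[$id];
            $responses[$key] = $info['result'] === CURLE_OK
                ? ['data' => curl_multi_getcontent($ch), 'error' => null]
                : ['data' => null, 'error' => curl_strerror($info['result'])];
            curl_multi_remove_handle($mh, $ch);
            unset($active[$id]);
            // hand the freed slot to the next queued request, if any
            if ($queue) {
                $startNext();
            }
        }
        // block until there's activity, but only while transfers are running
        if ($running > 0) {
            curl_multi_select($mh);
        }
    } while ($active);

    curl_multi_close($mh);
    return $responses;
}
```

A call such as rollingMulti(['google' => 'http://google.com', 'msu' => 'http://msu.edu'], 2) returns results keyed like the input, but the array is filled in completion order, so look results up by key rather than relying on their position.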

And your significant other said you couldn’t multitask 🙂

Fall-Through Functions

When embracing functional programming principles in languages that aren’t designed specifically for functional programming, dealing with side effects requires great care and discipline. For those who can’t remember, side effects are attempts to modify the state of the world (at least within the scope of your program’s environment) through a means other than a function’s return value (e.g., performing a SQL insert, printing text to the standard output device, sending an email, etc.). Hopefully you noticed the word “attempts.” The problem with trying to directly modify the state of the world is that you don’t know what state the world is in, so sometimes we’re caught by surprise.

I try to compartmentalize side effects in functions that lack branching constructs (e.g., if/else, switch, etc.). I refer to this type of function as a fall-through function because it proceeds line by line until it reaches the last line and returns the status or result. Here’s a simple example of a fall-through function in PHP that sends an email:

<?php

$mail = function($to, $subject, $message){
    // handle longer lines
    $message = wordwrap($message, 70, "\r\n");
    // send message, returning status
    return mail($to, $subject, $message);
};

?>

Fall-through functions provide clean separation of the logic we want to test from the world-dependent states that are unpredictable (i.e., code containing side effects), and as we know, clear boundaries are a good thing.

Long Live the GOTO Statement

Introduction: Infamous GOTO

Sure, since Dijkstra’s letter outlining the harmful aspects of the goto statement, few have voiced even modest tolerance for the statement, let alone condoned its use. Even those who’ve described practical uses of the goto statement have questioned its place in higher-level languages (e.g., although Donald Knuth noted some utility for goto, he also suggested that he would likely never use it in a language with sufficiently capable iteration and event constructs).

Today you can find a myriad of online resources that set the goto statement ablaze. The 5.3 release of PHP, which introduced the goto statement to the language, provided a unique look at how the statement is perceived.

XSS Prevention in Four Simple Steps

Preventing Cross-Site Scripting (XSS) attacks is a daunting task for developers. In short, XSS is a type of injection attack in which data that is structurally significant in the current context (e.g., HTML markup or JavaScript) changes the page’s intended semantics and/or functionality. While there are great resources online that walk you through prevention techniques (one of the best is the website of the Open Web Application Security Project, or OWASP), it’s easy to get confused when you try to implement all of the necessary safeguards.
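As a minimal illustration of “structurally significant” data, here’s a sketch using PHP’s built-in htmlspecialchars(). The escapeHtml() name is hypothetical, and this encoding only covers untrusted data placed in HTML element content or quoted attribute values; JavaScript, URL, and CSS contexts need different encoding.

```php
<?php
// Hypothetical helper: encode the characters that are structurally
// significant in HTML (&, <, >, ", ') so untrusted data is rendered as text.
function escapeHtml(string $untrusted): string
{
    // ENT_QUOTES encodes both quote styles; ENT_SUBSTITUTE replaces invalid
    // UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($untrusted, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = "<script>alert('xss')</script>";
echo "<p>" . escapeHtml($comment) . "</p>\n";
// <p>&lt;script&gt;alert(&#039;xss&#039;)&lt;/script&gt;</p>
```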

Below, I’ve outlined four simple steps that significantly lower the risk of XSS attacks against your website. By being a bit more restrictive, we can simplify our approach to preventing XSS.