I Wish We Used Domain Specific Languages (DSLs) More

First, let me define a Domain Specific Language (DSL). A DSL is a small programming language specifically designed to communicate solutions for a particular domain of problems.

To expand on this definition: while the discussion of DSLs can include internal DSLs, which are implemented directly within a General Purpose Language (GPL) using the available language constructs, I’ll primarily be focusing on external DSLs, which require a custom parser and provide the most flexibility in terms of design. That said, many of the points below apply to internal DSLs, too.

Now that I’ve described what a DSL is, let me tell you why I wish we used DSLs more.

DSLs Are Small

Today we have access to many wonderful GPLs. There are so many, in fact, that it’s sometimes hard to get teams to come together and choose the “right” language for a particular job.

Perhaps due to this perceived competition amongst the languages, it sometimes feels like we’re in the middle of a language construct race.

Programmer 1: “Language X can do this in a few keystrokes with one operator.”

Programmer 2: “Whatever, Language Y has the deathstar operator, which blows Language X out of the sky.”

Programmer 3: “Language Y? That’s so YYYY – 1. We should use Language Z. With its first major release (9 minor versions from now some unknown time in the future), it will provide all of the features of Language X AND Language Y.”

As GPLs add language constructs to “simplify” the work of programmers, they require more and more language-level expertise. Indeed, syntactically complex GPLs limit the number of individuals who can properly communicate solutions.

In contrast, well-designed DSLs increase the population of individuals who can properly communicate solutions. This inclusion of potential problem-solvers is one of their biggest strengths.

DSLs Are Specifically Designed to Communicate Solutions

Davin Granroth has told me of a professor during his college days who used to say, “Less isn’t more, just enough is more.” I see DSLs as tools that facilitate achieving this ideal through intentionally crafted parsimony.

In a well-crafted DSL, syntax is no longer a vestige of the GPL. Rather, every component of the grammar reflects careful choices that best facilitate the communication of solutions for its specific domain of problems, meaning the relevant information for a particular solution is brought to the surface. These qualities do lead to code that is easier to write. More importantly, they also lead to code that is easier to read, whether this happens hours, weeks, or years after the initial commit.

Additionally, because DSLs are focused on communicating solutions, they provide a great amount of flexibility when it comes to the actual implementation. Did the APIs change as part of the most recent service upgrade? No problem, the solutions communicated in the DSL don’t have to change. Do your programmers want to switch to language NextCoolRage? No problem, the solutions communicated in the DSL don’t have to change.

DSLs Adapt to Evolving Problem Spaces

“Work on this one specific problem, and I won’t change the constraints at all in the future…”, said no Project Manager (PM) ever. The solutions we communicate today may not adequately address problems we face tomorrow. Any tools we use to communicate solutions must provide the ability to easily accommodate change.

Because of their small size and specific focus on a particular domain of problems, DSLs can be created/adapted relatively quickly. Small languages can be implemented using relatively simple parsers, and making updates to the language is usually a straightforward task. Additionally, because the problem space is so focused, the design and testing of changes is easier when compared to augmenting GPLs.
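To make “relatively simple parser” concrete, here is a minimal Python sketch of an external DSL for an invented domain, pricing rules. The grammar, names, and domain are all hypothetical, but the point stands: a few dozen lines suffice for a small, focused language.

```python
import re

# A tiny, invented pricing DSL: each line reads
#   "discount <percent>% when total > <amount>"
RULE = re.compile(r"discount (\d+)% when total > (\d+)")

def parse(source):
    """Parse DSL source into a list of (threshold, percent) rules."""
    rules = []
    for line in source.strip().splitlines():
        m = RULE.fullmatch(line.strip())
        if not m:
            raise SyntaxError(f"bad rule: {line!r}")
        rules.append((int(m.group(2)), int(m.group(1))))
    return rules

def best_discount(rules, total):
    """Return the largest discount whose threshold the total exceeds."""
    applicable = [pct for threshold, pct in rules if total > threshold]
    return max(applicable, default=0)

rules = parse("""
discount 5% when total > 50
discount 10% when total > 100
""")
print(best_discount(rules, 120))  # -> 10
```

Note that the rules file is pure communication of a solution: if the implementation later moves to another language, the DSL source doesn’t have to change.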

When your PM speaks of unanticipated changes that have to be addressed in the next sprint, you can nod your head and smile, retreat to your whiteboard, and start adapting/creating the DSLs that will allow your domain experts to properly communicate solutions.

Words [Not] to Capitalize in AP-Styled Titles: Don’t Mess With a Mouse

To capitalize or not to capitalize: that is the question for any writing that involves titles or headings. There are several different styles to consider, such as AP, MLA, sentence-styled, etc. And, there are many great resources that detail the pros and cons of the various capitalization approaches, such as Grammar Girl’s article on capitalizing titles. I’m not here to continue this analysis, as I’ve made my choice: AP-styled capitalization is my strong preference. No, I’m here to discuss how to remember the rules for this particular style choice.

The rules aren’t that hard to apply, and essentially boil down to the following:

  1. Capitalize any word that starts a title.
  2. Capitalize any word that ends a title.
  3. Capitalize any other word in the title as long as it’s not in this list:
    • a
    • an
    • and
    • at
    • but
    • by
    • for
    • in
    • nor
    • of
    • on
    • or
    • so
    • the
    • to
    • up
    • yet

Rules 1 and 2 are quite easy to remember. However, rule 3 is difficult because it’s easy to forget the list of exception words.

One can try to remember that the list includes all articles (a, an, the), the coordinating conjunctions (and, but, for, nor, or, so, yet), and prepositions three or fewer characters in length (at, by, for, in, of, on, to, up.) Honestly, working through the sets and elimination rules is cumbersome, especially if you haven’t had to apply them in a while.
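For the programmatically inclined, the three rules translate almost directly into code. Here is a rough Python sketch; it ignores hyphenated words, subtitles, and other edge cases a real style guide covers:

```python
# Words AP style leaves lowercase in the middle of a title (the list above).
LOWERCASE = {
    "a", "an", "and", "at", "but", "by", "for", "in", "nor",
    "of", "on", "or", "so", "the", "to", "up", "yet",
}

def ap_title_case(title):
    """Apply the three rules: capitalize the first word, the last word,
    and every other word not in the exception list."""
    words = title.lower().split()
    out = []
    for i, word in enumerate(words):
        first_or_last = i == 0 or i == len(words) - 1
        if first_or_last or word not in LOWERCASE:
            word = word.capitalize()
        out.append(word)
    return " ".join(out)

print(ap_title_case("don't mess with a mouse"))  # -> Don't Mess With a Mouse
```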

So, I wrote a [clumsy] limerick to serve as a mnemonic device to help remember which words are NOT capitalized (exception words in the middle of a title) when writing AP-styled titles or headings. In this limerick, all words with three or fewer characters are the exception words. Some of the rhythms have to be finessed, but hey, there are 17 exception words to fit in, and we can’t all be Edward Lear.

Don’t Mess With a Mouse

So a mouse near an eagle and a snake,
Keeps in watch on the shore of a lake.
Yet at night, up to pounce,
But don’t judge by might or ounce,
For neither bird nor serpent would awake.

I’ve posted it here as a reference for me and my school-aged daughters (and, more recently, my daughters produced a brief video to tell the tale.) That doesn’t mean they like it (my eldest daughter hates the fact that the bird dies, although she’s OK with the snake perishing,) but, pragmatically speaking in terms of helping us remember AP-styled title capitalization, it works for us :)

Automatically Generating Content Inventories (Part 1)

Introduction

I’ll admit it, in my youth (say, a few days ago) I’d often generate a content inventory by hand. I’d simply open a new spreadsheet and start working my way through the site until I was done chronicling the content. I chose this path because of its simplicity and because many of the websites I work on are quite small.

This month I’m working with a client on several sites, and the total number of pages is close to one thousand. Sure, I’ll likely still want to view each of the pages just in case the title and description fail to reflect the content (or it’s an asset that lacks this meta information), but automatically generating the URL, file type, title, and description should save a tremendous amount of time.

To automatically generate a content inventory, we’ll break the work up into three steps:

  1. Create a local copy of the website (covered in this post.)
  2. Create a list of broken links (covered in this post.)
  3. Parse the local files to create a spreadsheet (covered in the next post.)

Using Wget To Create A Local Copy Of Your Website

The GNU wget package makes it very easy to generate a local copy of a website. You can use it to crawl your entire website and download all of the linked assets (HTML files, images, PDFs, etc.) While you can install wget on Windows and Macs, when I’m using one of these systems I just run a VM of my favorite Linux distro, which already has wget installed. I found a great tutorial that demonstrates how to create a mirror of a website with wget, and its most basic usage is illustrated by the command below.


$ wget -m http://www.site.com/

There are many more options, but the command above would create the directory “www.site.com” and put all of the linked files from your website in that directory.

Using Wget To Find Broken Links (404)

Next, let’s make sure we have a list of the broken links in the website. After all, a content inventory is supposed to guide future work, and all future work should take into account content that’s either missing or unfindable.

Again, making use of wget greatly simplifies this task, and I found another great tutorial that outlines using wget to find broken links. The basic command structure is listed below.


$ wget --spider -o file.log -r -p http://www.site.com

Once completed, you have a file that you can grep / search for occurrences of 404 errors.

A Bash Script To Simplify Things

Of course, I’m old and I forget things easily. I can’t be expected to remember these commands for the next five minutes, let alone the next time I’m creating a content inventory a month from now. Additionally, instead of using multiple calls to wget, we can merge these operations into a single pass. Here’s a simple bash script that automates the creation of the local mirror of the website and the log file with broken link information.


#!/bin/bash

# remember to run chmod +x myFileNameWhateverItIs

# store domain
echo "Enter website domain (e.g., www.site.com):"
read -r domain
# store url
url="http://$domain"
# system status
echo "Creating mirror..."
# create local mirror
wget -m -w 2 -o wget.log -p "$url"
# system status
echo "Creating broken link log..."
# store broken link(s) info
grep -n -B 2 '404 Not Found' wget.log > wget-404.log
# system status
echo "Process completed."

If I store the code above in the file “local-site.sh” (and call chmod +x on it), I can call it directly to create a local copy of the website and a log file containing broken links:


$ ./local-site.sh
> Enter website domain (e.g., www.site.com):
> www.example.com
> Creating mirror...
> Creating broken link log...
> Process completed.

I’ll cover parsing of the local files to create a content inventory spreadsheet in the next post.

Isolating Side Effects Using Isolation Sets

A program or function is said to have side effects if it impacts the system state through a means other than its return value or reads the system state through a means other than its arguments. Every meaningful program eventually requires some form of side effect(s), such as writing output to the standard output stream or saving a record to a database. That said, working with pure functions, which lack side effects and are consistent, has many advantages. How can the practical necessity of side effects be reconciled with the benefits of avoiding them?
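To make the distinction concrete, here is a small Python sketch (the function names are invented for illustration):

```python
log = []

# Impure: writes to state outside its arguments and return value.
def impure_add(a, b):
    log.append((a, b))  # side effect: mutates external state
    return a + b

# Pure: depends only on its arguments and touches nothing else.
def pure_add(a, b):
    return a + b

print(pure_add(2, 3))    # -> 5
print(impure_add(2, 3))  # -> 5, but the call also grew `log`
print(len(log))          # -> 1
```

Both functions return the same value, but only the pure one can be called anywhere, any number of times, without changing the behavior of the rest of the program.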

Your Special Island

If a program’s side effects are isolated in a small, known subset of the codebase, we can reap the benefits of working in their absence throughout large sections of the codebase whilst providing their practical application when needed. Indeed, functional programming languages like Haskell facilitate this approach by isolating side effects directly through language features / limitations. But what about the many languages that don’t directly facilitate side effect isolation? How can we achieve the same benefits?

We Will All Go Down Together

Let’s begin with a typical example involving a non-isolated side effect. We’ll work through a small PHP function for sending email that resembles countless other examples online.* Because the side effect (the call to the mail function) is not isolated, the entire function is impure, making it difficult to test.


<?php
function sendSalesInquiry($from, $message)
{
  // validate email
  if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
    return "<p>Email address invalid.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from";
  // attempt to send
  if (mail($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>"; 
  }
}
?>

And They Parted The Closest Of Friends

To isolate the side effect, we’ll add some all-powerful indirection by refactoring the email function into multiple functions. Using a combination of a potentially-pure function with two fall-through functions allows us to easily, cleanly isolate the side effect in this example. When using this combination of function types specifically to isolate side effects, I refer to them collectively as an isolation set.

<?php
// potentially-pure function
function sendSalesInquiry($from, $message, $mailer)
{
  // validate email
  if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
    return "<p>Email address invalid.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from";
  // attempt to send
  if ($mailer($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>";
  }
}
// fall-through function provides implementation
function sendSalesInquiryMail($from, $message)
{
  // call potentially-pure function passing in mailer
  return sendSalesInquiry($from, $message, $mailer = function($to, $subject, $message, $headers) {
    return mail($to, $subject, $message, $headers);
  });
}
?>

The original example has been refactored into one potentially-pure function to handle the logic and initialization; and two fall-through functions, one to encapsulate the side effect, and one to provide the default behavior (in this case the mailer function) for production.**

When testing the code, the sendSalesInquiry() function becomes the natural entry point, as it contains all of the important logic and initialization to be tested. Because the function is potentially-pure, passing in pure arguments causes the function to behave like a pure function, yielding better testing and clarity.

Music Left To Write

Although the example only dealt with one side effect, an isolation set can be used to isolate any number of side effects. We could extend the example above and add a spam-checking algorithm. We’d just have to add another fall-through function for the side effect.

<?php
// potentially-pure function
function sendSalesInquiry($from, $message, $mailer, $isSpam)
{
  // validate email
  if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
    return "<p>Email address invalid.</p>";
  }
  // check for spam
  if ($isSpam($from, $message)) {
    return "<p>Don't call us, we'll call you.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from";
  // attempt to send
  if ($mailer($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>";
  }
}

function sendSalesInquiryMail($from, $message)
{
  // call potentially-pure function passing in mailer and spam checker
  return sendSalesInquiry(
    $from,
    $message,
    $mailer = function($to, $subject, $message, $headers) {
      return mail($to, $subject, $message, $headers);
    },
    $isSpam = function($from, $message) {
      $spamChecker = new SpamChecker();
      // this analysis could involve any number of database queries, networking requests, etc.
      return $spamChecker->isSpam($from, $message);
    }
  );
}
?>

It’s Nine O’Clock On A Saturday

What? Doesn’t getting your side effects isolated put you in a mood for a melody?

* I’m not enamored with returning HTML markup in this type of function, but it represents a common example I found online, and it’s for a programming language that people don’t typically associate with functional programming practices, so the example works well for the purposes of the current demonstration.

** You could reduce this example to two functions, as the potentially-pure function could be used to contain default values for the fall-through function(s), which could then be overridden by passing in an argument for testing purposes. However, I like the clarity granted by implementing an isolation set with three functions, as I want to avoid marrying the potentially-pure function to any default implementation. For example, I could easily provide a different mailing mechanism by merely creating a new function, like sendSalesInquirySMTP(), which provides a PHPMailer implementation.

Potentially-Pure Functions

Overview: Potentially-pure functions are argument-based higher-order functions (i.e., functions that accept other functions as arguments) with pure function bodies (i.e., function bodies that are consistent and side-effect free), meaning their purity is dependent upon the arguments passed into the function.

Higher-Order Functions

All potentially-pure functions are higher-order functions, so let’s begin with a brief overview of what it means to be a higher-order function.

Higher-order functions accept functions as arguments (we’ll call this specific form argument-based higher-order functions) or return functions as values (we’ll call this specific form return-based higher-order functions.) Higher-order functions enable tremendous power, flexibility, and parsimony; and they are leveraged heavily in functional programming.
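A quick Python sketch of the two forms (the function names are invented for illustration):

```python
# Argument-based: apply_twice accepts a function as an argument.
def apply_twice(f, x):
    return f(f(x))

# Return-based: make_adder returns a newly built function.
def make_adder(n):
    def add(x):
        return x + n
    return add

print(apply_twice(make_adder(3), 10))  # -> 16
```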

In order to implement argument-based higher-order functions, a programming language must allow you to pass functionality into functions through their arguments. While not all languages provide first-class functions, which can be passed around and stored like other data, you can effectively emulate first-class functions in most languages. In low-level languages like C, you can pass in function pointers; in OOP languages like Java, you can pass in interfaces; and in dynamic languages like PHP, which lacked anonymous functions prior to version 5.3, you can pass in the string name of an existing function. No matter what language you’re using for your development, you should be able to fake it quite convincingly.

Potentially-Pure Functions

Pure functions are side-effect free and consistent. Higher-order functions provide a special situation when evaluating purity. If an argument-based higher-order function’s body is pure, then its purity is unfixed. In other words, the purity of the function is dependent upon the purity of the functions passed in as arguments. If the functions passed into the higher-order function are pure, then the function is pure; and if the functions passed in are impure, then the higher-order function is impure*. Because the phrase “argument-based higher-order function with unfixed purity” might jeopardize your conscious state, let’s just call this function type a potentially-pure function.

The natural duality of potentially-pure functions makes them especially helpful when it comes to isolating side effects. When testing a potentially-pure function, a pure form of the function can be passed in, allowing you to cleanly and easily test all possible states. When using a potentially-pure function in production, a fall-through function containing the side effect(s) can be passed in, allowing the code to perform its real-world requirements.
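As a hedged Python sketch of this duality (all names are invented; it mirrors the PHP example from the previous post):

```python
def send_inquiry(sender, message, mailer):
    """Potentially pure: the body is pure, so purity depends on `mailer`."""
    if "@" not in sender:  # simplistic validation for the sketch
        return "invalid address"
    if mailer(sender, message):
        return "sent"
    return "failed"

# Pure stand-in for testing: no side effects, fully predictable.
def fake_mailer(sender, message):
    return True

# Impure fall-through for production (sketched; a real one would send mail).
def real_mailer(sender, message):
    print(f"sending message from {sender}...")  # stand-in side effect
    return True

print(send_inquiry("ann@example.com", "hello", fake_mailer))  # -> sent
print(send_inquiry("not-an-address", "hello", fake_mailer))   # -> invalid address
```

Passing fake_mailer keeps every call pure and repeatable for tests; passing real_mailer supplies the side effect only where production requires it.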

* If an impure function is passed to an argument-based higher-order function with unfixed purity, it is possible for the function to remain pure if the impure function passed in as an argument is never called.