Passpics: A Picture Is Worth a Thousand Passwords

Using passwords as the primary authentication mechanism in the digital world is painful. Although password cracking techniques continue to get faster and we are called on to make our passwords longer and stronger, our brains (and fingers) still face the same humble limitations they did when we struggled to remember our first email account login.  If we continue along the current trend towards longer passwords, it feels like we’ll eventually be asked to type in 2048-bit keys. I’m sorry, but for a guy like me who can’t even remember to buy the milk, that’s just not going to happen.

Password managers can address some of the problems associated with passwords, but they don’t provide a long-term solution. Password managers are a big target, and even really good ones can be attacked in ways that put you at risk. I don’t like the idea that the failure of one piece of software jeopardizes all of my digital security, and I don’t want to have to install yet another piece of software just to access my accounts. Additionally, at their core, they’re still using passwords to communicate shared secrets, and even properly constructed passwords will pose little challenge to the coming legion of super machines.

What we need is something that we can start using now that provides better usability and stronger security. Thankfully, we already make use of a technology that can fulfill our authentication needs: images, which in this context we’ll refer to as passpics.

Passpics Offer Improved Usability

Let’s start with the task of taking and uploading images. Whether you’re on Facebook, Instagram, Twitter, or the Browncoats forums (hey, it’s the best sci-fi show ever), you’re inevitably going to see images that users of all different backgrounds and skill levels have taken and uploaded without any fanfare. Today’s hardware and software applications make taking and uploading images so easy that even a monkey can do it (even if it doesn’t possess the copyright).

The findability of passpics seems quite reasonable, too. Users have become accustomed to organizing their images by directory and/or tags in various software applications. Additionally, many operating systems let you visually scan directory contents by providing thumbnails of the files, including images. Of note, I’m not advocating naming or tagging an image with a label like “passpic important”, but I am saying that if you know one of your passpics contains a horse, you may browse for it in the “farm” directory or tag it “Mr. Ed.”

Images are also very memorable, making it likely that users will be able to successfully recall sets of passpics. While passwords typically require free recall, images can be identified through recognition, which typically yields better performance. Images provide other memory advantages, too. The method of loci, which uses visual imagery to enhance recall and has been employed for centuries, demonstrates the profound improvements visual imagery can bring to memory tasks.

Passpics may also offer advantages in terms of entry accuracy. Because of the funky characters and uncommon key combinations, passwords are easy to mistype, especially as they grow longer and our keyboards grow smaller. In contrast, passpics require the selection of one or more image files through the file-viewing interface. While manual entry time may be longer for passpics, their entry accuracy seems likely to be at or better than that of passwords. This accuracy may allow authentication providers to more quickly lock out nefarious login attempts. User research in this area will be interesting.

Passpics Provide More Protection Against Brute-Force Attacks

When you take that all-important selfie, your smartphone processes a tremendous amount of visual information to create the image file. In fact, even after lossless compression is performed (i.e., the size of the image is reduced but the quality remains unchanged), images remain large files because of the inherent entropy (i.e., information that can’t be predicted from the other information in the image) contained in high-resolution images.

Practically speaking, no two images captured by a camera will be exactly the same (even if you try really hard to capture the exact same scene). That’s because the vast amount of information recorded in an image is subject to tolerances in camera sensors, variations in lighting, and changes in the precise placement of the camera. Essentially, anyone with a digital camera can, with but a click of a button, create a wholly unique authentication token that is far more difficult to guess or brute force than any password, passphrase, or cryptographic key currently used for online encryption.

Passpic Theft Concerns

Those with security backgrounds may be concerned about a form of one-factor authentication that relies on “something you have.” That is to say, because of the threat of theft, authentication tokens are often paired with a second factor (e.g., an ATM requires a PIN in addition to possession of a bank card). How should we handle these concerns?

Practically speaking, passpics are often at least as secure as passwords in terms of theft. If the transmission or online storage is the weak point (e.g., lack of SSL, weak password hashing, etc.), passwords are just as vulnerable as passpics. And the specific concern that a passpic can be stolen because it’s something you have, not something you know, isn’t compelling in practice: users often turn their passwords into something they have by writing them down somewhere (an approach prominent members of the security community have actually advocated). So, passpics are often no worse than passwords in terms of theft risk.

That said, we can significantly mitigate the risk of passpic theft by adding a “something you know” authentication factor. Encrypting a hard drive is quite easy on many operating systems, and in the case of physical theft of the hardware, it renders the authentication token(s) unusable unless the operating system’s encryption is successfully attacked or the attacker gains access to the account password. And, yes, it’s ironic that I’m touting a security mechanism based on passwords as a fix, but you have to pick your battles.

Even if an attacker does gain access to a computer with stored passpics, they are not assured of successfully attacking the login system through theft. Most people today have hundreds if not thousands of images on their computer. Unless the user names the image “my-bank-passpic.jpg”, an attacker would have to try many different images to find the correct login. Still, this isn’t a terribly difficult task for attackers. What else could we do to mitigate theft risks?

There is one crucial implementation detail that I’ve hinted at but haven’t stated explicitly until now: users must be able to submit a set of passpics to log in. This one feature provides a significant security boost across the board, but especially in terms of theft. For example, suppose a user logs in to their bank with a set of three images; even if they have only 1,000 images total on their computer, there are more than 160 million possible three-image login sets. More importantly, users could conceivably diversify the storage locations of the passpics within a set (e.g., store one passpic in the cloud, one on their hard drive, and one on a microSD card), making the theft or compromise of any one device or service insufficient for a successful attack.
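(Checking the arithmetic: the number of unordered three-image sets that can be drawn from a library of 1,000 images is C(1000, 3) = (1000 × 999 × 998) / 6 = 166,167,000, hence “more than 160 million.”)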

Potential Problems With Passpics

Passwords are often masked to prevent shoulder surfing from compromising your password. For those outside the security community, shoulder surfing is when an attacker views your screen while you interact with your computer or device, providing the opportunity to steal information they can see. Given the current implementation of file uploads in most browsers, an attacker who can view your screen while you log in could at least identify the images used to log in.

Is password masking important? Some have argued that password masking is unneeded, but the subsequent fury of the masses revealed that most security-conscious users deem this form of protection a necessity. If an attacker successfully shoulder-surfs your password, they can log in to your account. With passpics, however, it’s not enough to see which images were used; the attacker would have to gain access to the actual image files to log in to your account. Frankly, I’m unsure how big an issue this is and/or how best to approach it, but I do believe it’s an important consideration moving forward.

Images are relatively large files, so having to upload one or more passpics every time you authenticate does raise concerns about response time and network use. That said, most login systems deliberately build a certain amount of cost into password hashing, so logins already involve an increased response time. In terms of network resources, users can resample images to smaller sizes that better match their particular network capabilities. Even a grayscale 300 × 300 JPEG image requires a much larger search space for brute-force attacks than the best passwords. Granted, resampling presents a significant usability issue, and something that may have to be better addressed before passpics can be used by the masses.
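(To put rough numbers on the search-space claim: a 300 × 300 grayscale image contains 90,000 pixels of 8 bits each, or 720,000 bits of raw pixel data. Even if the true entropy is a small fraction of that, it still dwarfs a strong 20-character password drawn from the 95 printable ASCII characters, which carries only about 20 × log₂(95) ≈ 131 bits.)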

One other problem with passpics is the potential for denial-of-service (DoS) attacks. Online services would have to allow relatively large file uploads if all of the processing were handled server-side, and preventing attackers from abusing those uploads would present some challenges. Services could push the processing of the images client-side, but this presents its own challenges. Again, I’m not advocating a particular solution; I merely want to present this potential issue as something that merits careful consideration.

What about social media? Do passpics uploaded to public sites weaken the security? Possibly. If you have a Facebook account and you upload one picture, and that one picture is your sole passpic for your bank, yes, you are screwed. Just as people can choose poor passwords (your daughter’s name is Shelly, as noted on Facebook, and “Shelly” is your bank password), they can choose poor passpics.

That said, the public availability of an image does not necessarily make it inappropriate for use in a passpic set. For example, one could securely combine an image from Facebook, an image from Dropbox, and an image from their local computer that has never been uploaded anywhere to form a secure passpic set. As long as one passpic in the set is “private” (i.e., available only to you), the security remains very strong.

Conclusion

We already use images all the time in our digital lives. Because of their inherent advantages in terms of usability and security, it makes sense to leverage them in the form of passpics for authentication.

I Wish We Used Domain Specific Languages (DSLs) More

First, let me define a Domain Specific Language (DSL). A DSL is a small programming language specifically designed to communicate solutions for a particular domain of problems.

To expand on this definition: while the discussion of DSLs can include internal DSLs, which are implemented directly within a general-purpose programming language (GPL) using the available language constructs, I’ll primarily be focusing on external DSLs, which require a custom parser and provide the most flexibility in terms of design. That said, many of the points below apply to internal DSLs, too.
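To make the distinction concrete, here’s a hypothetical example (the domain and every name in it are invented for illustration). An external shipping-rates DSL might consist of plain-text statements parsed by a small custom parser, while an internal DSL expresses the same domain through the host GPL’s own syntax, as the PHP sketch below shows.


<?php
// A hypothetical internal DSL: a fluent PHP interface for a shipping-rates domain.
// The equivalent external DSL would be plain text in its own grammar, e.g.:
//
//   rate FastShip up-to 2.5kg costs 7.99
//
class RateBook
{
  private $rates = [];
  private $pending = [];

  public static function create() { return new self(); }

  // each method reads like a word in the domain expert's vocabulary
  public function rate($carrier) { $this->pending = ['carrier' => $carrier]; return $this; }
  public function upToKg($kg) { $this->pending['maxKg'] = $kg; return $this; }
  public function costs($price) { $this->pending['price'] = $price; $this->rates[] = $this->pending; return $this; }
  public function all() { return $this->rates; }
}

// reads close to how a domain expert would phrase the rules
$rates = RateBook::create()
  ->rate('FastShip')->upToKg(2.5)->costs(7.99)
  ->rate('FastShip')->upToKg(10)->costs(12.50)
  ->all();
?>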

Now that I’ve described what a DSL is, let me tell you why I wish we used DSLs more.

DSLs Are Small

Today we have access to many wonderful GPLs. There are so many, in fact, that it’s sometimes hard to get teams to come together and choose the “right” language for a particular job.

Perhaps due to this perceived competition amongst the languages, it sometimes feels like we’re in the middle of a language construct race.

Programmer 1: “Language X can do this in a few keystrokes with one operator.”

Programmer 2: “Whatever, Language Y has the deathstar operator, which blows Language X out of the sky.”

Programmer 3: “Language Y? That’s so YYYY – 1. We should use Language Z. With its first major release (nine minor versions from now, at some unknown time in the future), it will provide all of the features of Language X AND Language Y.”

As GPLs add language constructs to “simplify” the work of programmers, they require more and more language-level expertise. Indeed, syntactically complex GPLs limit the number of individuals who can properly communicate solutions.

In contrast, well-designed DSLs increase the population of individuals who can properly communicate solutions.  This inclusion of potential problem-solvers is one of their biggest strengths.

DSLs Are Specifically Designed to Communicate Solutions

Davin Granroth has told me of a professor during his college days who used to say, “Less isn’t more, just enough is more.” I see DSLs as tools that facilitate achieving this ideal through intentionally crafted parsimony.

In a well-crafted DSL, syntax is no longer a vestige of the GPL. Rather, every component of the grammar reflects careful choices that best facilitate the communication of solutions for its specific domain of problems, meaning the relevant information for a particular solution is brought to the surface. These qualities do lead to code that is easier to write. More importantly, they also lead to code that is easier to read, whether that happens hours, weeks, or years after the initial commit.

Additionally, because DSLs are focused on communicating solutions, they provide a great amount of flexibility when it comes to the actual implementation. Did the APIs change as part of the most recent service upgrade? No problem, the solutions communicated in the DSL don’t have to change. Do your programmers want to switch to language NextCoolRage? No problem, the solutions communicated in the DSL don’t have to change.

DSLs Adapt to Evolving Problem Spaces

“Work on this one specific problem, and I won’t change the constraints at all in the future…”, said no Project Manager (PM) ever. The solutions we communicate today may not adequately address problems we face tomorrow. Any tools we use to communicate solutions must provide the ability to easily accommodate change.

Because of their small size and specific focus on a particular domain of problems, DSLs can be created and adapted relatively quickly. Small languages can be implemented using relatively simple parsers, and making updates to the language is usually a straightforward task. Additionally, because the problem space is so focused, designing and testing changes is easier than augmenting a GPL.
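To give a feel for the scale involved, here’s a minimal sketch of a parser for the hypothetical shipping-rates DSL from earlier (error handling omitted):


<?php
// a minimal parser for the invented "rate <carrier> up-to <kg>kg costs <price>" grammar
function parseRates($source)
{
  $rates = [];
  foreach (preg_split('/\R/', trim($source)) as $line) {
    if (preg_match('/^rate\s+(\w+)\s+up-to\s+([\d.]+)kg\s+costs\s+([\d.]+)$/', trim($line), $m)) {
      $rates[] = ['carrier' => $m[1], 'maxKg' => (float) $m[2], 'price' => (float) $m[3]];
    }
  }
  return $rates;
}

$rates = parseRates("rate FastShip up-to 2.5kg costs 7.99\nrate FastShip up-to 10kg costs 12.50");
?>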

When your PM speaks of unanticipated changes that have to be addressed in the next sprint, you can nod your head and smile, retreat to your whiteboard, and start adapting/creating the DSLs that will allow your domain experts to properly communicate solutions.

Words [Not] to Capitalize in AP-Styled Titles: Don’t Mess With a Mouse

To capitalize or not to capitalize: that is the question for any writing that involves titles or headings. There are several different styles to consider, such as AP, MLA, sentence-styled, etc. And, there are many great resources that detail the pros and cons of the various capitalization approaches, such as Grammar Girl’s article on capitalizing titles. I’m not here to continue this analysis, as I’ve made my choice: AP-styled capitalization is my strong preference. No, I’m here to discuss how to remember the rules for this particular style choice.

The rules aren’t that hard to apply, and essentially boil down to the following:

  1. Capitalize any word that starts a title.
  2. Capitalize any word that ends a title.
  3. Capitalize any other word in the title as long as it’s not in this list:
    • a
    • an
    • and
    • at
    • but
    • by
    • for
    • in
    • nor
    • of
    • on
    • or
    • so
    • the
    • to
    • up
    • yet

Rules 1 and 2 are quite easy to remember. However, rule 3 is difficult because it’s easy to forget the list of exception words.

One can try to remember that the list includes all articles (a, an, the), the coordinating conjunctions (and, but, for, nor, or, so, yet), and prepositions of three or fewer letters (at, by, in, of, on, to, up). Honestly, working through the sets and elimination rules is cumbersome, especially if you haven’t had to apply them in a while.
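Of course, the rules are trivial for a computer to apply; it’s our brains that need the help. Here’s a minimal PHP sketch of rules 1–3, assuming a simple space-separated title and ignoring edge cases like hyphenated words:


<?php
// applies rules 1-3 above; the exception list is the 17 words from the post
function apTitleCase($title)
{
  $exceptions = ['a', 'an', 'and', 'at', 'but', 'by', 'for', 'in', 'nor',
                 'of', 'on', 'or', 'so', 'the', 'to', 'up', 'yet'];
  $words = explode(' ', strtolower($title));
  $last = count($words) - 1;
  foreach ($words as $i => $word) {
    // rules 1 and 2: always capitalize the first and last words;
    // rule 3: capitalize everything else unless it's an exception word
    if ($i === 0 || $i === $last || !in_array($word, $exceptions, true)) {
      $words[$i] = ucfirst($word);
    }
  }
  return implode(' ', $words);
}

echo apTitleCase("don't mess with a mouse"); // Don't Mess With a Mouse
?>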

So, I wrote a [clumsy] limerick to serve as a mnemonic device to help remember which words are NOT capitalized (exception words in the middle of a title) when writing AP-styled titles or headings. In this limerick, all words with three characters or fewer are the exception words. Some of the rhythms have to be finessed, but hey, there are 17 exception words to fit in, and we can’t all be Edward Lear.

Don’t Mess With a Mouse

So a mouse near an eagle and a snake,
Keeps in watch on the shore of a lake.
Yet at night, up to pounce,
But don’t judge by might or ounce,
For neither bird nor serpent would awake.

I’ve posted it here as a reference for me and my school-aged daughters (and, more recently, my daughters produced a brief video to tell the tale). That doesn’t mean they like it (my eldest daughter hates the fact that the bird dies, although she’s OK with the snake perishing), but, pragmatically speaking, in terms of helping us remember AP-styled title capitalization, it works for us :)

Automatically Generating Content Inventories (Part 1)

Introduction

I’ll admit it: in my youth (say, a few days ago) I’d often generate a content inventory by hand. I’d simply open a new spreadsheet and start working my way through the site until I was done chronicling the content. I chose this path because of its simplicity and because many of the websites I work on are quite small.

This month I’m working with a client on several sites, and the total number of pages is close to one thousand. Sure, I’ll likely still want to view each of the pages just in case the title and description fail to reflect the content (or it’s an asset that lacks this meta information), but automatically generating the URL, file type, title, and description should save a tremendous amount of time.

To automatically generate a content inventory, we’ll break the work up into three steps:

  1. Create a local copy of the website (covered in this post).
  2. Create a list of broken links (covered in this post).
  3. Parse the local files to create a spreadsheet (covered in the next post).

Using Wget to Create a Local Copy of Your Website

The GNU wget package makes it very easy to generate a local copy of a website. You can use it to crawl your entire website and download all of the linked assets (HTML files, images, PDFs, etc.). While you can install wget on Windows and Macs, when I’m using one of these systems I just run a VM of my favorite Linux distro, which already has wget installed. I found a great tutorial that demonstrates how to create a mirror of a website with wget, and its most basic usage is illustrated by the command below.


$ wget -m http://www.site.com/

There are many more options, but the command above would create the directory “www.site.com” and put all of the linked files from your website in that directory.
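Depending on your needs, a few extra flags may help; for example, this optional variation adds -k to convert links so the mirror browses cleanly offline and -w to wait a couple of seconds between requests:


$ wget -m -k -p -w 2 http://www.site.com/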

Using Wget to Find Broken Links (404)

Next, let’s make sure we have a list of the broken links in the website. After all, a content inventory is supposed to guide future work, and all future work should take into account content that’s either missing or unfindable.

Again, making use of wget greatly simplifies this task, and I found another great tutorial that outlines using wget to find broken links. The basic command structure is listed below.


$ wget --spider -o file.log -r -p http://www.site.com

Once completed, you have a log file that you can grep (or otherwise search) for occurrences of 404 errors.
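For example, something like the following pulls out each 404 along with the two preceding lines of the log, which typically include the requested URL:


$ grep -n -B 2 '404 Not Found' file.log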

A Bash Script to Automate and Simplify Things

Of course, I’m old and I forget things easily. I can’t be expected to remember these commands for the next five minutes, let alone the next time I’m creating a content inventory a month from now. Additionally, instead of using multiple calls to wget, we can merge these operations into a single pass. Here’s a simple bash script that automates the creation of the local mirror of the website and the log file with broken-link information.


#!/bin/bash

# remember to make this executable: chmod +x local-site.sh

# store domain
echo "Enter website domain (e.g., www.site.com):"
read -r domain
# store url
url="http://$domain"
# system status
echo "Creating mirror..."
# create local mirror
wget -m -w 2 -o wget.log -p "$url"
# system status
echo "Creating broken link log..."
# store broken link(s) info
grep -n -B 2 '404 Not Found' wget.log > wget-404.log
# system status
echo "Process completed."

If I store the code above in the file “local-site.sh” (and call chmod +x on it), I can call it directly to create a local copy of the website and a log file containing broken links:


$ ./local-site.sh
> Enter website domain (e.g., www.site.com):
> www.example.com
> Creating mirror...
> Creating broken link log...
> Process completed.

I’ll cover parsing of the local files to create a content inventory spreadsheet in the next post.

Isolating Side Effects Using Isolation Sets

A program or function is said to have side effects if it impacts the system state through a means other than its return value or reads the system state through a means other than its arguments. Every meaningful program eventually requires some form of side effect, such as writing output to the standard output stream or saving a record to a database. That said, working with pure functions, which lack side effects and are consistent, has many advantages. How can the practical necessity of side effects be reconciled with the benefits of avoiding them?

Your Special Island

If a program’s side effects are isolated in a small, known subset of the codebase, we can reap the benefits of working in their absence throughout large sections of the codebase while still providing their practical application when needed. Indeed, functional programming languages like Haskell facilitate this approach by isolating side effects directly through language features and limitations. But what about the many languages that don’t directly facilitate side-effect isolation? How can we achieve the same benefits?

We Will All Go Down Together

Let’s begin with a typical example involving a non-isolated side effect. We’ll work through a small PHP function for sending email that resembles countless other examples online.* Because the side effect (the call to the mail function) is not isolated, the entire function is impure, making it difficult to test.


<?php
function sendSalesInquiry($from, $message)
{
  // validate email
  if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
    return "<p>Email address invalid.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from';
  // attempt to send
  if (mail($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>"; 
  }
}
?>

And They Parted the Closest of Friends

To isolate the side effect, we’ll add some all-powerful indirection by refactoring the email function into multiple functions. Using a combination of a potentially-pure function with two fall-through functions allows us to easily and cleanly isolate the side effect in this example. When using this combination of function types specifically to isolate side effects, I refer to them collectively as an isolation set.

<?php
// potentially-pure function
function sendSalesInquiry($from, $message, $mailer)
{
  // validate email
  if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
    return "<p>Email address invalid.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from';
  // attempt to send
  if ($mailer($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>";
  }
}
// fall-through function provides implementation
function sendSalesInquiryMail($from, $message)
{
  // call potentially-pure function passing in mailer
  return sendSalesInquiry($from, $message, $mailer = function($to, $subject, $message, $headers) {
    return mail($to, $subject, $message, $headers);
  });
}
?>

The original example has been refactored into one potentially-pure function to handle the logic and initialization, and two fall-through functions: one to encapsulate the side effect, and one to provide the default behavior (in this case the mailer function) for production.**

When testing the code, the sendSalesInquiry() function becomes the natural entry point, as it contains all of the important logic and initialization to be tested. Because the function is potentially-pure, passing in pure arguments causes it to behave like a pure function, yielding better testing and clarity.
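For example, here’s a rough sketch of what such tests might look like (the fake mailers and assertions are illustrative and not part of the original example):


<?php
// pure stand-ins for mail(): no email is actually sent, so the tests are deterministic
$alwaysSucceeds = function($to, $subject, $message, $headers) { return true; };
$alwaysFails = function($to, $subject, $message, $headers) { return false; };

assert(sendSalesInquiry('user@example.com', 'Hello', $alwaysSucceeds) === '<p>Email successfully sent.</p>');
assert(sendSalesInquiry('user@example.com', 'Hello', $alwaysFails) === '<p>Email delivery failed.</p>');
assert(sendSalesInquiry('not-an-email', 'Hello', $alwaysSucceeds) === '<p>Email address invalid.</p>');
?>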

Music Left to Write

Although the example only dealt with one side effect, an isolation set can be used to isolate any number of side effects. We could extend the example above by adding a spam-checking algorithm; we’d just have to add another fall-through function for the new side effect.

<?php
// potentially-pure function
function sendSalesInquiry($from, $message, $mailer, $isSpam)
{
  // validate email
  if (!filter_var($from, FILTER_VALIDATE_EMAIL)) {
    return "<p>Email address invalid.</p>";
  }
  // check for spam
  if ($isSpam($from, $message)) {
    return "<p>Don't call us, we'll call you.</p>";
  }
  // init vars
  $to = "sales@company.com";
  $subject = "Sales Inquiry";
  $headers = "From: $from';
  // attempt to send
  if ($mailer($to, $subject, $message, $headers)) {
    return "<p>Email successfully sent.</p>";
  } else {
    return "<p>Email delivery failed.</p>";
  }
}

function sendSalesInquiryMail($from, $message)
{
  // call potentially-pure function, passing in the mailer and spam checker
  return sendSalesInquiry(
    $from,
    $message,
    $mailer = function($to, $subject, $message, $headers) {
      return mail($to, $subject, $message, $headers);
    },
    $isSpam = function($from, $message) {
      $spamChecker = new SpamChecker();
      // this analysis could involve any number of database queries, networking requests, etc.
      return $spamChecker->isSpam($from, $message);
    }
  );
}
?>

It’s Nine O’Clock on a Saturday

What? Doesn’t getting your side effects isolated put you in a mood for a melody?

* I’m not enamored with returning HTML markup in this type of function, but it represents a common example I found online, and it’s for a programming language that people don’t typically associate with functional programming practices, so the example works well for the purposes of the current demonstration.

** You could reduce this example to two functions, as the potentially-pure function could contain default values for the fall-through function(s), which could then be overridden by passing in an argument for testing purposes. However, I like the clarity granted by implementing an isolation set with three functions, as I want to avoid marrying the potentially-pure function to any default implementation. For example, I could easily provide a different mailing mechanism by merely creating a new function, like sendSalesInquirySMTP(), which provides a PHPMailer implementation.
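For illustration, such a function might look roughly like the sketch below (it assumes PHPMailer 5.x is available, uses a hypothetical SMTP host, and omits details like authentication):


<?php
require 'PHPMailerAutoload.php';

// an alternate fall-through function providing a PHPMailer/SMTP implementation
function sendSalesInquirySMTP($from, $message)
{
  return sendSalesInquiry($from, $message, $mailer = function($to, $subject, $message, $headers) {
    $mail = new PHPMailer();
    $mail->isSMTP();
    $mail->Host = 'smtp.example.com'; // hypothetical server
    $mail->setFrom(trim(str_replace('From:', '', $headers))); // recover the sender address from the headers string
    $mail->addAddress($to);
    $mail->Subject = $subject;
    $mail->Body = $message;
    return $mail->send();
  });
}
?>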