Finding Big M: Iteratively Estimating The Mean

Several months ago I wanted to estimate the mean of a value for users of an online app as quickly as possible with as few samples as possible. Every data point required scraping a webpage, and this proved time-consuming AND costly in terms of system resources. Additionally, if I triggered too much traffic with my queries, the host would temporarily block the IP of my server.

Assuming the distribution was normal, I considered several more formal approaches (e.g., performing a power analysis to choose the appropriate sample size, sequential sampling, etc.). However, I was curious whether I could develop an iterative approach that would both satisfy my precision requirements and get me the data I needed as efficiently as possible, given the cost of each web-scraping request.

Big M: A Generally Precise Estimate Of The Mean

While I didn’t need to publish the means I was estimating in scholarly journals, I wanted to ensure that the estimates were provably reliable in a general sense. That is to say, I wanted confidence levels “in the nineties” or “in the eighties.” I wasn’t trying to reject any formal null hypotheses, but I did want good data. I was training machine learning models for a class, and I wanted the most predictive models possible given my resource limitations.

I decided to play around with an iterative approach to growing a sample large enough to achieve the general degree of precision I desired (even the thought of this would probably make my grad school stats professors throw up in their mouths). This type of approach is a “no no” in statistics books, as you can generally keep growing a sample until you temporarily get the result you want. You usually want to make informed a priori decisions about your research design and then follow them to obtain robust results.

However, I’d played around with Monte Carlo simulations a couple of decades ago (I’m so old), and I always found it interesting how well various methods generally held up even when their assumptions were violated. Additionally, the ability of machine learning models to consistently converge on valid findings even with crude hyperparameters has taught me to put things to the test before discounting them.

I set out to make a (relatively) simple algorithm for estimating the mean of the population to a general level of precision (e.g., “around ninety out of one hundred samples will produce an estimate within the given error constraint of the population mean”). I called this estimate Big M because, well, you know, D. Knuth is awesome, and Big O conveys a form of general precision that I wanted to embrace. If I’m working with big data, I don’t need a scientifically chosen sample size that should guarantee a 95% confidence level; I just need to know that I’m generally in the 90s.

The Big M Algorithm Overview

After trying out various forms of the algorithm, I settled on the following approach, and I’ve linked to a quick example Jupyter notebook containing code and several random results. Essentially, the code implements the following algorithm.

  1. Select initial sample.
  2. Compute the confidence interval (CI) and mean.
  3. Check if the acceptable error (e.g., off by no more than 2 inches) is outside the confidence interval.
    1. If the acceptable error is outside the CI, increment the count of valid findings.
    2. If the acceptable error is inside the CI, start the count over at zero.
  4. Check if the continuous count has reached its threshold.
    1. If the continuous count has reached its threshold, exit the function with the mean estimate.
    2. If the continuous count has not reached its threshold, add one more observation to the sample and return to step 2.

There are some other details of note. The threshold for the continuous count is based on two things: the ratio of the population size to the initial sample size, and the confidence percentage chosen. Additionally, there is a maximum sample size that is computed automatically from the size of the population and the initial sample size, though this parameter can be set manually.
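The steps above can be sketched in Python roughly as follows. This is a minimal illustration and not the code from the linked notebook. It reads step 3 as “the CI’s half-width is smaller than the acceptable error.” It builds a normal-approximation (z) interval rather than a t-interval. It also takes the streak threshold and the maximum sample size as fixed placeholder arguments, because the post doesn’t give the exact formulas for deriving them. The names big_m, draw, threshold, and max_n are hypothetical, and draw stands in for whatever fetches one observation (here, one web scrape). initial_n must be at least 2 so the standard deviation is defined.

```python
import math
import random
from statistics import NormalDist, fmean, stdev
from typing import Callable


def big_m(
    draw: Callable[[], float],
    initial_n: int,
    acceptable_error: float,
    confidence: float = 0.90,
    threshold: int = 10,
    max_n: int = 1000,
) -> tuple[float, int]:
    """Sketch of the Big M loop; returns (mean estimate, final sample size).

    threshold and max_n are fixed placeholders here; the post derives them
    from the population size, initial sample size, and confidence level.
    """
    # Two-sided critical value from the normal distribution (a z-interval,
    # not a t-interval), e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    sample = [draw() for _ in range(initial_n)]  # step 1: initial sample
    streak = 0                                   # the continuous count
    while True:
        mean = fmean(sample)                     # step 2: mean and CI
        half_width = z * stdev(sample) / math.sqrt(len(sample))
        # Step 3: the acceptable error lies outside the CI when the CI's
        # half-width is smaller than the acceptable error.
        if half_width < acceptable_error:
            streak += 1
        else:
            streak = 0
        # Step 4: stop on a long enough streak or at the sample-size cap.
        if streak >= threshold or len(sample) >= max_n:
            return mean, len(sample)
        sample.append(draw())                    # add one more observation


random.seed(1)
estimate, n = big_m(lambda: random.gauss(70, 4), initial_n=30, acceptable_error=2.0)
print(f"estimated mean {estimate:.2f} from {n} observations")
```

Requiring a streak of consecutive passing checks, rather than stopping at the first narrow interval, keeps the loop from quitting on one lucky draw. It softens the “grow the sample until you get what you want” concern but doesn’t remove it.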

The Big M Jupyter Notebook Example Code (Embedded as HTML)

I’ve linked to a demonstration of the code that implements the algorithm. Much of this is arbitrary, and I’m sure you could refactor it so it performs better. I’ve since developed a server-side implementation in a language other than Python that’s quite different from this one. Still, this should (barring bugs, which obviously will be present) capture the general thrust of the algorithm and show general results (you can just rerun the notebook to see new results).

Big M Jupyter Notebook at GitHub

YMMV 🙂

Monitor Network Traffic on an iOS Device Using Remote Virtual Interface

There are several situations in which one may want to monitor network traffic on an iOS device (e.g., ensuring there is no unexpected network traffic, identifying the APIs used by various apps, etc.). Let’s look at one possible option to accomplish this. From iOS 5 on, we can use Remote Virtual Interface (RVI) to add a network interface to a Mac that carries the packet stream from an iOS device.

Install Xcode From The App Store

First, ensure that you’ve installed Xcode from the App Store on the Mac you’ll be using. It’s free, and it’s a straightforward install.

Find and install Xcode from the App Store.

Install Xcode Command Line Tools

Next, make sure you have the command line tools for Xcode installed on your system. You can type the following command to check if they are installed:

$ xcode-select --version
Ensure that the Xcode command line tools are installed.

If you don’t see any version information and you get a “command not found” type of error, you can use the following command to install the tools:

$ xcode-select --install

Of note, don’t use the command above to update your installation of the command line tools. Just let Apple prompt you for an update (or, if you have automatic updates enabled, updates should happen without you needing to do anything).

Connect Your iOS Device To Your Mac Computer

Then, connect your iOS device to your Mac using whatever wired connection is required (for my iPhone 8 and my iMac, I’m using a USB-to-Lightning cable). Once connected, you just need to have both devices turned on so they can talk to each other (you may have to enter the passcode for your iOS device to unlock it).

Start Xcode And Find Your UDID

Next, we have to locate the Unique Device Identifier (UDID) for your iOS device. The easiest way to do this (and get something you can copy into a later command) is to use Xcode. After starting Xcode, navigate to the Window menu and select Devices and Simulators. That will bring up a new window. Select the Devices tab, which should reveal detailed information about your iOS device. For our purposes, we need the value after the Identifier label (blurred out in my image below), which is the UDID for the device.

Screen capture of opening the devices tab in Xcode.

Find The “rvictl” Command On Your Mac

Now we need to open the terminal again. First, we have to find where the rvictl command is located on your version of macOS. The find command can do this nicely, and we’ll redirect its error output to /dev/null so we don’t see hundreds of “permission denied” messages.

$ find / -name "rvictl" 2>/dev/null

The output should reveal the location of the command. On my iMac running Catalina, the location is /Library/Apple/usr/bin, but make sure you check your system for the precise location.

Next, change to the directory containing the rvictl binary:

$ cd /Path/On/Your/System

Run The “rvictl” Command To Add Your iOS Device As A Network Interface

Finally, we can run the rvictl command and pass in the UDID we found earlier for our iOS device to start up a new network interface that will allow us to monitor the network traffic on the device using our Mac computer.

$ ./rvictl -s the-udid-number-of-your-ios-device
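If you set this up often, the lookup and start/stop steps can be scripted. The Python sketch below is a hypothetical helper and is not part of Apple’s tooling. It checks PATH first, falls back to the Catalina location mentioned above, and builds the rvictl argument list. It also uses rvictl’s -x option, which removes the interface again (a step this post doesn’t cover). Check rvictl -h on your system before relying on either flag.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional

# Location reported in this post for macOS Catalina; other versions may differ.
CATALINA_RVICTL = Path("/Library/Apple/usr/bin/rvictl")


def find_rvictl() -> Optional[str]:
    """Return a path to rvictl: first via PATH, then the Catalina location."""
    on_path = shutil.which("rvictl")
    if on_path:
        return on_path
    if CATALINA_RVICTL.is_file():
        return str(CATALINA_RVICTL)
    return None


def rvictl_args(rvictl: str, udid: str, start: bool = True) -> list[str]:
    """Build argv to start (-s) or stop (-x) the remote virtual interface."""
    return [rvictl, "-s" if start else "-x", udid]


def run_rvictl(udid: str, start: bool = True) -> None:
    """Run rvictl for the given UDID; raises if rvictl is missing or fails."""
    rvictl = find_rvictl()
    if rvictl is None:
        raise FileNotFoundError("rvictl not found; try: find / -name rvictl 2>/dev/null")
    subprocess.run(rvictl_args(rvictl, udid, start), check=True)
```

For example, call run_rvictl("the-udid-number-of-your-ios-device") before capturing, and call run_rvictl("the-udid-number-of-your-ios-device", start=False) when you’re done.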

Test The Network Interface With tcpdump

Now that the network interface has been configured on your Mac for your iOS device (usually called rvi0), let’s test it to ensure that it’s working. Try using tcpdump to view HTTP activity on your iOS device, and then visit a webpage on your phone that uses HTTP (not HTTPS).

$ sudo tcpdump -i rvi0 port http

Takeaways

You should now have the ability to configure your Mac to monitor network traffic on your iOS device. There are pros and cons to this particular approach. On the positive side, it is relatively easy if you’re using a Mac, unencrypted traffic is easily viewed, and few applications/tools are required. However, if you don’t own a Mac, or if you need to view encrypted traffic (e.g., HTTPS), there are better approaches. I’ll cover other monitoring options in the future that address these issues.

Xubri Educational Resources

Xubri

One of my companies, StartingStrong.com, started a line of educational resources to help students finish good practice fast: Xubri. Now that the Xubri trademark has been registered, I’m going to start creating more educational resources under the Xubri name.

You might wonder why we waited for the trademark to be finalized before investing the time to create more resources. Well, I had a bad experience where I’d worked hard to build up the presence of an app in the Apple App Store. Then, someone created an app with the same name… except that they added “HD.” Seriously, they just called their app “name-of-my-app HD.” The similarly named “HD” product seriously undermined my brand and advertising. Lesson learned!

So far, the brand includes several basic math facts apps in the Apple App Store and an audio single. However, there are several new apps being actively developed, and we’re really excited for what the future holds!

 

Dusted-Off Version of 1944 “Chart of Electromagnetic Radiations”

Charts are fantastic! And, on the rare occasion that you find an old chart that possesses the charm of a previous era whilst maintaining valuable insights for today’s learners, you’ve found a true treasure.

A few years back, I’d looked for a chart covering electromagnetic radiation for my children, and I’d found some really nice options. One stood out, and while I’m not the only one who liked it, I never pulled the trigger. Again, it’s a very nice poster with nice reviews on Amazon, but it didn’t compel me to spend my money immediately.

Advancing to the current year, I realized my daughters were getting to the age where putting off getting the poster was no longer an option (at least in dad’s eyes), so I figured I’d take another look to see what I could find. Eventually, I came across an old chart posted to Lawrence Livermore National Laboratory’s (LLNL) Flickr account. Quoting the image description:

If you’re into scientific antiques, you have to examine the details in this 1944 poster from the W.M Welch Scientific Company: “Chart of Electromagnetic Radiations.” It was found tucked away in the back of an unused office years ago, but now hangs framed in a high-traffic hallway populated by Lawrence Livermore engineers.

Chart of Electromagnetic Radiations

What a marvelous poster. Certainly, this was the poster for which I’d been waiting. Beyond the beautiful presentation, someone had also worked up a nice writeup of the chart’s provenance, which only added to its allure. Sure, Edward Tufte may not have installed this particular chart in his house, but even he would have to concede the impressive level of information density achieved.

Really, it looked like all systems were go. It’s licensed under Creative Commons 2.0, so getting this beautiful chart printed as a large-size poster would be a snap. All I had to do was go to FedEx Office to get some pricing and then have a quick talk with my beautiful wife. Easy peasy.

Although FedEx had some reasonable pricing, apparently the discussion with the wife posed a greater stumbling block than I had anticipated. Couldn’t she see the beauty of this poster? Why couldn’t we put this baby up on one of our walls as big as it could be printed?

After much bargaining, she agreed to let me put up a poster if I improved the appearance of the chart (it looked old, worn, and dusty to her), and we limited the largest side to 36 inches.

So, after putting in some time in Photoshop, I have a “dusted-off” version of the chart ready for printing. I tried to limit the edits to repairing color fading and some extreme rips, as I thoroughly appreciate the aged appearance of the chart.

Dusted-off Chart of Electromagnetic Radiations

Following the license of Lawrence Livermore National Laboratory’s original image upload, this updated version is also licensed under the Creative Commons 2.0 license.

Here’s the link to the full-size image as a JPEG (10000 x 6945 pixels, 50.2 MB). Enjoy!

Finally, if you make a better version of the chart, I’d love it if you linked to it in the comments so I can negotiate for a larger poster 😉

Update January 19th, 2017:

Ryan from the IEEE History Center commented with a link to another nice writeup on this chart in the January 2017 edition of the Journal of the Caxton Club (starts on page 10.)

Passpics: A Picture Is Worth a Thousand Passwords

Using passwords as the primary authentication mechanism in the digital world is painful. Although password cracking techniques continue to get faster and we are called on to make our passwords longer and stronger, our brains (and fingers) still face the same humble limitations they did when we struggled to remember our first email account login. If we continue along the current trend towards longer passwords, it feels like we’ll eventually be asked to type in 2048-bit keys. I’m sorry, but for a guy like me who can’t even remember to buy the milk, that’s just not going to happen.

Password managers can address some of the problems associated with passwords, but they don’t provide a long-term solution. Password managers are a big target, and even really good ones can be attacked in a manner that puts you at risk. I don’t like the idea that the failure of one piece of software jeopardizes all of my digital security, and I don’t want to always have to install another piece of software just to access my accounts. Additionally, at their core, they still use passwords to communicate shared secrets, and even properly procured passwords will pose little issue to the coming legion of super machines.

What we need is something that we can start using now that provides better usability and stronger security. Thankfully, we already make use of a technology that can fulfill our authentication needs: images, which in this context we’ll refer to as passpics.

Passpics Offer Improved Usability

Let’s start with the task of taking and uploading images. Whether you’re on Facebook, Instagram, Twitter, or the Browncoats forums (hey, it’s the best sci-fi show ever), you’re inevitably going to see images that users of all different backgrounds and skill levels have taken and uploaded without any fanfare. Today’s hardware and software applications make taking and uploading images so easy that even a monkey can do it (even if it doesn’t possess the copyright).

The findability of passpics also seems quite reasonable. Users have become accustomed to organizing their images by directory and/or tags in various software applications. Additionally, many operating systems let you visually scan directory contents by displaying thumbnails of the files, including images. Of note, I’m not advocating naming or tagging an image with a label like “passpic important,” but I am saying that if you know one of your passpics contains a horse, you may browse for it in the “farm” directory or tag it “Mr. Ed.”

Images are also very memorable, making it likely that users will be able to successfully recall sets of passpics. While passwords typically require free recall, images can take advantage of recognition, which typically leads to better performance than free recall. Images provide other memory advantages, too. The method of loci, which uses visual imagery to enhance recall and has been used for centuries, reveals the profound improvements visual imagery can have on memory tasks.

Passpics may also offer advantages in terms of entry accuracy. Because of their funky characters and uncommon key combinations, passwords can be mistyped, especially as they grow longer and our keyboards grow smaller. In contrast, passpics require the selection of one or more image files through the file viewing interface. While the manual entry time may be longer for passpics, their entry accuracy seems likely to be at or above that of passwords. This accuracy may allow authentication providers to lock out nefarious login attempts more quickly. User research in this area will be interesting.

Passpics Provide More Protection Against Brute-Force Attacks

When you take that all-important selfie, your smartphone processes a tremendous amount of visual information to create the image file. In fact, even after lossless compression is performed (i.e., the size of the image is reduced but the quality remains unchanged), images remain large files because of the inherent entropy (i.e., information that can’t be predicted from the other information in the image) contained in high-resolution images.

Practically speaking, no two images captured by a camera will be exactly the same (even if you try really hard to capture the exact same scene). That’s because the vast amount of information recorded in an image is subject to tolerances in camera sensors, variations in lighting, and changes in the precise placement of the camera. Essentially, anyone with a digital camera can, with but a click of a button, create a wholly unique authentication token that is far more difficult to guess or brute force than any password, passphrase, or cryptographic key currently used for online encryption.

Passpic Theft Concerns

Those with security backgrounds may be concerned about a form of one-factor authentication that makes use of “something you have.” That is to say, when an authentication token is used, a two-factor scheme is often implemented because of the threat of theft (e.g., an ATM requires a PIN in addition to possession of a bank card to log in). How should we handle these concerns?

Practically speaking, passpics are often at least as secure as passwords in terms of theft. If the transmission or online storage is the weak point (e.g., lack of SSL, weak password hashing, etc.), passwords are just as vulnerable as passpics. However, the specific concern that a user’s passpic could be stolen because it’s something they have, not something they know, isn’t a compelling argument in practice. Users often turn passwords into something they have, as they tend to write them down somewhere (and prominent members of the security community have advocated this approach). So, passpics are often no worse than passwords in terms of theft risk.

That said, we can significantly mitigate the risk of theft of passpics by adding a “something you know” authentication factor. Encrypting a hard drive is quite easy on many operating systems, which, in the case of physical theft of the hardware, renders the authentication token(s) unusable unless the operating system’s encryption is successfully attacked or the attacker gains access to the account password. And, yes, it’s ironic I’m touting a security mechanism based on passwords as a fix, but you have to pick your battles.

Even if an attacker does gain access to a computer with stored passpics, they are not assured of successfully attacking the login system through theft. Most people today have hundreds if not thousands of images on their computer. Unless the user names the image “my-bank-passpic.jpg”, an attacker would have to try many different images to find the correct login. Still, this isn’t a terribly difficult task for attackers. What else could we do to mitigate theft risks?

There is one crucial implementation detail that I’ve hinted at but haven’t stated explicitly until now: users must be able to submit a set of passpics to log in. This one feature provides a significant security boost across the board in terms of mitigating attacks, but especially in terms of theft. For example, suppose a user submits a set of three images to log in at their bank. Even if they only have 1000 images total on their computer, the number of possible passpic login sets would be more than 160 million. More importantly, users could conceivably spread the passpics within a set across different storage locations (e.g., one passpic in the cloud, one on their hard drive, and one on a microSD card). The theft or compromise of any single device or service would then be insufficient for a successful attack.
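The “more than 160 million” figure is the number of ways to choose 3 images out of 1000 when their order doesn’t matter, C(1000, 3) = 1000 × 999 × 998 / 6. A quick check in Python (math.comb and math.perm need Python 3.8+) follows. The passpic_sets name and its ordered option are illustrative. The option models a hypothetical design in which the images must be submitted in a fixed order, which multiplies the count by 3! = 6.

```python
import math


def passpic_sets(library_size: int, set_size: int, ordered: bool = False) -> int:
    """Count the candidate passpic sets an attacker would have to try.

    ordered=False treats the set as unordered (any selection order works);
    ordered=True assumes the images must be submitted in a fixed order.
    """
    if ordered:
        return math.perm(library_size, set_size)
    return math.comb(library_size, set_size)


print(passpic_sets(1000, 3))                # 166167000
print(passpic_sets(1000, 3, ordered=True))  # 997002000
```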

Potential Problems With Passpics

Passwords are often masked to prevent shoulder surfing from compromising them. For those outside the security community, shoulder surfing is when an attacker views your screen while you interact with your computer/device, giving them the opportunity to steal information they can see. With the current implementation of file uploads in most browsers, an attacker who can view your screen while you log in could at least identify the images you used to log in.

Is password masking important? Some have argued that password masking is unneeded, but the subsequent fury of the masses revealed that most security-conscious users deem this form of security a necessity. If an attacker successfully shoulder-surfs your password, they can log in to your account. However, with passpics, it’s not enough to see the images used. The attacker would have to gain access to the actual image files to successfully log in to your account. Frankly, I’m unsure how big an issue this is and/or how best to approach it, but I do believe it’s an important consideration moving forward.

Images are relatively large files, so having to upload one or more passpics every time you authenticate does raise concerns about response time and network use. That said, most login systems deliberately make password hashing computationally costly, so logins already involve some added response time. In terms of network resources, users have the ability to resample images to smaller sizes that better match their particular network capabilities. Even a grayscale 300 x 300 JPEG image requires a much larger search space for brute-force attacks than the best passwords. Granted, resampling presents a significant usability issue, and it may have to be better addressed before passpics can be used by the masses.
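To put rough numbers on the 300 x 300 comparison, the Python arithmetic below sets the raw pixel capacity of an 8-bit grayscale image against the entropy of a uniformly random password. Treat the image figure as a ceiling only. Real photos are highly correlated from pixel to pixel, and JPEG compression discards detail, so the genuinely unpredictable content is far smaller than this bound. The 95-symbol alphabet (printable ASCII) and the 20-character length are illustrative assumptions, and the function names are hypothetical.

```python
import math


def raw_image_bits(width: int, height: int, bits_per_pixel: int = 8) -> int:
    """Upper bound on the information in an uncompressed grayscale image."""
    return width * height * bits_per_pixel


def password_bits(length: int, alphabet_size: int = 95) -> float:
    """Entropy of a password chosen uniformly at random from the alphabet."""
    return length * math.log2(alphabet_size)


print(raw_image_bits(300, 300))     # 720000
print(round(password_bits(20), 1))  # 131.4
```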

One other problem with passpics is the potential for Denial-of-Service (DoS) attacks. Online services would have to allow relatively large file uploads if all of the processing were handled server-side. Preventing attackers from abusing this type of permission would present some challenges. Services could push the processing of the images client-side, but this presents its own challenges. Again, I’m not advocating a particular solution; I merely want to present this potential issue as something that merits careful consideration.
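One way to picture the client-side option is the minimal Python sketch below. It is a sketch under stated assumptions, not a vetted protocol. In it, the client hashes each passpic with SHA-256 and sends a single 32-byte digest instead of the image files, and the server then applies a slow, salted hash (PBKDF2 here) before storing or comparing. The function names and the iteration count are hypothetical placeholders, and the digests are sorted so that the order of image selection doesn’t matter, which is also a design assumption. Note that the client digest effectively becomes the secret, so it still has to travel over an encrypted connection.

```python
import hashlib
import hmac
import os

# Illustrative placeholder; tune the work factor for your own hardware.
ITERATIONS = 600_000


def passpic_digest(image_files: list[bytes]) -> bytes:
    """Client side: collapse a set of passpic files into one 32-byte digest.

    Per-file SHA-256 digests are sorted, so the order in which the user picks
    the images doesn't matter (an assumption; a service could require order).
    """
    per_file = sorted(hashlib.sha256(data).digest() for data in image_files)
    return hashlib.sha256(b"".join(per_file)).digest()


def server_hash(digest: bytes, salt: bytes) -> bytes:
    """Server side: apply a slow, salted hash before storing or comparing."""
    return hashlib.pbkdf2_hmac("sha256", digest, salt, ITERATIONS)


def verify(image_files: list[bytes], salt: bytes, stored: bytes) -> bool:
    """Recompute the hash for the submitted set and compare in constant time."""
    candidate = server_hash(passpic_digest(image_files), salt)
    return hmac.compare_digest(candidate, stored)


# Enrollment: the server keeps only the salt and the slow hash.
salt = os.urandom(16)
images = [b"<bytes of image 1>", b"<bytes of image 2>", b"<bytes of image 3>"]
stored = server_hash(passpic_digest(images), salt)

print(verify(list(reversed(images)), salt, stored))  # True: order is ignored
print(verify(images[:2], salt, stored))              # False: an image is missing
```

This keeps the upload small and the expensive work on the server bounded, but it gives up any server-side inspection of the images themselves. That is exactly the kind of trade-off the paragraph above flags.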

What about social media? Do passpics uploaded to public sites weaken the security? Possibly. If you have a Facebook account and you upload one picture, and that one picture is your passpic for your bank, yes, you are screwed. Just as people can choose poor passwords (you have one daughter named Shelly, also noted on Facebook, and “Shelly” is your bank password), they can choose poor passpics.

That said, the public availability of an image does not necessarily make it inappropriate for use in a passpic set. For example, one could securely form a passpic set from an image on Facebook, an image on Dropbox, and an image on their local computer that isn’t uploaded anywhere else. As long as one passpic in the set is “private” (i.e., only available to you), the security remains very strong.

Conclusion

We already use images all the time in our digital lives. Because of their inherent advantages in terms of usability and security, it makes sense to leverage them in the form of passpics for authentication.