How do you enter text on your phone? If it has a touch screen then chances are you use a tiny little keyboard on the screen. If not, you probably still know how that might work. You just press the tiny little keys on the little keyboard for each letter that you want to type and they show up on the screen. It’s a pretty intuitive mode of interaction for anybody who has spent some time on a computer before.
There’s a very serious limitation to tapping away at the little keys however: it’s really slow. Despite the backroom deals that have taken place between the fannypack industry and mobile phone manufacturers, you’re still not going to be able to make Mavis Beacon proud by fitting eight fingers on the home row. Most people end up tapping away with their two thumbs or a single finger. Even if you get in a groove it’s just frustratingly slow compared to a real keyboard.
At this point some of you who prefer swyping to typing might be thinking something along the lines of, “tap typing is for suckers!” Now some of you are probably wondering, “what’s swyping?” Well… swyping, a portmanteau of swiping and typing, is an alternative way of interacting with those same little keyboards. Instead of laboriously lifting your finger and plunging it down against rigid glass for every single letter, you just slide your finger from key to key. Let’s take a quick look at how we could type or swype the word “head.”
On the left you see how we would type it. The orange circle indicates a touch event on the device and we just highlight each key to make it more clear what’s being entered. On the right we see how you would swype the word. We added a blue line to show you what the whole swype pattern looks like but the orange circle again represents touch events as time progresses. It’s not really any more complicated than typing. Figuring out which word was intended from a swype pattern can, on the other hand, be a bit tricky… we’ll get to that later.
Swyping might not seem like that big of a deal, but after you get the hang of it it’s amazing. Your finger dances effortlessly across the screen as you enter entire words in single fluid motions. If you ever have to use a touch keyboard that doesn’t support swyping you feel like you’re using two ball-peen hammers on a typewriter. And, most importantly, it’s fast.
Back in 2010, Franklin Page set the (now defeated) world record for fastest text message on a touchscreen device by swyping the phrase, “The razor-toothed piranhas of the genera Serrasalmus and Pygocentrus are the most ferocious freshwater fish in the world. In reality they seldom attack a human,” in just 35.54 seconds. That works out to 43.9 words per minute! Granted, you better have a great smile if you’re trying to get a job as a secretary typing at that speed, but it’s still pretty damn impressive on a touch screen.
It’s impressive but maybe slightly less so than it looks at first glance. The truth is that a Bostonian with one Irish great-great-grandparent could probably swipe “pygocentrus” correctly on St. Patrick’s Day. On the other hand, Norman Shumway would’ve likely struggled in his prime to swipe “it” without it being interpreted as “out” some of the time. Take a look at what the swype pattern for “pygocentrus” actually looks like.
The swype pattern looks complex at first but it’s not that bad if you trace it out letter by letter. The complexity of the swype pattern is actually what makes it so easy for a computer to figure out what word it corresponds to. Even if you miss most of the letters by a bit there’s still no other word with a remotely similar pattern. When you swype the word “it” then it looks almost the same as “out” if you start just a little too far to the right. If you’re swyping “pot” then it looks exactly like “pout” and “pit” even if you do it perfectly.
When you try to swype one word but it’s interpreted as another it’s called a swypo. It can be pretty frustrating when these happen because you need to go back, delete the error, and then often tap type the correct word. This really breaks your flow and slows you down. We’ve also noticed that the incidence of swypos seems to go up after you’ve been swyping for a while. Once you get too comfortable it becomes really easy to cut a corner a little too sharply or to shift a whole pattern a bit to the side.
It usually starts small; with little mistakes that aren’t that big of a deal. You know that your friends will be able to tell you mean “stories” when you say “sorties” so you don’t stress. Then you start convincing yourself that “airtight” is just cool new slang you invented that means “alright.” But before long you’ll start texting “it’d a nice say toast” instead of “it’s a nice day today” and you’ll get concerned calls asking if you meant to say that you smell toast. Once it gets this bad you might as well be speaking with a British accent for how well people can understand you.
So what can you do? Other than the obviously unacceptable answers of looking at the screen while you’re swyping or trying to be less sloppy… nothing really. Nothing at all. Unless… is it possible that a different keyboard layout could alleviate the problem?
A major cause of swypos is that there are inconvenient clusters of letters on a QWERTY keyboard, like “uio,” which lead to a lot of ambiguity. Also, some of the least frequently used letters, like “q” and “z,” are off in the corners while they would probably be more useful in the center to help separate the most commonly used keys. Just how much better suited for swyping could a keyboard be if you arranged the keys more optimally? If that’s a question that you want to know the answer to then you’re reading the right blog post.
Back in the Day…
“Turn your pagers to 1993.” -Christopher Wallace
Before we really dig into swype we’re going to take a little detour to explore its spiritual predecessor, T9. This will give me a chance to explain our general methodology without getting too caught up on some of the subtle points that only apply to swype. There’s also some interesting history here that’s hard to pass up on and relevant to swype. If you already know what T9 is and you’re not in the mood for a history lesson then feel free to skip ahead to the next section.
The first incarnation of phone to phone SMS as we know it today was introduced in 1993 by Radiolinja in Finland. The idea of piggybacking 128 byte messages on the signaling paths used to control telephone traffic on GSM networks has been around since 1984, but it wasn’t until 1993 that Nokia made the first phones which supported sending and receiving them.
By 1995, the technology had become ubiquitous but the average GSM user was still only sending on average 0.4 messages per month. It would be good for the narrative to say that this was because the text input methods were so inefficient but in reality it’s probably more due to the complexity of the billing system in use at the time. The text input was really inefficient though, that part’s true.
Remember how telephone keypads have one number and a few letters on each button? For example, the 2 button also has “ABC” on it and the 8 button has “TUV.” To send a text message back then you had to rotate through the letters by hitting each button repeatedly. To type the word “cat” you would have to tap 2-2-2-pause-2-8. It was slow and a pain to use.
So anyway, in 1995, when SMS was really still in its infancy, Cliff Kushler and Martin King had a brilliant idea. That idea was to form a company called Tegic Communications and immediately file a patent for T9, short for Text on 9 Keys. With T9 you just press each letter on the keypad once while you’re typing instead of cycling through the letters on each key. Going back to the “cat” example it would be shortened to 2-2-8 (although on some phones you would have to pause before the subsequent 2s).
An interesting historical footnote is that the idea for T9 had already been around for a full decade by the time Tegic formed and the patent was filed. I’ve investigated this thoroughly and believe that the seminal paper was “A Simplified Touch Tone Telecommunication Aid for Deaf and Hearing Impaired Individuals” by Scott Minneman from Tufts Medical Center. This paper describes the text entry method as well as two disambiguation algorithms, and it was published in the proceedings from The 7th Annual Rehabilitation Engineering and Assistive Technology Society of North America (RESNA) Conference in 1985.
There’s no way that Kushler and King had any idea about some obscure proceedings from a conference that you’ve probably never heard of though, right? Well… they were both doing research on computer input mechanisms for disabled people before they started Tegic. They’ve also both presented their own research at RESNA conferences. It’s not like there was only one obscure presentation on the topic either.
Minneman and Stephen Levine, a frequent collaborator, wrote a series of papers in the 1980s exploring different disambiguation algorithms and optimizing keyboard layouts to reduce errors and increase the input rate. There was even a 198 page dissertation written on the topic by Chandravalee Iyengar, who I assume was Levine’s graduate student, in 1988 called “Development of a Multicharacter Key Text Entry System Using Computer Disambiguation.” This collective work spawned a small subfield and by the time the patent was filed in 1995 there had been dozens of papers exploring every detail of these types of keyboards.
Ah, but you have to read all the claims of the patent and not just the abstract. Yeah, I did. And I’ve read dozens of research papers on disambiguation keyboards on microfiches that I had to dig out of dusty boxes in library basements. So, yeah… I have a lot of respect for Kushler and King for realizing that the real money in T9 was in facilitating flirting instead of helping deaf people. That was actually a pretty big insight in 1995, but probably not one that should be patentable.
Optimizing the T9 Layout
Getting back to the fun stuff: T9 text entry is a lot like swype. It sacrifices a bit of accuracy by allowing ambiguous inputs and in doing so gains a lot of speed. So we can take our initial question of how much we can improve the text input rate by rearranging keys and apply it to T9 instead of swype. It’s a very similar problem but it’s a lot easier to solve so we can use it as a stepping stone while explaining the general optimization approach.
So how do we optimize a keyboard layout? We first need to quantify what we’re trying to optimize. You can get fancy and start talking about the time it takes to move your finger around but the biggest bottleneck with T9 is generally having to go back and fix things that weren’t interpreted correctly. We quantify this with the error rate which is just the percentage of words you enter that are misinterpreted. We’ll also refer to the efficiency of the keyboard which is the percentage of the time that a word is correctly interpreted (one minus the error rate).
In order to calculate the error rate we’re going to need a way to simulate realistic user input on a keyboard and then to interpret it. We built an open source library called dodona which we’ll be using to do these things. The core of the library is written in c++ but we like to interact with it using python and Jupyter. We’ll include python code as we go to help explain some of the concepts.
# Import the necessary modules from dodona import core, keyboards, wordlists # These are a couple of ipython specific things for plotting and numerics %pylab %matplotlib inline # Create a keyboard and an interaction model t9_keyboard = keyboards.MakeT9Keyboard() t9_model = core.SimpleGaussianModel() # Set how random the model is relative to the key sizes t9_model.SetScale(0.2) # Use the model to generate an input vector for the word "cat" vector = t9_model.RandomVector("cat", t9_keyboard) keyboards.DrawKeyboard(t9_keyboard, inputvector=vector, t9=True, figsize=(5,5))
The circles in this plot again indicate touch events but now the color indicates time with yellow corresponding to the first touch and red the last. We call sets of touch events an input vector which we represent as a sequence of (x, y, t) values. Input vectors are generated by input models which simulate how a user would enter words on a keyboard. The model we’re using here chooses a random location for each key press according to a Gaussian distribution centered around a given key.
Now that we have a way to simulate user input we still have to figure out how to interpret it. One way to disambiguate inputs is to use a dictionary of valid words. The simplest approach is to then just cycle through every word in the dictionary and pick the word with the highest likelihood of generating the observed input vector. This is the general approach that SimpleGaussianModel uses but it also limits the search to words of the same length.
To build our dictionary we used the Google Web Trillion Word Corpus as compiled by Peter Norvig. This corpus contains approximately 300,000 of the most commonly used words in the English language and their frequencies. Unfortunately, more than half of these are misspelled words and abbreviations. To get rid of these we cross checked against The Official Scrabble Dictionary, the Most Common Boys/Girls Names, and WinEdt’s US Dictionary, including only words that appeared in at least one of them. This left us with 95,881 words or about five times the vocabulary of an average adult.
Now, using this dictionary, we can disambiguate input.
# Load in the wordlist from disk wordlist = core.WordList() wordlist.LoadFromFile('wordlist.dat') # Truncate it to the 5000 most common words wordlist = wordlists.MostCommon(wordlist, 5000) # Make a new cat vector and find the best match vector = t9_model.RandomVector("cat", t9_keyboard) print("Reconstructed as:", t9_model.BestMatch(vector, t9_keyboard, wordlist))
Reconstructed as: act
Oops, we interpreted it as “act” instead of “cat.” If “c” and “a” were on different keys then we wouldn’t have had this problem.
In order to quantify the overall frequency of errors we generate words randomly based on their frequency of usage and then compute the fraction of the time that a word is misinterpreted. If our model is random then this is only a statistical estimate of the error rate but, because the errors in T9 are almost entirely due to the keyboard layout itself, we can cheat a little and compute it directly.
# Turn off the randomness t9_model.SetScale(0) # Find the efficiency result = core.ExactEfficiency(t9_keyboard, t9_model, wordlist) print('Error rate:', 1-result.Fitness())
Error rate: 0.04062902967123838
So the error rate is about 4% for the normal touchtone layout with a dictionary consisting of the 5,000 most commonly used words. Now let’s see how that compares to the average error rate for randomly selected keyboards.
from copy import deepcopy error_rates =  random_keyboard = deepcopy(t9_keyboard) for i in range(1, 1000): random_keyboard.Randomize() fitness = core.ExactEfficiency(random_keyboard, t9_model, wordlist).Fitness() error_rates.append(1-fitness) print("Average error rate:", np.mean(error_rates)) print("Standard deviation:", np.std(error_rates))
Average error rate: 0.0618525265595 Standard deviation: 0.0167464493527
So the efficiency of the standard touchtone layout is actually 1.27 standard deviations above average.
How could we go about finding a more optimal arrangement of the keys? One simple approach is to swap a few random pairs of letters to get a similar but random keyboard and then reevaluate the efficiency. If it’s greater than it was before the swaps then we keep the new keyboard and otherwise we keep the old. If we do that repeatedly then we’ll end up with a keyboard with a lower error rate.
We repeated this exact process a few times to illustrate how the optimization progresses. For fun we also decided to see how bad of a layout we could come up with. In the image below you can see how the error rates progressed with iterations. The blue and the green represent optimizations for low and high error rates, respectively. The light lines are individual optimization progressions while the dark lines are the average over the ensemble.
The best out of these had an error rate of 1.72% which is 58% lower than the standard “abc” layout. The worst had an error rate of 21.7% which is over 5 times higher than the “abc” layout. Clearly the performance of the keyboard is hugely dependent on the arrangement of the letters. Taking a look at these two layouts we can gain some insight to why the performance is so different.
One of the first things we notice is that the really bad keyboard has the four most common vowels all on one key. Some of the other characteristics are more subtle but become apparent when we look at some of the most common errors for each layout.
|Worst Layout||Best Layout|
More than half of the time that an error occurs on the worst keyboard it’s the result of one of these ten most common errors. From these we can see that having “tsr” together and “nf” are also problematic. On the best keyboard these groups of letters are all separated which eliminates any ambiguity for these extremely common words. The error rate does depend on all of the words in the dictionary, so the full picture is more complicated than this, but these common words are a dominant factor.
Optimizating a Keyboard Layout for Swype
T9 will always have a special place in my heart but it’s probably a pretty safe bet to assume that the future of text entry doesn’t involve a touchtone keypad. It seems like every kid I see these days is glued to their iThis or their Razzr Thatt HD+ and there’s no sign that this is about to change. People aren’t just writing 160 character text messages anymore either; now they’re writing whole Tweets, emails, and god knows what else. I even typed up this entire blogpost on my phone just because my laptop was upstairs. A lot of time could be saved and frustration avoided if touchscreen keyboards were more optimal.
“More optimal” could of course mean a lot of different things. We’re going to use the same criteria for being optimal that we did for T9: having the lowest rate of interpretation errors in typical usage. This probably wouldn’t be the best quantity to optimize if we were dealing with tap typing, but it’s a pretty good way to find faster and less frustrating keyboards with inherently ambiguous input methods like T9 and swype.
We’ll also take the same general approach to computing the rate of errors that we did in the T9 analysis. We’ll build a model of typical user input, try to interpret it, and then see how frequently we reconstruct it to be the wrong word. Then we can repeat this for different keyboard layouts and see how they compare.
Modeling Swype Input
Modeling how users input words on T9 was pretty easy. We basically just had them tap once in the center of each key. With swype it’s a bit more complicated because the personal style and sloppiness of each user plays a major role in the ambiguity. Although there are a number of inherently indistinguishable swype patterns on a QWERTY keyboard, the majority of errors are the result of there being words with similar patterns that become hard to distinguish in realistic usage.
To model the random elements of individual swypes we first choose control points for each letter in a given word. These points are chosen according to correlated Gaussian distributions around each key’s center. We then interpolate between these control points in various ways and sample along the path so that every swype consists of the same number of discrete touch events. This allows for a wide variety of realistic swype input, capturing both differences in personal style and sloppiness. Below you can see how a handful of random swypes look for the word “cream”.
word = "cream" # Make a standard QWERTY keyboard keyboard = keyboards.MakeStandardKeyboard() keyboard.RemoveKey('.') # Create an interpolation model and a perfect vector model = core.SimpleInterpolationModel() perfect = model.PerfectVector(word, keyboard) # A few possible interpolations interpolations = [core.SpatialInterpolation, core.CubicSplineInterpolation, core.MonotonicCubicSplineInterpolation, core.HermiteCubicSplineInterpolation] # Plot ten random vectors for the word for i in range(10): model.Interpolation = np.random.choice(interpolations) vector = model.RandomVector(word, keyboard) keyboards.DrawKeyboard(keyboard, perfectvector=perfect, inputvector = vector, frequencymap=word, colormap = mpl.cm.Reds, nopalette=True)
The blue line signifies what we call the perfect vector. This is a swype pattern that represents the ideal input vector for a word in the absence of any randomness or user bias. For our swype models, this consists of a linear interpolation between the centers of the successive keys spelling out the word.
You might be wondering how much the parameters of the models affect the error rates that we ultimately determine. The answer is quite a bit, but this isn’t as big of an issue as it seems at first. Clearly a user who is more sloppy with their inputs will encounter more errors than somebody who is very careful with their swypes. The frequency of errors will be different but, despite this, the words that are misinterpreted will tend to be the same. We spent a lot of time studying the systematics of this and have found that the relative error rate between keyboards tends to be mostly independent of the model parameters. This means that statements like, “this keyboard resulted in a 51% reduction in the error rate relative to QWERTY,” are broadly applicable even if the error rates themselves aren’t.
It’s necessary to have an algorithm for interpreting swypes in order to quantify the error rate. With T9 we could tell directly if a word matched an input vector as long as we were willing to turn off randomness in the model. The randomness in swype inputs is fundamentally important so we can’t simply turn that off anymore. We instead need a way to quantitatively estimate how similar an input pattern is to any given word. If we can do that then we can just cycle through our dictionary and pick the word that’s the best match.
For our tap typing models we compute the posterior probability for each word directly but that’s not really feasible with the swype models due to the complexity of the interpolations. We instead measure the similarity between a swype pattern and the perfect vector for each word. One simple approach to this is to take the Euclidean distance between two input vectors. This is actually the default implementation for interpolation models in dodona and it does a pretty decent job of reconstructing words.
# Make a new cream vector and find the best match vector = model.RandomVector("cream", keyboard) print("Reconstructed as:", t9_model.BestMatch(vector, keyboard, wordlist))
Reconstructed as: cream
It worked in this case and works well overall, but there are some cases where it gives quantitatively large distances between swipes when they’re quantitatively very similar. In particular, it tends to give large distances between patterns that are slightly offset but still pass over all of the same keys. Unrealistic quirks in the matching algorithm are something that we really want to avoid because optimization tends to find and exploit things like that.
Neural Network Identification
To improve the performance of the algorithm we trained a neural network to take in eleven comparison measures between swype patterns and then identify whether or not they correspond to a pair of random and perfect vectors for the same word. The eleven comparison measures relate to the differences in length, x/y positions, x/y derivatives, x/y second derivatives, and the first/last x/y positions of the two swipe patterns. We go into a lot more detail about them, and the network in general, in the paper.
Below we can see an example of how the network responds to a match and a non-match. The value for each node is represented in grayscale with black corresponding to the largest value and white to zero. The magnitude of the signal being passed through each connection is represented by it’s transparency with solid connections having the largest signal strength. Finally, red corresponds to positive weight sand blue to negative weights. The swipe patterns being compared in the top were both generated from the word “phish” using two different interpolations and the network successfully identifies them as being a match. The swipe patterns compared in the bottom correspond to the words “alright” and “airtight.” These words have very similar swype patterns but the network is still able to determine that there is not a match.
It’s clear in these cases that the first and second derivatives play a significant role in differentiating and matching similar swype patterns. This makes a lot of sense because they’re invariant to translation and the Euclidean distance tends to punish small translations very harshly. The neural network reduced our overall error rate by about 20% and helped eliminate nearly all unrealistic mismatches.
Limiting Candidate Words
Generating and comparing swype vectors is much more expensive than doing the same with T9. With T9 we were also able to reduce the number of possible words by only looking at words of the same length and stopping the search once we found the most probable direct match. We can’t get any reliable information about the number of characters in a swype pattern so we can’t apply the same tricks. So what can we do to speed things up?
A general approach is to only consider words that have a reasonably high probability of producing the pattern in question. One way to accomplish this is to work directly with the series of letters that are passed over by a swipe pattern. For example, a perfect swype of the word “pot” would correspond to “poiuyt” on a QWERTY keyboard. We call this discrete representation of a swype pattern the string form. We could then simulate a large number of random swypes for each word in the dictionary and build a hash table which maps string forms to a list of words that could result in each string form. Using this table we could quickly produce a short list of probable candidate words for any swype pattern and then evaluate each of these more carefully.
This would work well in a practical implementation of a swype keyboard, but it has a major issue in the context of optimization. Producing the initial hash table is very expensive and it has to be redone for every new keyboard layout. If we want to quantify the error rate for many different keyboards during optimization then this is going to end up being slower than just cycling through every word in the dictionary.
We get around this by cheating a little. In a simulation, we know what the correct word is for every swype pattern that we’re trying to match. We use this to produce a bunch of random swypes for the correct word and then compute the string form for each of them. The string forms are then searched for words that are contained within them and also match the first and last letters. It’s easiest to explain what we mean by “contained within” with an example.
['poot', 'pout', 'pott', 'pot', 'pit', 'putt', 'put']
The candidate words are guaranteed to include any word with a perfect vector having the same string form. They can also be found very efficiently by recursively searching a radix tree representation of our dictionary. Below you can see an example of a radix tree containing the words that would match the string form for “poiuyt”.
This approach might seem like we’re abusing our knowledge of the correct word, but it typically results in a superset of the candidate words that the hash table method would produce. The main assumption is that if a random vector from one word can look very similar to the perfect vector of a second word then a random vector from the second word can look very similar to the perfect vector from the first word. If that holds, and it almost always does, then candidate words from the hash table method will also be produced by this method. It produces realistic lists of candidate words for arbitrary keyboards while speeding up the error rate calculation by over two orders of magnitude when using our full dictionary. This difference is hugely important when performing the optimization.
There are 26! ≈ 4×1026 possible keyboard layouts to consider, even when we limit ourselves to permutations of the letters. If we could calculate the error rate of one layout per nanosecond then it would still take longer than the current age of the universe to explore the full state space. Needless to say, it’s simply not possible to test every keyboard. We can still do our best to find significantly improved keyboards though.
In the T9 section we employed a random walk optimization. For the swipe optimization we use a similar approach but gradually reduce the number of random swaps over time so that the keyboard settles into a local minimum. The number of swaps in this procedure is analogous to the temperature in a physical or simulated annealing process. An illustration of how it evolves a keyboard layout to have a lower error rate is shown below.
The blue shading of the keys here indicates how frequently each letter is used in our word list. When you drag the slider you can see how the more frequently used keys tend to move towards the outside as the optimization progresses. We ran 256 optimizations like the one above, each starting with a unique random keyboard. Below you can see the average error rate at each iteration as well as the minimum and maximum.
You can see from this plot that the average error rate of the random starting points is about 13%. QWERTY is close to average but slightly worse with an error rate of 15.3%. The best and worst keyboards that we encountered had error rates of 7.7% and 27.2%. These correspond to a 51% reduction in error rates and a 78% increase in error rates relative to QWERTY. As we mentioned before: the absolute error rates don’t mean a lot by themselves but their ratios are fairly robust. Things could be a lot worse than QWERTY but they could also be a lot better.
Looking at the best and worst keyboards we find that they have a bit in common with the optimized T9 layouts that we looked at. The worst T9 layout had “eiao” all on one key and the worst keyboard here has those clustered together near the center of the keyboard while they’re very spread out on the best keyboard. The clustering of “ts” and “nf” had also been an issue with T9 and we find that they’re both separated by a full row here in the best keyboard. This shouldn’t really come as much of a surprise because these have to do with common words that only differ by a letter.
We can also see that the center keys are occupied by less frequently used letters. These keys are swyped over very frequently and we can reduce the ambiguity in long swypes by populating them with letters that don’t often occur in the middle of words. There are a few other trends to pick out but it really comes down to a complicated interplay between the structure of the keyboard and the English language, especially as you push the error rate lower and lower.
So does any of this matter at all? Apple and Google probably aren’t about to ship new keyboard layouts that are optimized for swype. Even if they did, most people probably wouldn’t want to use them. There are lots of third party keyboards available though, so somebody could make one that lets adventurous users choose their own layouts (if you do this, let us know!). We are talking about smart phones here afterall.
Really though, we just thought it was a fun problem to explore. As touchscreens become increasingly pervasive it feels like people are more and more ready for a paradigm shift in terms of entering text. Rearranging the keyboard isn’t going to be that shift but we think it’s important that people think about what might be. Hopefully dodona can help some other people do that, it certainly helped us.
If you made it this far: thanks for reading!