Keyboard tests

used with Keyboard Layout Analyzer

16 May 2017

KLA provides three tests by default:

  1. Alice in Wonderland, chapter 1.
  2. List of the most common words in English
  3. The SAT words

There are some 'problems' with this corpus.

  1. The spelling is American ... so color instead of colour, and catalog instead of catalogue, etc.
  2. In Alice, the text uses single quotes for dialogue (British style), as opposed to American-style double quotes.
  3. In Alice, there are many "unusual" constructs, such as words in ALL CAPS, and an abnormal number of exclamation marks, parentheses, and hyphens.
  4. The frequency of the letters in Alice is different to the general frequency in English.
  5. The Common Words are not balanced... e.g. "the" is very common, but it's only in the list once. So the list is an unnatural text to type.
  6. Similarly with SAT words, there are many relatively obscure words that you will not type in daily use, some of which I admit I have not seen before.
  7. The common and SAT words are all lowercase, with no punctuation.

I have added the following tests:

English

Prose

  1. Classic collection, which is the first 100 lines or so of chapter 1 of the following classics, lifted from Project Gutenberg:
    • The Picture of Dorian Gray, by Oscar Wilde
    • Dracula, by Bram Stoker
    • A Scandal in Bohemia, by Arthur Conan Doyle
    • The Adventures of Huckleberry Finn, by Mark Twain
    • The Jungle Book, by Rudyard Kipling
    • Metamorphosis, by Franz Kafka
    • Moby Dick, by Herman Melville
    • Peter Pan, by J.M. Barrie
    • Pride and Prejudice, by Jane Austen
    • Romeo and Juliet, by William Shakespeare
    • A Tale of Two Cities, by Charles Dickens
    • Tarzan of the Apes, by Edgar Rice Burroughs
    • The Brothers Karamazov, by Fyodor Dostoyevsky
    • The Wonderful Wizard of Oz, by L. Frank Baum
    • The Adventures of Tom Sawyer, by Mark Twain
    • Ulysses, by James Joyce
  2. Putin's speech to the U.N.
  3. Jonathan Livingston Seagull, by Richard Bach, borrowed from the Russians. Link found via Google. Uses double quotes for dialogue.
  4. The Little Prince, by Antoine de Saint-Exupery, saved as txt from here, top and bottom matter removed. Link found via Google. Also, the dialogue was set in italics, so Firefox saved all that like /"That is a hat."/, which will favour layouts with the / unshifted. Uses double quotes for dialogue.
  5. Animal Farm, by George Orwell, borrowed from the Australian Project Gutenberg site. Link found via Google. Top and bottom matter removed. Uses double quotes for dialogue.
  6. The Scroll marked II from The Greatest Salesman in the World, by Og Mandino, borrowed from the Internet Archive. Link found via Google. Has excessive whitespace which I should clean up, but didn't, to make the tests repeatable for you.
  7. A Message to Garcia, by Elbert Hubbard.
  8. Part 2 of The Magic Story, by Frederick van Rensselaer Dey.
  9. The War Prayer, by Mark Twain.
  10. As a Man Thinketh, by Dr. James Allen.
  11. The first test from Sean Wrona's typing championship, found here.

Miscellaneous

  1. The Tao te Ching / Daode Jing by Lao Tzu, as translated by Charles Muller, here. Top and bottom matter (including contents) removed, and cleaned up a bit ... removed horizontal lines, Chinese text, some excess carriage returns. Converted typographical quotes to ANSI quotes (single and double).
  2. A collection of famous poems, pasted into one file. Titles are:
    • In Flanders Fields by John McCrae
    • If by Rudyard Kipling
    • Daffodils by William Wordsworth
    • Sonnet 18 by William Shakespeare
    • The Soldier by Rupert Brooke
    • The Tyger by William Blake
    • Jabberwocky by Lewis Carroll
    • How Do I Love Thee? by Elizabeth Barrett Browning
    Contains some unusual words and phrase construction, and general structure is not like prose.
  3. The Universal Declaration of Human Rights.
  4. Some quotes by famous people, from the collection copyrighted by Prof Dr. Gabriel Robins, here. Copied as text, top and bottom matter removed, text cleaned up a bit (fixed " etc.). Contains more parentheses, dashes, whitespace and capitals than normal prose.
  5. The most common bigrams, trigrams and quadgrams in English, processed as explained in the PPTT test below. The bigrams and trigrams lists were several hundred long, while the quadgrams was just the top 20, as that is what I could find, and it was already down to less than 1% frequency by that point.
  6. The most common pentgrams, hexgrams, septgrams, octgrams and nongrams in English, processed as explained in the PPTT test below. This was from a different source to the bigrams/trigrams/quadgrams, and included the top 50 in each list. However, there was no frequency data, so the methodology simply assigned the same frequency to each string, and scaled the count according to the rank.
  7. All the '*gram' tests above, with the spaces removed. For example, "th" is very common (the, with) but it's not always at the start or end of a word (within). So including spaces effectively makes the test one character longer, with heavy emphasis on the space key, which is NOT what we are actually trying to measure. Links: 2, 3, 4, 5, 6, 7, 8, 9.
  8. The United States Declaration of Independence.
  9. The Magna Carta (English).
  10. Pangrams. You know that quick brown fox? Big fjords vex quick waltz nymph.
  11. Lyrics from the musical South Pacific. Contains some French. Found all over the Internet.
  12. Lyrics from some famous songs from the 60s and 70s, mostly. Found all over the Internet. Titles included are:
    • Slow Hand (Pointer Sisters)
    • Eternal Flame (Bangles)
    • The Sound of Silence (Simon & Garfunkel)
    • Yesterday (Beatles)
    • American Pie (Don McLean)
    • My Way (Frank Sinatra)
    • My Heart Will Go On (Celine Dion)
    • The Power of Love (Jennifer Rush)
    • Angel (Sarah McLachlan)
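The point about spaces in the '*gram' tests (item 7 above) can be checked with a little arithmetic. The following is only a sketch: the function name `space_share` and the five-bigram sample are illustrative assumptions, not the actual test files.

```python
def space_share(ngrams, separator=" "):
    """Return the fraction of characters in the joined test that are spaces."""
    text = separator.join(ngrams)
    return text.count(" ") / len(text)


# A small illustrative sample, not the real bigram list.
bigrams = ["th", "he", "in", "er", "an"]

with_spaces = space_share(bigrams)          # joined as "th he in er an"
without_spaces = space_share(bigrams, "")   # joined as "thheineran"
print(f"{with_spaces:.1%} vs {without_spaces:.1%}")
```

With a long bigram list the share of space keystrokes approaches one in three, which is why the space-free versions are the fairer measure of the letter combinations themselves.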

Words

  1. My own concoction from the top 200 most common words above. In an attempt to overcome the issues raised above, I took the top 200 most common words, and multiplied them by 201 minus their rank. So the most common word is used 200 times, next most common 199, etc, down to the 200th word being used once. Then we shuffle them nicely. Then I added a comma 10% of the time, and a period 5% of the time, and paragraphs. The result is beautiful kinda-English like "More she since on, home back way time and own between those still any by me has first, are also for some great now good so, own, also two are they of, then time from at go between. Are that by or have." I did try with the top 500 words but the result was just too large for KLA to process quickly.
  2. Some difficult words, sourced from here. These are words that are commonly misspelled. American spelling. I joined them into longish lines, so there are carriage returns, unlike the word tests from KLA. This seems to make a difference.
  3. Some medical words and terms, sourced from here. These are words which doctors and nurses need to know. However they can also feature unusual letter combinations. I took the terms, deleted the definitions and explanations, lower-cased, sorted, and joined into one long line (so no carriage returns).
  4. The 200 words which junior school children in the UK are expected to be able to spell, joined into one long line (so no carriage returns).
  5. The 1100 (actually fewer) words from Barron's 1100 Words. Cleaned up a bit, removed words with French characters, etc. and split into longish lines.
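The rank-weighted "concoction" in item 1 of this list can be sketched roughly as below. This is not the script actually used: the function name, the seed, and the paragraph rule (a break every few sentences) are assumptions, since the text above only says that paragraphs were added.

```python
import random


def build_weighted_word_test(words, comma_rate=0.10, period_rate=0.05,
                             sentences_per_paragraph=6, seed=0):
    """Repeat each word (len(words) + 1 - rank) times, shuffle, then punctuate."""
    rng = random.Random(seed)
    n = len(words)
    # Rank 1 appears n times, rank 2 appears n - 1 times, ..., rank n once.
    pool = [word
            for rank, word in enumerate(words, start=1)
            for _ in range(n + 1 - rank)]
    rng.shuffle(pool)

    pieces = []
    start_of_sentence = True
    sentences = 0
    for word in pool:
        if start_of_sentence:
            word = word[:1].upper() + word[1:]
            start_of_sentence = False
        roll = rng.random()
        if roll < period_rate:
            # Period about 5% of the time; the next word starts a sentence.
            sentences += 1
            start_of_sentence = True
            # Assumed paragraph rule: a blank line every few sentences.
            end = "\n\n" if sentences % sentences_per_paragraph == 0 else " "
            pieces.append(word + "." + end)
        elif roll < period_rate + comma_rate:
            # Comma about 10% of the time.
            pieces.append(word + ", ")
        else:
            pieces.append(word + " ")
    return "".join(pieces).rstrip()


print(build_weighted_word_test(["the", "of", "and", "to", "a"]))
```

With 200 words this produces 200 + 199 + ... + 1 = 20100 words, which gives a feel for why the 500-word version (125250 words) was too large to process quickly.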

Non-English

  1. The Magna Carta (Latin). Just for some variety. ANSI compatible.
  2. Nkosi, sikelel' iAfrika (assorted Southern African Bantu languages). Just for some variety. ANSI compatible.
  3. Lorem ipsum as well as other dummy text, includes some artificial languages, and GeekSpeek. Just for some variety. ANSI compatible.

Programming

To get some typical programming tests, I've used the following:

  1. Google's home page, which is dense CSS and JavaScript. I pray you never have to type it. Not really English as she is spoke, either. This is possibly a bad test for a keyboard, since it appears to be the result of running code through a minifier which replaced proper variable names with single letters. So the result is 'unnatural' and not as you would type it.
  2. All the solutions in dozens of different programming languages, to the Towers of Hanoi problem, lifted from http://rosettacode.org/wiki/Towers_of_Hanoi. I've split it into two parts, one with languages from A — M and the other with languages from N — Z. There are so many languages that it's not really viable to have a test for each, so by doing it this way, we can get a good idea of which layouts are better for programming in general. Besides, modern web-based development normally requires working in 3 or 4 different languages at the same time, anyway.
  3. As above for QuickSort A — M and QuickSort N — Z. I had to delete some code because it used too many non-keyboard characters which broke Firefox's console logging (lines were too long).
  4. Keyboard Layout Editor home page, as an example of a modern one-page web app.
  5. Conway's Game of Life, as implemented in the following languages, lifted from http://rosettacode.org/wiki/Conway's_Game_of_Life.
    Ada, C, C++, C#, Common Lisp, D, Go, Haskell, Java, JavaScript, Lua, OCaml, Pascal, Perl, Python, R, Ruby, Tcl. Mmm, that's odd... no PHP?

Digits and Punctuation

Mostly Digits

To get some ideas for optimising the digits layout, I created some files with 500 dates, all the same, but in different formats, as well as 500 currency amounts, in different formats. These files are all one entry per line, so the Return key also gets a workout. Surprisingly, the addition of the leading zero in the dates makes a considerable difference. Yes, a lot of layouts got identical scores, because they do not attempt to optimise the numrow, and instead stick with the default QWERTY arrangement. But this is not optimal, since the most used digits overall (not just in dates and phone numbers) include 0 and 1, and QWERTY puts both of those way up in the pinky corners. Not good. The different formats highlight the importance of number-friendly punctuation being convenient: comma, period, dash, colon and solidus, and sometimes plus and apostrophe, depending on where you are.
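A date file of this kind could be generated along the following lines. This is a minimal sketch under stated assumptions: the date range, the seed, and the four formats shown are hypothetical choices, not the ones used for the actual test files. Currency amounts could be handled the same way.

```python
import random
from datetime import date, timedelta


def random_dates(count=500, start=date(1970, 1, 1), end=date(2017, 5, 16), seed=0):
    """Pick `count` dates between start and end (the range is an assumption)."""
    rng = random.Random(seed)
    span = (end - start).days
    return [start + timedelta(days=rng.randint(0, span)) for _ in range(count)]


# Hypothetical formats; each one renders the same dates differently.
DATE_FORMATS = {
    "iso": lambda d: f"{d.year:04d}-{d.month:02d}-{d.day:02d}",
    "dmy_padded": lambda d: f"{d.day:02d}/{d.month:02d}/{d.year}",
    "dmy_unpadded": lambda d: f"{d.day}/{d.month}/{d.year}",
    "mdy_dotted": lambda d: f"{d.month}.{d.day}.{d.year}",
}


def render(dates, fmt):
    """One entry per line, so the Return key is part of the test."""
    return "\n".join(fmt(d) for d in dates) + "\n"


dates = random_dates()
for name, fmt in DATE_FORMATS.items():
    text = render(dates, fmt)
    # The padded and unpadded variants differ in the number of zeros typed.
    print(name, len(text), text.count("0"))
```

Using the same list of dates for every format keeps the files comparable, so any difference in score comes from the format alone.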

Punctuation

  1. The P P T T (Programming Punctuation Torture Test). I took the average frequency data from Xah Lee's analysis of punctuation in an assortment of modern, popular languages (C, C++, Java, Bash, Perl, PHP, Python, Ruby, JavaScript, CSS) (spreadsheet here), and much like the Exploded Words test above, created an array with each character present 100 times its frequency. So that gave us 1443 commas, 855 dashes, etc. Then I shuffled nicely and printed it out, adding a carriage return 4% of the time.
    The result produced lines looking like this:
    {&(-(_*,";,.):=&';_,,"),,}'".=}{.%(()'.'_.",$)(,|&<:{"'__"':/_"=['
  2. I took the programming tests above and stripped out the alpha characters, leaving just the digits and punctuation.
  3. The second test from Sean Wrona's typing championship, found here. This contains awkward punctuation and is indeed designed to be difficult to type.
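The first two punctuation tests above can be sketched as follows. This is an illustration rather than the original script. The comma and dash percentages are inferred from the counts quoted above (1443 and 855); every other value in `SAMPLE_FREQ` is a made-up placeholder, not Xah Lee's data, and the function names are hypothetical.

```python
import random

# Percent frequencies. Only "," and "-" follow from the counts in the text;
# the remaining entries are placeholders for illustration only.
SAMPLE_FREQ = {",": 14.43, "-": 8.55, "(": 7.0, ")": 7.0, ".": 6.0, ";": 4.0}


def build_pptt(freq_percent, newline_rate=0.04, seed=0):
    """Repeat each character 100 x its percent frequency, shuffle, add newlines."""
    rng = random.Random(seed)
    pool = []
    for ch, pct in freq_percent.items():
        # round() guards against floating-point noise such as 855.0000000001.
        pool.extend(ch * round(pct * 100))
    rng.shuffle(pool)

    out = []
    for ch in pool:
        out.append(ch)
        if rng.random() < newline_rate:  # a line break about 4% of the time
            out.append("\n")
    return "".join(out)


def strip_letters(source_text):
    """Drop alphabetic characters, keeping digits, punctuation and whitespace."""
    return "".join(ch for ch in source_text if not ch.isalpha())


print(build_pptt(SAMPLE_FREQ).splitlines()[0])
print(strip_letters("for (i = 0; i < 10; i++) { total += a[i]; }"))
```

Fixing the seed keeps the shuffled output repeatable, which matters if the same test is to be compared across layouts.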

Other

  1. All the characters on a standard ANSI-104 US keyboard. This is so that I can check a particular layout for missing characters which could affect the scores.
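A check for missing characters might look like the sketch below. It assumes that "all the characters" means the 95 printable ASCII characters (space through tilde); the function name and the way a layout's characters are supplied are hypothetical and not part of KLA.

```python
import string

# Assumed character set: letters, digits, punctuation and space (95 in total).
ANSI_US_CHARS = set(string.ascii_letters + string.digits + string.punctuation + " ")


def missing_characters(layout_chars):
    """Return the characters a layout cannot produce, in sorted order."""
    return sorted(ANSI_US_CHARS - set(layout_chars))


# Example: a hypothetical layout that has lost the tilde and the backtick.
layout = ANSI_US_CHARS - {"~", "`"}
print(missing_characters(layout))
```

A layout that reports anything here will be penalised or skipped on those characters, so its scores on other tests should be read with that in mind.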