
My son, who is just learning to read, loves the concept of hiding one letter of a word with his finger and asking what the rest of the word spells. His favorite is turning "her" into "he" and "then" into "the."

As he's starting to learn bigger words, I want to find some words for him that are "truncatable," similar to the concept of a truncatable prime. These would be dictionary words that are still dictionary words when a character is removed from either the left or the right side. A properly "truncatable" word would continue to be a word all the way down to 1 character, but I am really only looking for words that are truncatable down to three letters:

 Browser
 Browse
 Brows
 Rows
 Row
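To make the definition concrete, here is a small Python sketch that checks whether a given chain fits it: every link must be a dictionary word, and each step must drop exactly one letter from the left or the right end. The function name `is_valid_chain` and the tiny word set are hypothetical, chosen only for illustration.

```python
def is_valid_chain(chain, words):
    """Return True if every word in `chain` is in `words` and each word
    is obtained from the previous one by dropping its first or last letter."""
    # Every link in the chain must be a dictionary word.
    if not all(w in words for w in chain):
        return False
    # Each step must remove exactly one letter from one end.
    for longer, shorter in zip(chain, chain[1:]):
        if shorter not in (longer[1:], longer[:-1]):
            return False
    return True


# A tiny hypothetical word set, lower-cased.
words = {"browser", "browse", "brows", "rows", "row"}
print(is_valid_chain(["browser", "browse", "brows", "rows", "row"], words))  # True
```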

What is the best algorithm to do this? My initial naive approaches are proving horribly inefficient for the dictionary file that I am using. I am currently testing every word in the dictionary recursively; the running time and memory use are intractable, and the method ends up testing each word many times.

Is there an efficient memoized or otherwise optimized way of finding truncatable words?
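For comparison, a memoized version of the recursive test can be sketched in Python with `functools.lru_cache`, which caches each word's answer so that no word is evaluated more than once. This assumes the dictionary is already loaded into a set of lower-case words; `make_checker` and `min_len` are hypothetical names.

```python
from functools import lru_cache


def make_checker(words, min_len=3):
    """Build a memoized truncatability test against the set `words`.

    A word is truncatable if it is in `words` and either has at most
    `min_len` letters, or dropping its first or last letter leaves a
    truncatable word.
    """
    @lru_cache(maxsize=None)  # cache each word's answer so it is computed once
    def is_truncatable(word):
        if word not in words:
            return False
        if len(word) <= min_len:
            return True
        return is_truncatable(word[1:]) or is_truncatable(word[:-1])

    return is_truncatable


words = {"browser", "browse", "brows", "rows", "row", "her", "he"}
check = make_checker(words)
print([w for w in sorted(words) if len(w) > 3 and check(w)])
# ['brows', 'browse', 'browser', 'rows']
```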

Edit: Thanks to https://cs.stackexchange.com/users/755/d-w for the guidance. This is the code I ended up with, and it works:

# find the list of truncatable words in English
# truncatable words are those where a letter
# can be dropped from the beginning or end
# to form a new valid word, all the way down
# to three letters or fewer!

def main():
    # Find the truncatable words

    # read the dictionary into a list of lower-case words, skipping
    # blank lines; the "with" block closes the file afterwards
    with open("english3.txt", "r") as f:
        wordlist = [line.strip().lower() for line in f if line.strip()]

    # sort by length so each word is classified after all shorter words
    wordlist2 = sorted(wordlist, key=len)
    max_word_length = len(wordlist2[-1])

    # set up a list of dictionaries for truncatable words:
    # Truncatables[n] maps each truncatable n-letter word to its type
    Truncatables = [{} for _ in range(max_word_length + 1)]

    # iterate through the words, shortest first: a longer word is truncatable
    # if dropping its first or last letter leaves a truncatable word
    for word in wordlist2:
        if len(word)<=3:
            Truncatables[len(word)][word]='default'
        else:
            front = word[1:] in Truncatables[len(word)-1]
            back = word[:-1] in Truncatables[len(word)-1]
            if front and back:
                Truncatables[len(word)][word]='both'
            elif front:
                Truncatables[len(word)][word]='front'
            elif back:
                Truncatables[len(word)][word]='back'
    print(Truncatables)

    # Define a function that recursively prints out a truncatable chain.
    # Every word it visits is already a key in Truncatables, so the old
    # "word in wordlist2" test (a linear scan of the list) is not needed.

    def printTruncatable(word):
        print(word, " ", end="")
        TruncType = Truncatables[len(word)][word]
        if TruncType == "back" or TruncType == "both":
            printTruncatable(word[:-1])
        elif TruncType == "front":
            printTruncatable(word[1:])
        elif TruncType == "default":
            print("\n")
        else:
            print("ERROR")

    # print a chain for every truncatable word of eight or more letters
    for wordlength in range(max_word_length, 7, -1):
        for key in Truncatables[wordlength]:
            print(wordlength, ": ", end="")
            printTruncatable(key)



if __name__ == "__main__":
    main()
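`printTruncatable` follows only the `word[:-1]` branch when a word is marked "both", so it prints one chain per word. If every chain is wanted, a recursive generator over a table in the same format can follow both branches. This is a sketch: `all_chains` is a hypothetical helper, and the hand-built table (from the word set eat, sea, seat, seats) stands in for one built from english3.txt.

```python
def all_chains(word, table):
    """Yield every truncation chain for `word`, longest word first.

    `table` has the same shape as Truncatables: table[n] maps each
    truncatable n-letter word to 'default', 'front', 'back', or 'both'.
    """
    kind = table[len(word)][word]
    if kind == "default":
        yield [word]
        return
    nexts = []
    if kind in ("front", "both"):
        nexts.append(word[1:])   # drop the first letter
    if kind in ("back", "both"):
        nexts.append(word[:-1])  # drop the last letter
    for shorter in nexts:
        for chain in all_chains(shorter, table):
            yield [word] + chain


# Hand-built table in the same format (list index = word length).
table = [{}, {}, {}, {"eat": "default", "sea": "default"},
         {"seat": "both"}, {"seats": "back"}]
for chain in all_chains("seats", table):
    print(chain)
```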
user112052

2 Answers


Here is a simple implementation strategy.

Find all three-letter truncatable words. Every three-letter word is truncatable, so this is a linear scan through the dictionary to find the three-letter words. Store them in a hashtable.

Find all four-letter truncatable words. You can check whether a four-letter word wxyz is truncatable by checking whether wxy or xyz is a truncatable word; each of these can be checked by looking it up in the previous hashtable. So, this can be done with a linear scan through the dictionary. Store all four-letter truncatable words in a hashtable.

Find all five-letter truncatable words by a linear scan plus lookups in the hashtable for four-letter truncatable words. Note that given a word vwxyz, it suffices to check whether vwxy or wxyz is truncatable (you don't need to check vwx, wxy, or xyz).

Find all six-letter truncatable words.

And so on. Finish when you've exhausted the length of words in the dictionary.
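A minimal Python sketch of this level-by-level strategy, assuming the dictionary is already loaded as an iterable of lower-case words and using Python sets as the hashtables. Following the question's definition, a word is accepted if dropping either its first or its last letter gives a truncatable word one letter shorter. `truncatable_by_length` and `base_len` are hypothetical names.

```python
from collections import defaultdict


def truncatable_by_length(words, base_len=3):
    """Return {length: set of truncatable words of that length}.

    Words of exactly `base_len` letters are the base case. Each longer
    length is handled in one pass, using only the set for length - 1.
    """
    by_len = defaultdict(set)
    for w in words:
        by_len[len(w)].add(w)

    levels = {base_len: set(by_len[base_len])}
    max_len = max(by_len, default=0)
    for n in range(base_len + 1, max_len + 1):
        shorter = levels[n - 1]
        # Two hash lookups per word: drop the first letter or the last letter.
        levels[n] = {w for w in by_len[n] if w[1:] in shorter or w[:-1] in shorter}
    return levels


levels = truncatable_by_length(
    ["row", "rows", "brows", "browse", "browser", "sea", "eat", "seat", "seats"])
for n in sorted(levels):
    print(n, sorted(levels[n]))
```

Each word is examined once, at its own length, so the total work is one pass over the dictionary plus a constant number of set lookups per word.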

D.W.

Never done it, but this might work for you. I'm using 2 characters as my threshold in the example because it's easier cognitively, but maybe something along these lines?

  1. Run through the dictionary D and discard any word shorter than 3 characters, since two-letter strings are non-truncatable; then sort the rest by length from longest to shortest and alphabetize each length group. This subset of the dictionary is D'.

{a, aa, aardvark, ..., an, ant, anteater, ...} ->
{aardvark, ..., ant, anteater, ...} ->
{aardvark, ..., anteater, ..., ant, ...}

  2. Use recursion and a binary tree to move through D' to generate a candidate list L from word W. Your base case would be a two-letter string. You can discard two-letter results. For example:

"that" ->
(that (hat,tha)) ->
(that (hat (at, ha)),(tha (ha, th))) ->
{(that, hat, at), (that, hat, ha), (that, tha, ha), (that, tha, th)}

Note that this generates combinations that contain non-words, so at each level of recursion you can check each string for membership in D, for example with a regex. In Python, this is easier with an is-in call.

With the is-in check: {(that, hat, at), (that, hat, ha)}

  3. Collect your tuples into a new set of results, R.

  4. Remove every word that appears in a tuple of R from D' so you don't duplicate your work. For instance, this algorithm checks "that" before "at", "ha", and "hat", so eliminate "hat" AND "that" from D'. (D' has already been stripped of "at" and "ha".)

  5. Repeat these steps on the revised D' until D' is exhausted.

R should now contain tuples of various lengths without redundancy. I'd alphabetize R because that's my nature.
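Putting these steps together, here is a rough Python sketch. It assumes the dictionary D is available as a collection of lower-case words, uses two-letter words as the base case as in the example, and prunes non-words at each level with a set membership test rather than a regex. The names `truncation_chains`, `min_len`, and `base_len` are hypothetical.

```python
def truncation_chains(words, min_len=3, base_len=2):
    """Sketch of the steps above; returns the chains in R, sorted."""
    D = set(words)
    # Step 1: D' keeps words of at least min_len letters,
    # longest first and alphabetical within each length.
    d_prime = sorted((w for w in D if len(w) >= min_len),
                     key=lambda w: (-len(w), w))

    def chains(word):
        # Step 2: recurse on both truncations; non-words end the branch.
        if word not in D:
            return []
        if len(word) == base_len:
            return [(word,)]
        return [(word,) + tail
                for shorter in (word[1:], word[:-1])
                for tail in chains(shorter)]

    R = set()        # Step 3: collected result tuples
    covered = set()  # Step 4: words already used in some chain
    for w in d_prime:  # Step 5: continue until D' is exhausted
        if w in covered:
            continue
        found = chains(w)
        R.update(found)
        for chain in found:
            covered.update(chain)
    return sorted(R)


print(truncation_chains(["a", "at", "ha", "hat", "that"]))
# [('that', 'hat', 'at'), ('that', 'hat', 'ha')]
```

Skipping words in `covered` plays the role of removing them from D', and storing R as a set drops any duplicate tuples before the final sort.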

J D