# Foundations of Programming

Reach beyond the basics of coding with these curated resources for programmers who have completed 1-2 computing courses. You’ll find three sample paths: a basic (and recommended) one, a problems first path, and one that focuses on testing and debugging. You can also plot your own sequence to suit however and wherever you’re learning.

View all paths

## Recommended sequence

Hone your problem-solving skills with this former Google interview question focused on strings. It'll test your ability to optimize code and work with algorithms and data structures.

Take the challenge

#### The Challenge

Given a string `S` and a set of words `D`, find the longest word in `D` that is a subsequence of `S`.

Word `W` is a subsequence of `S` if some number of characters, possibly zero, can be deleted from `S` to form `W`, without reordering the remaining characters.

Note: `D` can appear in any format (list, hash table, prefix tree, etc.

For example, given the input of `S = "abppplee"` and `D = {"able", "ale", "apple", "bale", "kangaroo"}` the correct output would be `"apple"`

• The words "able" and "ale" are both subsequences of S, but they are shorter than "apple".
• The word "bale" is not a subsequence of S because even though S has all the right letters, they are not in the right order.
• The word "kangaroo" is the longest word in D, but it isn't a subsequence of S.
• ##### Learning objectives

This question gives you the chance to practice with algorithms and data structures. It’s also a good example of why careful analysis for Big-O performance is often worthwhile, as is careful exploration of common and worst-case input conditions.

#### Solution explained, commentary

There are a number of different viable approaches (for example, there’s more than one algorithm that applies). This means you can really put your problem-solving ability to the test and build on what you know.

Some solutions:

##### Brute force

Generate all 2^n possible subsequences, and check each one against the dictionary for a match. A slight optimization is to at least ensure the dictionary is represented by a hash table or prefix tree so that lookups are efficient.

##### Check each dictionary word using a greedy algorithm

You can prove that a greedy algorithm works for checking if a word `w` is a subsequence of `S`. Scan `S` from the beginning for `w`. After you find it, continue scanning from that point for `w`, and so on, until either you run out of characters in `S` (`w` is not a subsequence), or you find all the characters in `w` (`w` is a subsequence).

You could sort the dictionary words in order of descending length, run the above decision procedure for each word, and take the first word that is a subsequence of `S`. The running time is `O(N*W)` where `W` is the number of words in `D` and `N` is the number of characters in `S`.

`O(N*W)` is asymptotically optimal if most dictionary words are close to size `N` (because then the size of the input is `O(N*W)`). However, it is far from optimal in a worst-case scenario where `S` may be long and the dictionary strings quite short.

Define `L` to be the total number of characters in the dictionary over all words. Then the best possible time complexity (assuming words shorter than `S`) is `O(N + L)`. However, the existing algorithm is `O(N * W)`, which might be `O(N * L)` in the case where dictionary words are a small constant in size.

##### Improving the greedy approach

Notice that to check a dictionary word w, you end up needing to know, for each character `c` in `w`, the least index `i` in `S` where `S[i] == c`, such that `i` is greater than some given index `j` (the index of the previously-matched letter). Observe that the greedy algorithm is slow precisely because it naively scans for this index `i`.

You can preprocess `S` to find such indices much faster. Build a map of letter -> sorted list of indices where letter occurs. For example, if `S == "abppplee"`:

Java Copied
``a -> ``b -> ``p -> [2, 3, 4]``l -> ``e -> [6, 7]``

When you need to know "Given my last matched character was at index X and my next character to match is Y, where does this match occur?", you would look up `Y` in the map, and then binary search the occurrence list to find the smallest number > `X` in that list, or report that none exists (the word is not a subsequence then).

This approach requires one binary search per dictionary letter. Since it takes only `O(N)` time to preprocess `S` in this way, the total complexity is `O(N + L * log N)`, which is quite close to the best possible complexity.

Advanced approach: You might note that since this is a classic "successor-value" problem, specialized data structures exist that can run the search even faster. A vEB tree, for example, improves the runtime to `O(N + L * log log N)`.

##### An optimal O(N + L) approach for small alphabets

If the size of the alphabet is a-z or a small constant, we can give a solution based off the previous idea that runs in `O(N + L)`.

The real complexity is `O(N*A + L)`, where A is the size of the alphabet. Instead of making the representation sparse, you could make it dense. Instead of `p -> [2, 3, 4]` you could use `p -> [2, 2, 3, 4, -1, -1, -1, -1]` where the i-th index directly yields the answer to the query. This requires `O(N)` space per alphabet letter (O(NA) total) and also that much time during preprocessing. It is therefore not suitable for large alphabets.

##### An optimal O(N + L) approach for any alphabet

Instead of processing the words one at a time, we will process all words simultaneously.

First, for each word `w` in `D`, create a 2-tuple containing `w` and the number `0` (i.e. `(w, 0)`). The number refers to the number of characters in `w` that have already been found in `S`; initially, no characters have been found. For a tuple `t`, we'll refer to the word as `t.w`, and to the number as `t.i` (since it is an index).

Group the words by `t.w[t.i]` (initially `t.i` is `0`, so initially it is by first letter). For our example dictionary `D = {"able", "ale", "apple", "bale", "kangaroo"}`, we’ll have:

Java Copied
``a -> [("able", 0), ("ale", 0), ("apple", 0)]``b -> [("bale", 0)]``k -> [("kangaroo", 0)]``

What this is doing is telling you what words you will make progress on finding for each possible letter that you might see next in `S`.

Scan `S` one character at a time. If the character you just scanned is `c`, for each tuple `t` in `map[c]`, increment `t.i` by 1 and transfer `t` to `map[t.w[t.i]]`. The case where `t.i == t.w.length` is special - transfer these words to a "found" list. In the end, the output is the longest word in the found list.

#### Code snippet for solution

Copied
``#!/usr/bin/env python``import collections``import sys``def find_longest_word_in_string(letters, words):``    letter_positions = collections.defaultdict(list)``    # For each letter in 'letters', collect all the indices at which it appears.``    # O(#letters) space and speed.``    for index, letter in enumerate(letters):``        letter_positions[letter].append(index)``    # For words, in descending order by length...``    # Bails out early on first matched word, and within word on``    # impossible letter/position combinations, but worst case is``    # O(#words # avg-len) * O(#letters / 26) time; constant space.``    # With some work, could be O(#W * avg-len) * log2(#letters/26)``    # But since binary search has more overhead``    # than simple iteration, log2(#letters) is about as ``    # expensive as simple iterations as long as ``    # the length of the arrays for each letter is``    # “small”.  If letters are randomly present in the``    # search string, the log2 is about equal in speed to simple traversal``    # up to lengths of a few hundred characters.              ``    for word in sorted(words, key=lambda w: len(w), reverse=True):``        pos = 0``        for letter in word:``            if letter not in letter_positions:``                break``        # Find any remaining valid positions in search string where this``        # letter appears.  It would be better to do this with binary search,``        # but this is very Python-ic.``        possible_positions = [p for p in letter_positions[letter] if p >= pos]``        if not possible_positions:``            break``        pos = possible_positions + 1``        else:``            # We didn't break out of the loop, so all letters have valid positions  ``            return word``if __name__ == '__main__':``    print subdict(sys.argv, sys.argv[2:])``