Advanced Learning Path
This path is meant for those on the cusp of graduating or who are self-taught to a high level. You’ll integrate what you've already learned and polish your software engineering skills. You’ll also learn tricks of the trade that will help you program more efficiently, and gain experience using open source code and other tools industry professionals rely on.
Practice problems first
In this exercise, you're going to decompress a compressed string.
Your input is a compressed string of the format
number[string] and the decompressed output form should be the
number times. For example:
Would be output as
Number can have more than one digit. For example,
10[a] is allowed, and just means aaaaaaaaaa
One repetition can occur inside another. For example,
2[3[a]b] decompresses into
Characters allowed as input include digits, small English letters and brackets
Digits are only to represent amount of repetitions.
Letters are just letters.
Brackets are only part of syntax of writing repeated substring.
Input is always valid, so no need to check its validity.
This question gives you the chance to practice with strings, recursion, algorithm, compilers, automata, and loops. It’s also an opportunity to work on coding with better efficiency.
Solution explained, commentary
A basic solution involves use of simple recursion and loops.
Alternatively you can try a “peel the onion" approach, where you first decode top layer without using recursion, and then if there is any repetition group left, do the same with output after first phase. This is not bad, but the first approach is better as it may use a lot less memory.
There are several catches, for example:
Once you've tried the problem, try thinking through these questions -- and maybe even write more code to push your skills a step further.
1. How would you test your solution in light of you've practiced in this path?
2. What are some different approaches you could take to your solution -- and how would you test those?
Code snippet for solution
def decomp(text, start=0, times=1):
Iterate over and decompress the compressed string.
This approach makes use of nested Python iterators, which is a very clean way
of expressing expansion of nested iterated items.
text: The entire string to decompress. It's unobvious, but nice
to have the whole string plus an index; this allows any error
detection code to report the absolute index of a problematic
start: The starting index within 'text'. We decompress from
'start' up through the matching close-brace or end-of-string.
times: The number of times to repeat decompression.
# Repeat iteration over our sub-chunk N times.
for _ in xrange(times):
i = start
# Until we hit the end of the string, or end of our chunk...
while i < len(text) and text[i] != ']':
# Emit letters as themselves.
# If it's not a letter, it must be digit(s). Convert to integer.
sub_times = 0
sub_times = sub_times * 10 + int(text[i])
i += 1
i += 1 # Skip past open-bracket
# Start an iterator over the sub-chunk.
for item in decomp(text, i, sub_times):
# Iterator generates many characters, and then at the very end,
# it generates an integer. Provide characters up to our caller,
# and use the integer to advance our index 'i' to end-of-chunk.
if isinstance(item, basestring):
i = item
# Advance 'i' to the next letter, or skip the close-bracket, whichever.
i += 1
# Don't emit the trailing integer if we are doing the outermost call. This
# test could be moved to the decompress() call instead; we would check there
# to see if the result item was basestring or int, just as we do above, but
# I suspect that check would be more expensive than a simple integer
# comparison here, where the type is known.
if start > 0:
# We could wrap 'text' to add a leading '1[' and trailing ']' to allow a
# little bit of simplification in the logic in decomp(), but the
# simplification would lead to harder-to-read code, as well as requiring
# O(n) additional time, and a temporary requirement for O(n + 3) additional
# space during the copy operation.
# This is O(decompressed-length) for speed (probably), and up to
# O(compressed-length/4) for additional storage. In this implementation,
# the storage requirement is well-disguised: It appears on the function call
# stack, rather than in declared variables. E.g., consider a worst-case
# input of: 1[1[1[1[1[1[1[1[1[1[1[1[1[1[1[1[1[1[1[1[xx]]]]]]]]]]]]]]]]]]]]
# which would require a stack depth of 20.
# The (probably) for the big-O depends on how well Python implements
# resumption of nested iteration. At worst, the string above would require
# a full stack descent, then ascent for *each* of the two 'x' characters,
# for a total of O(n^2) time. Another very well-hidden possible cost.
for letter in decomp(text):
if __name__ == '__main__':