
Data Structures & Algorithms Test
Version: 1.0 (DRAFT)
Last Revised: 5/17/2004

Purpose of this Document

Provides a standard set of SDE interview questions for evaluating data structures & algorithms competencies.

Knowledge areas:
1. Fundamental algorithms & their complexities
   a. Sorting
   b. Searching
   c. Bonus points for knowing advanced algorithms of any type
2. Fundamental data structures, basic operations (& complexities)
   a. Arrays
   b. Lists
   c. Stacks & queues
   d. Trees
   e. Hash tables
   f. Bonus points for unusual depth of knowledge or knowledge of more obscure data structures (e.g., tries, splay trees)

Skills areas:
1. Analyzing the complexity of simple algorithms
2. Choosing appropriate fundamental algorithms to solve problems
3. Designing or adapting algorithms to solve problems
4. Choosing appropriate data structures
5. Designing or adapting data structures to solve problems

Testing Strategy

Start with simple, knowledge-based questions. Evaluate depth of knowledge as appropriate with a given candidate.

Test Questions

1. Sorting

Tell me some different algorithms for sorting data.
Ideally, the candidate should know at least one of the N^2 algorithms, plus merge sort, quick sort, and radix or bucket sort. Ask the complexity of each algorithm the candidate mentions; ideally, these should be memorized. If the candidate is doing well, ask them to explain why quick sort or merge sort is N log N on average. Ask for the average and worst cases for quicksort (ideally, the candidate should volunteer this). If the candidate doesn't volunteer them, prompt specifically for linear-time sorts.

2. Data structures

What's the complexity of finding an item in an array given its index?
This is a freebie: the candidate should say constant or O(1) instantaneously.
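A minimal quicksort the interviewer can keep in mind while probing the sorting discussion above (a sketch only; the list-comprehension style trades the usual in-place partitioning for clarity):

```python
def quicksort(items):
    """Average case O(N log N): each partition splits the input roughly in
    half, giving about log N levels of O(N) partitioning work. Worst case
    O(N^2): a pivot that repeatedly lands at an extreme (e.g., first element
    of sorted input) shrinks the problem by only one element per level."""
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```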

What's the complexity of finding an item in an array given its value?
Should say O(N). How can we make this faster? Should volunteer sorting the array and using binary search. Should know that this is O(N log N) to set up plus O(log N) per search.

Let's talk about linked lists. When would you prefer a doubly-linked list over a singly-linked list?
Fast backwards iteration. Fast delete from the middle given an iterator.

Why would you prefer a singly-linked list over a doubly-linked list?
Uses less memory.

How much memory does a doubly-linked list of N items occupy, if items are of size d and pointers are of size s?
N * (2*s + d). Possibly also + 2s for head & tail pointers.

Let's talk about hash tables. What are the basic operations on a hash table?
Insert, delete, find. (Some people will add iterate.) Lose points for mentioning the hash function as an operation on the hash table.

What are the complexities of each of these operations?
All constant time. The candidate should ideally volunteer (1) that this is the average case and (2) the conditions required for it (load factor, good hash function). If not volunteered, ask whether any conditions are needed to get these complexities.

Tell me an example hash function for strings.
Don't accept MD5 as an answer. Get something simple that the candidate can actually walk you through, e.g., summing up the ASCII values of the characters in the string.

What are the properties of a good hash function?
Fast execution; distributes keys evenly over buckets.

Explain to me what a hash table looks like in memory.
Should know that it's fundamentally an array of indexed buckets. Ideally should volunteer open addressing vs. chaining here. Ideally should volunteer that both keys and values are stored (do they get that multiple keys can map to the same location?).

How do you handle multiple keys mapping to the same location? (If you didn't get this earlier.)
Should know chaining and open addressing.
Should know different schemes for open addressing (linear probing, quadratic probing, rehashing). Bonus points for knowing multilevel hashing or perfect hashing. (If the candidate is struggling at this point, skip to binary trees.)

You want to store 10,000 items in a hash table. How big should your table be?
Rule of thumb is a load factor of .5, i.e., the table should have about twice as many buckets as keys (here, 20,000).

How would you manage a hash table whose ultimate size you don't know up front?
Ideal answer: resize the hash table. (Should know the algorithm for a dynamic array and be able to apply it to hash tables.) Ask for the load factor at which you decide to grow the table (1/2) and the one at which you decide to shrink it (1/8). Ask what the impact is on average complexity (none). Suppose you grow/shrink by 1 each time. What's the worst-case complexity of N inserts and N deletes? (Answer: O(N^2).)
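One acceptable walk-through answer to the string-hash question above: sum the character codes and reduce modulo the table size. (A deliberately simple sketch; note that it distributes keys poorly, since anagrams always collide, which is itself a useful follow-up discussion.)

```python
def string_hash(s, num_buckets):
    # Sum the ASCII/Unicode code points of the characters,
    # then map the sum into the table's bucket range.
    total = 0
    for ch in s:
        total += ord(ch)
    return total % num_buckets
```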

Let's talk about binary trees. What are the basic operations on a binary tree?
Insert, delete, find, iterate.

What are the complexities of these operations?
Should know O(log N). Ideally, should volunteer the balanced assumption. If not, ask whether conditions apply. Ask for the worst case: O(N).

How long does it take to insert N items into a binary tree?
O(N log N). Ideally, the candidate will volunteer the N^2 case for sorted input on plain vanilla (unbalanced) trees.

If trees are O(log N) for basic operations while hash tables are O(1), why does anyone ever use trees?
Preserves ordering of data (allows iteration, ranged searches). Can require less memory than hash tables.

3. Analysis & Applications

Choose easy/medium/hard depending on interview performance to date.

Easy question: How would you dedupe a list of strings?
Various answers possible, e.g., use a hash table: test whether each string is in the hash table; if not, insert & print, else move on to the next. Extra points for asking whether the list of strings fits in memory.

Medium question: I want to use a binary tree to encode infix arithmetic expressions on integers. Operations are addition and multiplication.
1. Draw a picture of what the tree looks like. This is straightforward.
2. Write a class definition. Do they get that nodes can be either operators or immediate values? Or do they have some more complex setup?
3. Write an evaluate() member function.
4. How would you make your evaluate() iterative instead of recursive?

Hard question: Describe to the candidate how IP route lookups work: You have 32-bit destination addresses and a set of routes that tell you how to forward a packet. Each route matches on the first k bits of the destination address, for variable k. I.e., some routes may care only about the first 4 bits, while you might have a 32-bit route that matches only a single IP address. For a given destination address, you want to find the longest matching route. (Walk through a couple of examples to make this clear.)
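A reference sketch for the medium question above (class definition, recursive evaluate(), and an iterative version). Python is used for brevity, and this node layout is one reasonable answer, not the only acceptable one:

```python
class Node:
    """A node is either an operator ('+' or '*') with two children,
    or an immediate integer value with no children."""
    def __init__(self, op=None, value=None, left=None, right=None):
        self.op = op          # '+' or '*' for operator nodes, else None
        self.value = value    # int for leaf nodes, else None
        self.left = left
        self.right = right

def evaluate(node):
    # Recursive version: leaves return their value; operators combine children.
    if node.op is None:
        return node.value
    left, right = evaluate(node.left), evaluate(node.right)
    return left + right if node.op == '+' else left * right

def evaluate_iterative(root):
    # Iterative version: post-order traversal with an explicit stack,
    # visiting each operator node twice (once to schedule its children,
    # once to combine their results).
    results = {}                      # node -> computed value
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if node.op is None:
            results[node] = node.value
        elif children_done:
            l, r = results[node.left], results[node.right]
            results[node] = l + r if node.op == '+' else l * r
        else:
            stack.append((node, True))
            stack.append((node.right, False))
            stack.append((node.left, False))
    return results[root]
```

For example, (2 + 3) * 4 becomes a '*' root whose left child is a '+' node over leaves 2 and 3 and whose right child is the leaf 4; both evaluators return 20.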
How would you build a data structure to support route lookups?
First, check that they give a correct data structure/algorithm. (E.g., make sure they really find the longest matching prefix.) Possible solutions include:
1. Have 32 hash tables. Store routes of length i in hash table i. Start your route lookups by matching against hash table 32 and count down until you find a match.
2. Same as above, but using search trees instead of hash tables.
3. A trie: a tree-like structure in which values are encoded in paths rather than associated with nodes. A left branch means 0, a right branch means 1. The value of a node is the sequence of bits (left/right) between the root and the node. Look up by searching the trie until you reach a leaf node or are unable to search deeper, tracking the longest matching route seen along the way.

If the candidate gets one of the basic answers (32 standard data structures), explain tries and ask the candidate to figure out how to apply them. If the candidate does really well, ask how to optimize the trie for two cases:
1. A very large, dense route table with no leaf nodes in the first 8 levels of the trie. A solution here is to vary the branching factor; in this case, the root node could have 2^8 children. (The term for this is level compression.)
2. Long branches that encode only a single route. The solution here is to terminate the branch early and store multiple bits in a given node. (The term for this is path compression.)
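A minimal bit-trie for longest-prefix match, as described above (a sketch without level or path compression; prefixes and addresses are represented as bit strings for readability, and all names are illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = [None, None]  # index 0 = left/0-bit, 1 = right/1-bit
        self.route = None             # route stored at this prefix, if any

def insert_route(root, prefix_bits, route):
    """prefix_bits is a string like '1010': the first k bits of the route."""
    node = root
    for bit in prefix_bits:
        i = int(bit)
        if node.children[i] is None:
            node.children[i] = TrieNode()
        node = node.children[i]
    node.route = route

def lookup(root, address_bits):
    """Walk the trie along the address bits, remembering the last route
    seen; when the walk ends, that route is the longest matching prefix."""
    node, best = root, root.route
    for bit in address_bits:
        node = node.children[int(bit)]
        if node is None:
            break
        if node.route is not None:
            best = node.route
    return best
```

A route on prefix '10' and a more specific route on '1011' illustrate the key property: an address starting '1011' matches the longer route, while one starting '1000' falls back to the '10' route.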
