Two Sigma
There are 4 sets of questions (ref: 1point3acres), All the code requires test infra.
Abstract
Normally, there would be 3 rounds interviews. First round has two questions, first question normally is easy and then the next one could be a bit difficult. Then the second round would be one median level problem and a system / OO design question, or only system / OO design. Final round is either difficult problem or a system / OO design.
Afternoon interviews would be behaviour questions mixed up with the coding, system design and brain teaser problems. I believe the goal of the afternoon interviews would be the decision from team manager, who want to know that you can be the good fit to the team, like solving problems together, discussion and etc.
This document references the posts in 1point3acres and the following docs. Thank you guys.
https://gaigai_kris.gitbooks.io/interview-basic/two_sigma.html
https://xiaoguan.gitbooks.io/prepare-for-interview/Two%20Sigma/ts-onsite-2.html
Set 1
Round 1 Calculator Class & Remove Subtree
Question 1: Calculator class (Reverse Polish Notation)
一个Calculator类
包含一个stack和一个vector
一个 Token 类,包含一个process(stack)方法
Operand 和 Operator 继承自Token
Operand的process是向stack中push这个数字
Operator包含一个numOfOperand,process方法是从stack中pop出numOfOperand个数字后进行某种操作,结果再push进stack
Operator (abstract base class)
operate(int val1, int val2)
Add, Subtract, Multiply, Divide四个class继承这个Operator base class的写法
pay attention to Divide
C++ code was copied from somewhere else.
Java code was created based the c++ code, and it works.
Design Pattern
* Factory Pattern
Follow Ups
token -- operand and operator inherit from token
unary and ternary operators -- add a variable of NumOperands
factory design pattern
Factory design pattern is a class where you can hide the creation logic of all sub-classes. Typically, it would require a type and generate an object of sub-class. Essentially, it’s a mapping from types sub-classes.
怎么才能直接给用户binary, 让他们可以自由的添加新的operator
更好的办法是做XML / JSON的serialization
Key Points
Abstract Class
Factory Design Pattern
Question 2: Remove Subtree from Tree
写一个子函数,要现场编译现场跑. 给定一个数组,数组的每个元素就是一个节点(struct node),大概的长得像下面这样
struct node{
int parent;
int val;
bool valid;
};
parent代表当前node的parent 在数组里的index,root node的parent是-1. 所以node 是child指向parent的。
给定一个数组和数组的某个index,删除这个index以及它的子树(只需要将node里的valid置为false即可),只能用O(n)的空间复杂度
C++ was copied from somewhere else
Java was based on the C++, it works.
Follow Ups
Update tree size
Corner cases
Index is out of range
delete subtree which already removed
Node cannot be changed.
use a boolean array to store the visited indexes.
Round 2 Blocking Queues & Slow Web Diagnose
Question 1: Blocking Queues
Suppose you have two independent blocking queues that will keep getting new data, the new data will always be greater than the last element of the queue it is going to. You can only use getNext() to get the data from these two blocking queues and each data can be fetched only once. Write a program to output all pairs of data from these two blocking queues that have difference smaller than 1.
Example:
Q1: {0.2, 1.4, 3.0}
Q2: {1.0, 1.1, 3.5}
output: [0.2, 1.0], [1.4, 1.0], [0.2, 1.1], [1.4, 1.1], [3.0, 3.5].
Java 面试者的可以用java里面自带blocking queue来做,面试官提示说的。
Because the data comes from two blocking queues, the implementation of
non-streaming version would be two queues and call calculatePairs on different
queue order.
However, becasue the blocking queue could be blocked, because there is no new
incoming data. Thus two threads would be the solution.
C++ code is coyied from somewhere else
Java code is based on the C++ code, and it works.
Follow Ups
如果有多个queue,比如10个queue怎么办,
It should use a list to store the data from each blocking queue.
Lock is the same, it should lock before the calculation and release the lock afterwards.
The thread input should be like
(List<List<Double>> queues, int qIndex)
Each thread need to maintain a list of indexes which point to the un-explored position of other queues.
Every time a thread get a new data, start from the current index and find the last index suite the condition of each other queues.
Question 2: Slow Web Accessing

When does it happen? How frequently ?
How slower compare to the normal?
Where does it happen? All places or only several places?
Is there a monitor on the full stack?
Host & service health check
Overall host performance (CPU, Mem, I/O, Disk and etc)
Time of request
Time of back end request
Database response time
Round 3 Wildcard Matching
https://leetcode.com/problems/wildcard-matching/solution/
The solution should start from DP solution. The testing cases should be added at beginning.
Step 1: Recursive Solution
Step 2: DP solution
Step 3: Backtracking
Dp solution and backtrack solution are copied from leetcode
Follow Ups
Better than DP? O(1) space Backtrack
Set 2
Round 1 Power of 4 & Random Iterator Dividable by 5
Question 1: Power of 4
For loop
Bit Mask
The for loop should be trivial.
Bit mask is what interveiwer wanted. It should be explained clearly.
Question 2: Random Iterator Dividable by 5
Using a random number iterator to crate an iterator that only return multiply of 5
implement
next(), hasNext(), remove()
Solution (the question is bit of blur so the solution is based on my understanding)
The class should implement
Iterator<Integer>
add
nextValue
andhasNextValue
to the class.remove the element from the container.
Output numbers only could mod by 5.
Remove some elements from the container
Check again to make sure the elements are removed.
Round 2 Game of Life & Text Editor OO Design
Question 1: Game of Life
1D solution (YYXYY) http://jonmillen.com/1dlife/index.html
dead cell is born if it has 2 or 3 Y-neighbours alive
living cell survives if it has 2 or 4 Y-neighbours
2D solution
follow up
infinite grid
large grid sparse
distributed system design
* 1D and 2D solution is trivial.
o in place solution would be prefered
o need to ask about the boarder
* Follow ups:
o Infinite grid
o Large grid but alive points are sparse
o Solve it using distribued system
Question 2: Text Editor Design
Design a text editor
insert(position)
delete(p1, p2)
highlight(p1, p2)
redo & undo
save/load
update
search
Data structure
Rope
Concat
Split
Highlight can be implemented by add another flag in tree node
Two stacks for Redo and Undo
Implement a class of Actions
Stack of Actions
Save & Load
Serialise & Deserialise tree
Search
Boyer-Moore Search
Rope data structure require two operations
1: concat
simplist solution is to create a new node: node.left = root, node.right = inputNode
Need to balance the tree after each concat
2: split
Update root to the left side and return the right side node.
Then the following operations
* Insert(index, str)
o inputNode = new Node(str);
o lastNode = split(index);
o concat(inputNode);
o concat(lastNode);
* Delete(start, end)
o lastNode = split(end)
o deleteNode = split(start)
o concat(lastNode)
* Index(index)
o binary search
* Highlight(start, end)
o lastNode = split(end)
o highlightNode = split(start)
o highlight(highlightNode)
o concat(highlightNode)
o concat(lastNode)
* Search
o Iteriter inorder traverse.
o Boyer-Moore search algorithm.
Round 3 Debug Guava
Source file link
multimap.
一个key 对应一个collection,
先写test case再debug,
但这里特别提醒大家,写function之前看javadoc,不仅是
开头那一长段
把鼠标放method 名字上出现的那段
Bugs:
put: not implemented
size:
clear
clearall
JUnit
Separate the @Test
Add @Begin
There are source code and the testing base.
1: implement put
2: check the bugs
3: write the tests cases
Set 3
Round 1 LRU + LFU (follow up)
Question 1 LRU
In Java there is an LRU implementation called
LinkedHashMap
It also worth to ask the capacity threshold. (like size = 0.75*capacity)
Follow up: concurrent LRU
First of all, there are lots of article explained the concurrent LRU, it would be good for the background reading
This article explained the different implementations in detail
Here is the source code of the ConcurrentLinkedHashMap
Compare Google Guava and Caffeine
Solution key points
* DummyHead and DummyTail points to the head and tail of the double linked list
* Apply removeNode and addNode.
o removeNode:
. remove the node from the list, connect its left and right,
o addNode:
. Add the node to the head of the double linked list
* put(key, value)
o check the existence of the key, update the value, update the node.
o check size limit
protected boolean removeEldestEntry(Map.Entry eldest) {
return size() > CAPACITY;
}
Solution key points
* DummyHead and DummyTail points to the head and tail of the double linked list
* Apply removeNode and addNode.
o removeNode:
. remove the node from the list, connect its left and right,
o addNode:
. Add the node to the head of the double linked list
* put(key, value)
o check the existence of the key, update the value, update the node.
o check size limit
Question 2 LFU
It may no be here but worth to understand the detail of the implementation
Data structure: three maps to keep k-v, k-count, count-list<key> and minCount
* Key-Value map
o Stores the all the key-value
o HashMap<T, T>
* Key-Count map
o Store the count of the key
o HashMap<T, Integer>
* Count-List<Key> map
o Store the list of keys of each count
o HashMap<Integer, LinkedHashSet<T>>
* minCount
o keep track the min count
Round 2 Best time to buy stock & Debug median of two sorted arrays
Question 1 Best time to buy stock
Remember the key point is the state transfer.
There are 5 related problems, during interview, the III and IV would be asked.
A really good article which explained this problem in great detail

Question 2 Debug median of two sorted arrays
Given the code of the median of two sorted arrays
Offered some testing cases, but not enough
Testing case should cover:
* 两个数组都为空
* 有其中一个数组为空,另外一个数组长度为奇数/偶数
* 数组长度都是奇数
* 数组长度都是偶数
* 数组长度一奇一偶
* index over the constraint
* overlap & non overlap
Round 3 ATM OO design
Design an ATM, some of the basic interfaces are already given, need to add more if necessary and then implement the class.
need to implement:
用户注册,用户登陆,
生成新银行卡,验证银行卡,
查询余额,存取钱等一大堆功能
There are
Accounts:
checking
saving
Functions:
withdraw
deposit
transfer
inquiry
setting
class
ATM
User
Accounts
Here is an useful link : http://www.math-cs.gordon.edu/courses/cs211/ATMExample/
Because it is a semi open-end question, there is no real correct answer. The following solution is based on my own understanding and above link.
It should use
State Pattern
Mediator Pattern
// Need to implement the following classes
class AccountStore
class CardHandler
class CashDispenser
class CashIntake
// Then there is a menu function in ATM class. Such as
switch (choice) {
case LOGIN:
//TODO: implement function to support user login
}
// ATM class has the following members
AccountStore _accountstore; // store the account created at this ATM
CardHandler _cardhanlder; // handles card read return
CashDispenser _cashdispenser; // handles cash withdraw
CashIntake _cashintake; // handles cash deposit
Set 4
The questions in this set are varies. I add all questions whose frequency is higher than the others in this set. Also some of the questions from other set would also be asked in this set.
Round 1 Refrigerator
有几个室友住在一间房子,有好的室友,有坏的室友。
好的室友会买东西放在冰箱里,坏的室友会从冰箱里拿东西吃。
冰箱里有不同种类的食物,比如牛奶,香蕉,鸡蛋等等。
然后在冰箱里,每个种类都有一个数量上限,当然也不可以低于0。
他给你一段程序,模拟买和吃的过程,买需要一些时间(sleep),吃也需要一些时间,然后print出实时的数量。
每个print里面会有个validation,如果超出了上线,或者低于0,就会出error message。
你要做的就是fix这段程序,让程序不出现error message。
Solution
The problem is straight forward
We need to add locks on each element. It isn't clear what's the data structure. Here is my assumptions
An array to store the count of each element
An map to store the count of each element
Locks :
ReentrantLock
Semaphore: Signal, like limit the number of the thread
Mutex:
Possible solutions
* It is a producer - consumer model
* Lock
o Some mates said about mutex, semaphore, sychronization and ReeentrantLock
* The code is based on my search and understanding. I left the eggLock to demo
that when there is no lock, it may be error message.
Round 2 news feed system design
内部系统供员工抓取想要的新闻,比如自己facebook账号的更新,或者是报纸新闻上的更新
前段后端的搭建,
页面自动刷新目前最新的topic,
还有一些filter,比如种类,时间,关键字
upcoming calendar event
还需要支持搜索功能
Because currently they are all virtual interviews. This problem may not be asked.
It should use pull and push model.
Pull model for the personal websites, like fackbook or twitter
Push model for the general websites, like wsj or nytimes
* Feed system
* Web crawler
* Search
Round 3 Super stack & String multiply
Question 1 Super stack
支持average, min, max, mode之类。这里的mode是出现次数最多的意思
It looks like mixed lots of small questions into one problem.
mode (LFU): https://leetcode.com/problems/lfu-cache/
Solution
Average: maintain a total sum, average = sum / stack size
Min: use a stack to store min
Max: use a stack to store max
mode: Like LFU cache, but keep the max instead of min.
Median: Two queues and a map to track the removed ones
Here I only implements the average, min, max and mode.
Because mode is quite complex so I create another class to handle it.
It would be better to design the unit test.
Question 2 String multiply
Leetcode 43:
Key points
multiply a string to a digit
Remember to add tailing zeros
add two strings
* multiply a string and a digit
* add two strings
* loop on shorter string and skip zeros
* better to use StreamBuilder
Random Coding Questions
Some random questions
Populate next pointer of a tree: https://leetcode.com/problems/populating-next-right-pointers-in-each-node/
Regex matching: https://leetcode.com/problems/regular-expression-matching/
理论。给两个set,如果取他们的交集,问了各个方法的优缺点(时间复杂度+空间复杂度)
brute force
sort one, binary search the other one
sort, then two pointers
hashmap
merge,归并排序duplicates???
理论。各种sorting algo 的优缺点。
bubble, insertion
bucket sort
shell
merge sort
quick sort
Statistic City Power & Temp
给一个csv文件,存有array of(城市,日期,温度,电用量)。要求知道每个城市每天最高温度和最高电用量。而且需要在知道了结果以后立刻print(比如一个城市到了第二天,就要立刻打印出之前一天的数据),而且需要按照知道答案的顺序print(这里其实有edge case需要考虑不过track一下行数然后建一个queue就好了)。实现很简单,建一个hash table配上一个queue之类的就可以。注意一定要删掉之前存下来的cache,文件特...别....大....,电脑其实没多少内存。面试官有答案,与结果diff。
Weighted Sampling
Weighted sampling。给一个array of tuple的(weight, data),要求以给定的概率sample数据。 最优答案是binary indexd tree log(n)的读写complexity,而且支持insert/delete,都是log(n) 我给出普通答案,记住已有的weight然后查找。面试官也认可了。
Random Design Questions
Design Google Map
主要实现两点之间公共交通路径和所需时间。
主要用到dijkstra's algorithm实现最短步行路径,geohash找最近的k个巴士站。
白板implement dijkstra's。
如果一条巴士线有很多站,则需要用到binary search找到离source最近的巴士站点。
Follow up:
已知所有自行车站到start点和end点的距离,但是自行车站之间的建图代价非常大,如果减低建图的复杂度
Design Watch
这个手表有三种模式(普通,倒计时,秒表计时),只有两个按钮。
得用这两个按钮实现切换模式,暂停,开始。
假设两个按钮分别是A, B。A,B一起按就是 nextMode() 。
在倒计时模式下,A是增加时间,B是暂停/开始。
Solution:
用一个变量储存现在倒计时的时间,一个变量储存现在的状态。然后每次 call pressA() 就+10s,每次 call pressB() 就切换状态。
如果状态是开始,就 setTimeInterval 每一秒钟 call watch.display(现在倒计时还剩多少)
如果状态是暂停就 clear setTimeInterval。
Design Google Doc
设计 Google Doc 这种多人同时编辑的,怎么让每个人看到的都是update的version
Operational transformation Instead of editing the doc directly, send operations to server and let server decide the final/combined state.
Distributed Services Health Check
很多机器分布在各个地方,设计一种通信方式是得它们能知道各自的health status.
经面试官提示,各个机器需要把自己的"heart beat"传给其它机器,怎么传heart beat最简单了。
就是各个机器都自己有一个operation counter.
每次操作operation counter都加1,
然后把这个operation counter要传给其它机器。
其它机器可以根据这个operation counter是不是连续加一来判断是不是漏接收某个机器的信息
IoT data collection & store
是在IoT的环境下,有几百million的sensor采集温度的数据,每个sensor还分city和type,要求设计能够支持两个query
给定时间范围内某一个city或者某一个type的平均温度。
本人转专业的系统设计比较水,大概说了一番,感觉存储方面其实相当于问的是类似google bigquery底层大概怎么实现的,
中间也会follow到很多关于怎么scale的问题
Board and Player
Design a game,
有一个grid, grid上有个target,有一些players.
Players不知道自己在grid上哪里,player有自己的移动strategy
但是输入是告诉你上次的移动是接近target了还是远离target了。
些是SmartPlayer,他会根据输入决定下一步怎么走
有些是RandomPlayer,他们就是随机走
* Game
o Game class should have a board and some player
o it should control when start and end
* Board
o Board should record the players location.
o interfaces:
. registerPlayer(Player)
. int isCloser(Player, Move)
* Player
o Player should only record the result from prev move, for decision
o interfaces:
. int[] nextMove();
. setResult(int)
Stock subscription notification system
It should be like the message queue subscription model.
It feels like the new feed system
Random Brain Teaser Questions
Question 1:
就是一个train在入站台和出站台的时候需要先减速再加速,问你车的哪个部位在整个站台停留的时间最长。
我大概思考了一下乱说了一个head……后来work out math的时候我发现head和tail的时间是相同的,我就开始觉得应该是midway停留的时间最长,(这个时候有一个假设是减速和加速的加速度是相同的)。
面试官说对,然后就问how much longer? 然后给了一个提示用s = 1/2 a t t这个公式去推,我大概写了几行跟他说大概是(sqrt{2} - 1) longer,他有点惊讶因为他好像觉得我很多步骤跳掉了……anyway他觉得没问题,后来又问如果减速和加速的加速度不同那怎么办,哪个点停留的时间最长?我直接画了两下猜了一个表达式大概就说是要在车的(a2 / (a1 + a2))分界的地方……面试官就问为啥?我尝试work out math了半天未果,反正大概就是1/2 a1 t1 t1 = x, 1/2 a2 t2 *t2 = S - x然后去maximize t1 + t2的值,很明显的我的数学水平已然不足以解决这个问题了……
不过面试官说我很惊讶你一开始给的答案就是对的,我就说这个intuitively大概就是这样blah blah,然后他说不用推了我们又开心的聊了一些culture问题结束
Some other useful links
Last updated
Was this helpful?