Two Sigma

There are 4 sets of questions (ref: 1point3acres), All the code requires test infra.

Abstract

Normally, there would be 3 rounds interviews. First round has two questions, first question normally is easy and then the next one could be a bit difficult. Then the second round would be one median level problem and a system / OO design question, or only system / OO design. Final round is either difficult problem or a system / OO design.

Afternoon interviews would be behaviour questions mixed up with the coding, system design and brain teaser problems. I believe the goal of the afternoon interviews would be the decision from team manager, who want to know that you can be the good fit to the team, like solving problems together, discussion and etc.

This document references the posts in 1point3acres and the following docs. Thank you guys.

https://gaigai_kris.gitbooks.io/interview-basic/two_sigma.html

https://xiaoguan.gitbooks.io/prepare-for-interview/Two%20Sigma/ts-onsite-2.html

Set 1

Round 1 Calculator Class & Remove Subtree

Question 1: Calculator class (Reverse Polish Notation)

  • 一个Calculator类

    • 包含一个stack和一个vector

    • 一个 Token 类,包含一个process(stack)方法

    • Operand 和 Operator 继承自Token

    • Operand的process是向stack中push这个数字

    • Operator包含一个numOfOperand,process方法是从stack中pop出numOfOperand个数字后进行某种操作,结果再push进stack

  • Operator (abstract base class)

    • operate(int val1, int val2)

  • Add, Subtract, Multiply, Divide四个class继承这个Operator base class的写法

    • pay attention to Divide

C++ code was copied from somewhere else. 
Java code was created based the c++ code, and it works. 

Design Pattern 

* Factory Pattern
  1. Follow Ups

    1. token -- operand and operator inherit from token

    2. unary and ternary operators -- add a variable of NumOperands

    3. factory design pattern

      1. Factory design pattern is a class where you can hide the creation logic of all sub-classes. Typically, it would require a type and generate an object of sub-class. Essentially, it’s a mapping from types sub-classes.

    4. 怎么才能直接给用户binary, 让他们可以自由的添加新的operator

      1. 更好的办法是做XML / JSON的serialization

  2. Key Points

    1. Abstract Class

    2. Factory Design Pattern

Question 2: Remove Subtree from Tree

  • 写一个子函数,要现场编译现场跑. 给定一个数组,数组的每个元素就是一个节点(struct node),大概的长得像下面这样

struct node{
    int parent;
    int val;
    bool valid;
};
  • parent代表当前node的parent 在数组里的index,root node的parent是-1. 所以node 是child指向parent的。

  • 给定一个数组和数组的某个index,删除这个index以及它的子树(只需要将node里的valid置为false即可),只能用O(n)的空间复杂度

C++ was copied from somewhere else
Java was based on the C++, it works.
  1. Follow Ups

    1. Update tree size

    2. Corner cases

      1. Index is out of range

      2. delete subtree which already removed

    3. Node cannot be changed.

      1. use a boolean array to store the visited indexes.

Round 2 Blocking Queues & Slow Web Diagnose

Question 1: Blocking Queues

Because the data comes from two blocking queues, the implementation of 
non-streaming version would be two queues and call calculatePairs on different 
queue order. 

However, becasue the blocking queue could be blocked, because there is no new 
incoming data. Thus two threads would be the solution. 

C++ code is coyied from somewhere else
Java code is based on the C++ code, and it works.
  1. Follow Ups

    1. 如果有多个queue,比如10个queue怎么办,

      1. It should use a list to store the data from each blocking queue.

      2. Lock is the same, it should lock before the calculation and release the lock afterwards.

      3. The thread input should be like (List<List<Double>> queues, int qIndex)

      4. Each thread need to maintain a list of indexes which point to the un-explored position of other queues.

      5. Every time a thread get a new data, start from the current index and find the last index suite the condition of each other queues.

Question 2: Slow Web Accessing

  1. When does it happen? How frequently ?

  2. How slower compare to the normal?

  3. Where does it happen? All places or only several places?

  4. Is there a monitor on the full stack?

    1. Host & service health check

    2. Overall host performance (CPU, Mem, I/O, Disk and etc)

    3. Time of request

    4. Time of back end request

    5. Database response time

Round 3 Wildcard Matching

https://leetcode.com/problems/wildcard-matching/solution/

The solution should start from DP solution. The testing cases should be added at beginning.

  • Step 1: Recursive Solution

  • Step 2: DP solution

  • Step 3: Backtracking

Dp solution and backtrack solution are copied from leetcode
  1. Follow Ups

    1. Better than DP? O(1) space Backtrack

Set 2

Round 1 Power of 4 & Random Iterator Dividable by 5

Question 1: Power of 4

The for loop should be trivial.
Bit mask is what interveiwer wanted. It should be explained clearly.

Question 2: Random Iterator Dividable by 5

  • Using a random number iterator to crate an iterator that only return multiply of 5

  • implement next(), hasNext(), remove()

  • Solution (the question is bit of blur so the solution is based on my understanding)

    • The class should implement Iterator<Integer>

    • add nextValue and hasNextValue to the class.

    • remove the element from the container.

Output numbers only could mod by 5.
Remove some elements from the container
Check again to make sure the elements are removed.

Round 2 Game of Life & Text Editor OO Design

Question 1: Game of Life

* 1D and 2D solution is trivial. 
    o in place solution would be prefered
    o need to ask about the boarder
* Follow ups:
    o Infinite grid
    o Large grid but alive points are sparse
    o Solve it using distribued system

Question 2: Text Editor Design

  • Design a text editor

    • insert(position)

    • delete(p1, p2)

    • highlight(p1, p2)

    • redo & undo

    • save/load

    • update

    • search

  • Data structure

    • Rope

      • Concat

      • Split

    • Highlight can be implemented by add another flag in tree node

    • Two stacks for Redo and Undo

      • Implement a class of Actions

      • Stack of Actions

    • Save & Load

      • Serialise & Deserialise tree

    • Search

      • Boyer-Moore Search

Rope data structure require two operations
1: concat
simplist solution is to create a new node: node.left = root, node.right = inputNode
Need to balance the tree after each concat
2: split
Update root to the left side and return the right side node.

Then the following operations 
* Insert(index, str)
    o inputNode = new Node(str);
    o lastNode = split(index);
    o concat(inputNode);
    o concat(lastNode);
* Delete(start, end)
    o lastNode = split(end)
    o deleteNode = split(start)
    o concat(lastNode)
* Index(index)
    o binary search
* Highlight(start, end)
    o lastNode = split(end)
    o highlightNode = split(start)
    o highlight(highlightNode)
    o concat(highlightNode)
    o concat(lastNode)
* Search
    o Iteriter inorder traverse.
    o Boyer-Moore search algorithm.

Round 3 Debug Guava

  • Source file link

  • multimap.

    • 一个key 对应一个collection,

    • 先写test case再debug,

    • 但这里特别提醒大家,写function之前看javadoc,不仅是

      • 开头那一长段

      • 把鼠标放method 名字上出现的那段

    • Bugs:

      • put: not implemented

      • size:

      • clear

      • clearall

  • JUnit

    • Separate the @Test

    • Add @Begin

There are source code and the testing base.
1: implement put
2: check the bugs
3: write the tests cases

Set 3

Round 1 LRU + LFU (follow up)

Question 1 LRU

Solution key points
* DummyHead and DummyTail points to the head and tail of the double linked list
* Apply removeNode and addNode. 
    o removeNode: 
        . remove the node from the list, connect its left and right,
    o addNode:
        . Add the node to the head of the double linked list
* put(key, value)
    o check the existence of the key, update the value, update the node.
    o check size limit

protected boolean removeEldestEntry(Map.Entry eldest) {
                    return size() > CAPACITY;
                }
Solution key points
* DummyHead and DummyTail points to the head and tail of the double linked list
* Apply removeNode and addNode. 
    o removeNode: 
        . remove the node from the list, connect its left and right,
    o addNode:
        . Add the node to the head of the double linked list
* put(key, value)
    o check the existence of the key, update the value, update the node.
    o check size limit

Question 2 LFU

Data structure: three maps to keep k-v, k-count, count-list<key> and minCount
* Key-Value map
    o Stores the all the key-value
    o HashMap<T, T>
* Key-Count map
    o Store the count of the key
    o HashMap<T, Integer>
* Count-List<Key> map
    o Store the list of keys of each count
    o HashMap<Integer, LinkedHashSet<T>>
* minCount
    o keep track the min count

Round 2 Best time to buy stock & Debug median of two sorted arrays

Question 1 Best time to buy stock

Question 2 Debug median of two sorted arrays

Testing case should cover: 
* 两个数组都为空
* 有其中一个数组为空,另外一个数组长度为奇数/偶数
* 数组长度都是奇数 
* 数组长度都是偶数
* 数组长度一奇一偶
* index over the constraint
* overlap & non overlap

Round 3 ATM OO design

  • Design an ATM, some of the basic interfaces are already given, need to add more if necessary and then implement the class.

  • need to implement:

    • 用户注册,用户登陆,

    • 生成新银行卡,验证银行卡,

    • 查询余额,存取钱等一大堆功能

  • There are

    • Accounts:

      • checking

      • saving

    • Functions:

      • withdraw

      • deposit

      • transfer

      • inquiry

      • setting

    • class

      • ATM

      • User

      • Accounts

  • Because it is a semi open-end question, there is no real correct answer. The following solution is based on my own understanding and above link.

  • It should use

    • State Pattern

    • Mediator Pattern

// Need to implement the following classes

class AccountStore
class CardHandler
class CashDispenser
class CashIntake

// Then there is a menu function in ATM class. Such as
switch (choice) {
    case LOGIN:
        //TODO: implement function to support user login
}

// ATM class has the following members

AccountStore _accountstore;   // store the account created at this ATM
CardHandler _cardhanlder;     // handles card read return
CashDispenser _cashdispenser; // handles cash withdraw
CashIntake _cashintake;       // handles cash deposit

Set 4

The questions in this set are varies. I add all questions whose frequency is higher than the others in this set. Also some of the questions from other set would also be asked in this set.

Round 1 Refrigerator

  • 有几个室友住在一间房子,有好的室友,有坏的室友。

  • 好的室友会买东西放在冰箱里,坏的室友会从冰箱里拿东西吃。

  • 冰箱里有不同种类的食物,比如牛奶,香蕉,鸡蛋等等。

  • 然后在冰箱里,每个种类都有一个数量上限,当然也不可以低于0。

  • 他给你一段程序,模拟买和吃的过程,买需要一些时间(sleep),吃也需要一些时间,然后print出实时的数量。

  • 每个print里面会有个validation,如果超出了上线,或者低于0,就会出error message。

  • 你要做的就是fix这段程序,让程序不出现error message。

  • Solution

    • The problem is straight forward

    • We need to add locks on each element. It isn't clear what's the data structure. Here is my assumptions

      • An array to store the count of each element

      • An map to store the count of each element

    • Locks :

      • ReentrantLock

      • Semaphore: Signal, like limit the number of the thread

      • Mutex:

Possible solutions

* It is a producer - consumer model
* Lock
    o Some mates said about mutex, semaphore, sychronization and ReeentrantLock

* The code is based on my search and understanding. I left the eggLock to demo
  that when there is no lock, it may be error message.

Round 2 news feed system design

  • 内部系统供员工抓取想要的新闻,比如自己facebook账号的更新,或者是报纸新闻上的更新

  • 前段后端的搭建,

    • 页面自动刷新目前最新的topic,

    • 还有一些filter,比如种类,时间,关键字

    • upcoming calendar event

    • 还需要支持搜索功能

Because currently they are all virtual interviews. This problem may not be asked.

It should use pull and push model.

Pull model for the personal websites, like fackbook or twitter
Push model for the general websites, like wsj or nytimes

* Feed system
* Web crawler
* Search

Round 3 Super stack & String multiply

Question 1 Super stack

Here I only implements the average, min, max and mode.
Because mode is quite complex so I create another class to handle it.
It would be better to design the unit test.

Question 2 String multiply

  • Leetcode 43:

  • Key points

    • multiply a string to a digit

      • Remember to add tailing zeros

    • add two strings

* multiply a string and a digit
* add two strings
* loop on shorter string and skip zeros
* better to use StreamBuilder

Random Coding Questions

Some random questions

Statistic City Power & Temp

给一个csv文件,存有array of(城市,日期,温度,电用量)。要求知道每个城市每天最高温度和最高电用量。而且需要在知道了结果以后立刻print(比如一个城市到了第二天,就要立刻打印出之前一天的数据),而且需要按照知道答案的顺序print(这里其实有edge case需要考虑不过track一下行数然后建一个queue就好了)。实现很简单,建一个hash table配上一个queue之类的就可以。注意一定要删掉之前存下来的cache,文件特...别....大....,电脑其实没多少内存。面试官有答案,与结果diff。

Weighted Sampling

Weighted sampling。给一个array of tuple的(weight, data),要求以给定的概率sample数据。 最优答案是binary indexd tree log(n)的读写complexity,而且支持insert/delete,都是log(n) 我给出普通答案,记住已有的weight然后查找。面试官也认可了。

Random Design Questions

Design Google Map

  • 主要实现两点之间公共交通路径和所需时间。

  • 主要用到dijkstra's algorithm实现最短步行路径,geohash找最近的k个巴士站。

  • 白板implement dijkstra's。

  • 如果一条巴士线有很多站,则需要用到binary search找到离source最近的巴士站点。

  • Follow up:

    • 已知所有自行车站到start点和end点的距离,但是自行车站之间的建图代价非常大,如果减低建图的复杂度

Design Watch

  • 这个手表有三种模式(普通,倒计时,秒表计时),只有两个按钮。

  • 得用这两个按钮实现切换模式,暂停,开始。

    • 假设两个按钮分别是A, B。A,B一起按就是 nextMode() 。

    • 在倒计时模式下,A是增加时间,B是暂停/开始。

  • Solution:

    • 用一个变量储存现在倒计时的时间,一个变量储存现在的状态。然后每次 call pressA() 就+10s,每次 call pressB() 就切换状态。

    • 如果状态是开始,就 setTimeInterval 每一秒钟 call watch.display(现在倒计时还剩多少)

    • 如果状态是暂停就 clear setTimeInterval。

Design Google Doc

Distributed Services Health Check

  • 很多机器分布在各个地方,设计一种通信方式是得它们能知道各自的health status.

  • 经面试官提示,各个机器需要把自己的"heart beat"传给其它机器,怎么传heart beat最简单了。

    • 就是各个机器都自己有一个operation counter.

    • 每次操作operation counter都加1,

    • 然后把这个operation counter要传给其它机器。

    • 其它机器可以根据这个operation counter是不是连续加一来判断是不是漏接收某个机器的信息

IoT data collection & store

  • 是在IoT的环境下,有几百million的sensor采集温度的数据,每个sensor还分city和type,要求设计能够支持两个query

    • 给定时间范围内某一个city或者某一个type的平均温度。

  • 本人转专业的系统设计比较水,大概说了一番,感觉存储方面其实相当于问的是类似google bigquery底层大概怎么实现的,

  • 中间也会follow到很多关于怎么scale的问题

Board and Player

  • Design a game,

  • 有一个grid, grid上有个target,有一些players.

  • Players不知道自己在grid上哪里,player有自己的移动strategy

  • 但是输入是告诉你上次的移动是接近target了还是远离target了。

    • 些是SmartPlayer,他会根据输入决定下一步怎么走

    • 有些是RandomPlayer,他们就是随机走

* Game
    o Game class should have a board and some player
    o it should control when start and end
* Board
    o Board should record the players location.
    o interfaces:
        . registerPlayer(Player)
        . int isCloser(Player, Move)
* Player
    o Player should only record the result from prev move, for decision
    o interfaces:
        . int[] nextMove();
        . setResult(int)

Stock subscription notification system

  • It should be like the message queue subscription model.

  • It feels like the new feed system

Random Brain Teaser Questions

Question 1:

  • 就是一个train在入站台和出站台的时候需要先减速再加速,问你车的哪个部位在整个站台停留的时间最长。

    • 我大概思考了一下乱说了一个head……后来work out math的时候我发现head和tail的时间是相同的,我就开始觉得应该是midway停留的时间最长,(这个时候有一个假设是减速和加速的加速度是相同的)。

    • 面试官说对,然后就问how much longer? 然后给了一个提示用s = 1/2 a t t这个公式去推,我大概写了几行跟他说大概是(sqrt{2} - 1) longer,他有点惊讶因为他好像觉得我很多步骤跳掉了……anyway他觉得没问题,后来又问如果减速和加速的加速度不同那怎么办,哪个点停留的时间最长?我直接画了两下猜了一个表达式大概就说是要在车的(a2 / (a1 + a2))分界的地方……面试官就问为啥?我尝试work out math了半天未果,反正大概就是1/2 a1 t1 t1 = x, 1/2 a2 t2 *t2 = S - x然后去maximize t1 + t2的值,很明显的我的数学水平已然不足以解决这个问题了……

    • 不过面试官说我很惊讶你一开始给的答案就是对的,我就说这个intuitively大概就是这样blah blah,然后他说不用推了我们又开心的聊了一些culture问题结束

Last updated

Was this helpful?