Find Median from Data Stream
The problem description can be found in LeetCode.
Approach
Suppose to have an input like:
lst = [1, 2, 3, 4, 5]
If we divide such inputs in small and big values, we would have:
small_values = [1, 2, 3?]
big_values = [3?, 4, 5]
The median can be either in the small_values
list or in the big_values
one.
It will either be the last or first element of small_values
and big_values
, respectively.
So, if we maintain such list of values, we could easily find the median in constant time in one of the two lists.
The Heap looks like a perfect data structure we can use in this case.
We just need to maintain the balance properties between the two lists,
hence, abs(len(small_values) - len(big_values)) <= 1
.
Solution
from heapq import heappop, heappush
class MedianFinder:
def __init__(self):
self.small = [] # max heap
self.large = [] # min heap
def addNum(self, num: int) -> None:
# add the value in the `max Heap`
heappush(self.small, -num)
# rearrange the values to maintain the heaps property
if self.small and self.large:
largest = -self.small[0]
smallest = self.large[0]
if largest > smallest:
_ = heappop(self.small)
heappush(self.large, largest)
self._balance_heaps()
def findMedian(self) -> float:
assert abs(len(self.small) - len(self.large)) <= 1
if len(self.small) > len(self.large):
return -self.small[0]
if len(self.large) > len(self.small):
return self.large[0]
return (-self.small[0] + self.large[0]) / 2
def _balance_heaps(self) -> None:
if len(self.small) > len(self.large) + 1:
largest = -heappop(self.small)
heappush(self.large, largest)
elif len(self.large) > len(self.small) + 1:
smallest = heappop(self.large)
heappush(self.small, -smallest)
Complexity
- addNum:
- Time:
- Space:
- findMedian:
- Time:
- Space:
Where is the number of function calls to the addNum
method.