Find Median from Data Stream

The problem description can be found in LeetCode.

Approach

Suppose to have an input like:

lst = [1, 2, 3, 4, 5]

If we divide such inputs in small and big values, we would have:

small_values = [1, 2, 3?] big_values = [3?, 4, 5]

The median can be either in the small_values list or in the big_values one. It will either be the last or first element of small_values and big_values, respectively.

So, if we maintain such list of values, we could easily find the median in constant time in one of the two lists.

The Heap looks like a perfect data structure we can use in this case. We just need to maintain the balance properties between the two lists, hence, abs(len(small_values) - len(big_values)) <= 1.

Solution

from heapq import heappop, heappush class MedianFinder: def __init__(self): self.small = [] # max heap self.large = [] # min heap def addNum(self, num: int) -> None: # add the value in the `max Heap` heappush(self.small, -num) # rearrange the values to maintain the heaps property if self.small and self.large: largest = -self.small[0] smallest = self.large[0] if largest > smallest: _ = heappop(self.small) heappush(self.large, largest) self._balance_heaps() def findMedian(self) -> float: assert abs(len(self.small) - len(self.large)) <= 1 if len(self.small) > len(self.large): return -self.small[0] if len(self.large) > len(self.small): return self.large[0] return (-self.small[0] + self.large[0]) / 2 def _balance_heaps(self) -> None: if len(self.small) > len(self.large) + 1: largest = -heappop(self.small) heappush(self.large, largest) elif len(self.large) > len(self.small) + 1: smallest = heappop(self.large) heappush(self.small, -smallest)

Complexity

  • addNum:
    • Time:
    • Space:
  • findMedian:
    • Time:
    • Space:

Where is the number of function calls to the addNum method.