Leetcode in Scala3 (Dotty): Array Partition I

## Imperative Solution

An imperative solution for Array Partition I in Scala is as follows:

object Solution
def arrayPairSum(nums: Array[Int]): Int =
val rangeSize = 10_000
val numsSorted =
nums.foldLeft(Vector.fill(2 * rangeSize + 1)(0)) { (vec, num) =>
val idx = num + rangeSize
vec.updated(idx, vec(idx) + 1)
}

var res = 0
var isMinInPair = true
var idx = -rangeSize
while idx <= rangeSize do
var i = 0
while i < numsSorted(idx + rangeSize) do
if isMinInPair then
res += idx
isMinInPair = !isMinInPair
i += 1
idx += 1
res

## JMH Benchmark

I measure the performance with JMH on the 2,4 GHz 8-Core Intel Core i9 CPU:

@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Array(Mode.Throughput))
class SolutionBenchmark

def randomArray: Array[Int] =
val n = 1000
var arr = Array.range(0, n)
Random.shuffle(arr)
arr

def randomVector: Vector[Int] =
val n = 1000
var vec = Vector.range(0, n)
Random.shuffle(vec)
vec

@Benchmark
def solution1(blackhole: Blackhole): Unit =
val r = main.Solution1.arrayPairSum(randomArray)
blackhole.consume(r)

@Benchmark
def solution2(blackhole: Blackhole): Unit =
val r = main.Solution2.arrayPairSum(randomArray)
blackhole.consume(r)

@Benchmark
def solution3(blackhole: Blackhole): Unit =
val r = main.Solution3.arrayPairSum(randomArray)
blackhole.consume(r)

@Benchmark
def solution4(blackhole: Blackhole): Unit =
val r = main.Solution4.arrayPairSum(randomVector)
blackhole.consume(r)

@Benchmark
def solution5(blackhole: Blackhole): Unit =
val r = main.Solution5.arrayPairSum(randomVector)
blackhole.consume(r)

@Benchmark
def solution6(blackhole: Blackhole): Unit =
val r = main.Solution6.arrayPairSum(randomArray)
blackhole.consume(r)

The JMH result for the baseline imperative solution:

4.797 ±(99.9%) 0.325 ops/ms [Average]
(min, avg, max) = (4.727, 4.797, 4.929), stdev = 0.085
CI (99.9%): [4.472, 5.123] (assumes normal distribution)

## Functional Solutions in Scala

A solution that uses for comprehensions instead of while loops:

object Solution2:
def arrayPairSum(nums: Array[Int]): Int =
val rangeSize = 10_000
val numsSorted =
nums.foldLeft(Vector.fill(2 * rangeSize + 1)(0)) { (vec, num) =>
val idx = num + rangeSize
vec.updated(idx, vec(idx) + 1)
}

var isMinPair = true
var res = 0
for idx <- -rangeSize to rangeSize
i <- 0 until numsSorted(idx + rangeSize) do
if isMinPair then
res += idx
isMinPair = !isMinPair
res

The JMH result:

3.655 ±(99.9%) 0.292 ops/ms [Average]
(min, avg, max) = (3.543, 3.655, 3.750), stdev = 0.076
CI (99.9%): [3.363, 3.947] (assumes normal distribution)

Scala code is ~30 ops/ms slower than Java code.

## Functional Solutions in Scala

I decompose while loops to tailrec-optimised recursive calls:

object Solution2:
def isValid(v: Vector[Int], i: Int): Int =
@tailrec
def check(j: Int = 1): Int =
if (j >= 4)
-1
else if (i + j < v.size && v(i) == v(i + j))
v(i)
else
check(j + 1)
check()

@tailrec
def traverse(v: Vector[Int], i: Int = 0): Int =
if (i >= v.size)
throw IllegalArgumentException("a repeated element must exist")
isValid(v, i) match
case -1 => traverse(v, i + 1)
case x  => x

def repeatedNTimes(v: Vector[Int]): Int =
traverse(v)

The result JMH:

373.917 ±(99.9%) 15.016 ops/ms [Average]
(min, avg, max) = (370.203, 373.917, 379.355), stdev = 3.900
CI (99.9%): [358.900, 388.933] (assumes normal distribution)

Functions calls drop performance for ~40 ops/ms.

I return -1 from check because task description implies A[i] >= 0. In general check should return Option[Int]. Let’s measure the cost of wrapping result to Option[Int]:

object Solution3:
def isValid(v: Vector[Int], i: Int): Option[Int] =
@tailrec
def check(j: Int = 1): Option[Int] =
if (j >= 4)
None
else if (i + j < v.size && v(i + j) == v(i))
Some(v(i))
else
check(j + 1)
check()

@tailrec
def traverse(v: Vector[Int], i: Int = 0): Int =
if (i >= v.size)
throw IllegalArgumentException("a repeated element must exist")
isValid(v, i) match
case None    => traverse(v, i + 1)
case Some(x) => x

def repeatedNTimes(v: Vector[Int]): Int =
traverse(v)

The JMH result:

358.069 ±(99.9%) 5.131 ops/ms [Average]
(min, avg, max) = (356.584, 358.069, 360.197), stdev = 1.333
CI (99.9%): [352.938, 363.200] (assumes normal distribution)

Return Option as a result type costs ~5 ops/ms.

Let’s go further and avoid range check with Vector.lift:

object Solution4:
def isValid(v: Vector[Int], i: Int): Option[Int] =
@tailrec
def check(j: Int = 1): Option[Int] =
if (j >= 4)
None
else if (v.lift(i + j) == Some(v(i)))
Some(v(i))
else
check(j + 1)
check()

@tailrec
def traverse(v: Vector[Int], i: Int = 0): Int =
if (i >= v.size)
throw IllegalArgumentException("a repeated element must exist")
isValid(v, i) match
case None    => traverse(v, i + 1)
case Some(x) => x

def repeatedNTimes(v: Vector[Int]): Int =
traverse(v)

The JMH result:

280.777 ±(99.9%) 33.193 ops/ms [Average]
(min, avg, max) = (267.076, 280.777, 288.710), stdev = 8.620
CI (99.9%): [247.584, 313.970] (assumes normal distribution)

Introducing Vector.lift as a result type costs ~20 ops/ms.

Let’s complete the samples by the most functional and, alas, most ineffective solution that generates sliding windows:

object Solution5:
def repeatedNTimes(v: Vector[Int]): Int =
val slides = v.sliding(4) ++ (2 to 3).map(v.takeRight(_))
val resOpt = slides.find(s => s.tail.contains(s.head))
resOpt match
case Some(r +: _)   => r
case Some(_) | None => throw IllegalArgumentException("a repeated element must exist")

The overall JMH result is poor:

232.184 ±(99.9%) 27.322 ops/ms [Average]
(min, avg, max) = (221.541, 232.184, 241.193), stdev = 7.095
CI (99.9%): [204.862, 259.505] (assumes normal distribution)

## Bonus: Solution in Kotlin

package main

class Solution0_kotlin {
companion object {
@JvmStatic
fun repeatedNTimes(a: IntArray): Int {
for (i in 0..a.size - 1) {
for (j in 1..3) {
if (i + j >= a.size)
continue
if (a[i] == a[i + j])
return a[i]
}
}
throw IllegalArgumentException()
}
}
}

The JMH result:

897.820 ±(99.9%) 112.651 ops/ms [Average]
(min, avg, max) = (857.173, 897.820, 938.625), stdev = 29.255
CI (99.9%): [785.169, 1010.472] (assumes normal distribution)

## Conclusion

Let’s compare the performance according to JMH normal distribution estimations using Python code as follows:

import numpy as np

def foo(m1: float, s1: float, m2: float, s2: float, q: float=0.1, N: int=1_000_000):
arr1 = np.random.normal(m1, s1, N)
arr2 = np.random.normal(m2, s2, N)
arr1 = np.sort(arr1)[(int)(N*q/100.):-(int)(N*q/100.)]
arr2 = np.sort(arr2)[(int)(N*q/100.):-(int)(N*q/100.)]
indexes = zip(
np.random.randint(low=0, high=arr1.size, size=arr1.size),
np.random.randint(low=0, high=arr2.size, size=arr2.size),
)
arr = [(arr1[a2]-arr2[a1])/arr1[a1] for a1, a2 in indexes]
arr = np.sort(arr)[(int)(N*q/100.):-(int)(N*q/100.)]
return arr[0], arr[-1]

# Imperative Scala
foo(m1=884.907, s1=3.979, m2=861.698, s2=7.321, q=0.20)
> (0.0006373412546876295, 0.052451737914490114)
# Scala, tailrec
foo(m1=884.907, s1=3.979, m2=373.917, s2=3.900)
> (0.5530158708621706, 0.6024221862131598)
# Scala, Option[Int]
foo(m1=884.907, s1=3.979, m2=358.069, s2=1.333)
> (0.5771592930614815, 0.613940360993139)
# Scala, vector.lift
foo(m1=884.907, s1=3.979, m2=280.777, s2=8.620)
> (0.6432971995266645, 0.7227139948686705)
# Scala, sliding window
foo(m1=884.907, s1=3.979, m2=232.184, s2=7.095)
> (0.7022175816054018, 0.7736842716482819)
# Kotlin
foo(m1=884.907, s1=3.979, m2=897.820, s2=29.255)
> (-0.10892560898996115, 0.08285290004447453)
Performance, ops/ms Diff to Java solution
Java solution mean = 884.907, stdev = 3.979 0%
Kotlin solution mean = 897.820, stdev = 29.255 ~0%
Imperative Scala mean = 861.698, stdev = 7.321 ~0%
Scala, tailrec mean = 373.917, stdev = 3.900 -55..61%
Scala, Option[Int] mean = 358.069, stdev = 1.333 -57..61%
Scala, vector.lift mean = 280.777, stdev = 8.620 -64..72%
Scala, sliding window mean = 232.184, stdev = 7.095 -70..77%

Summarizing the results:

• Scala while-loops implementation has the same performance as the Kotlin implementation, but still slower than Java implementation
• tailrec recursive calls cost ~20% of performance
• Vector.lift and comparing with Some costs ~10% of performance
• returning Option as a result costs ~3% of performance
• generating helper vectors with sliding window is a bad idea and costs ~80% of performance