Advent of Code 2024 Day 6 in Rust

Optimising an Advent of Code solution. The problem description can be found here.
Part 1
Part 1 is straightforward — it is just a matter of implementing what is described in the question.
I'll store the grid as a 2D vector:
type Grid = Vec<Vec<char>>;

Then, let's parse the input string into a Grid:
fn parse(input: &str) -> Grid {
    input
        .trim()
        .lines()
        .map(|line| line.chars().collect())
        .collect()
}

We are going to be working with coordinates a lot in this problem, so let's create a Point struct to abstract away some of that.
#[derive(Debug, Copy, Clone, PartialEq, Hash, Eq)]
struct Point {
    row: i32,
    col: i32,
}

A new function is useful to have:
impl Point {
    fn new(row: i32, col: i32) -> Self {
        Self { row, col }
    }
}

We are going to be adding Points together a lot. It would be good if we could use the actual + syntax. Rust supports operator overloading, so all we have to do is implement std::ops::Add.
use std::ops::Add;

impl Add for Point {
    type Output = Self;

    fn add(self, other: Self) -> Self {
        Self {
            row: self.row + other.row,
            col: self.col + other.col,
        }
    }
}

The next thing that needs to be done is to find the starting position of the guard.
fn find_guard(grid: &Grid) -> Point {
    for (i, line) in grid.iter().enumerate() {
        for (j, c) in line.iter().enumerate() {
            if c == &'^' {
                return Point::new(i as i32, j as i32);
            }
        }
    }
    unreachable!()
}

This is what part 1 is looking like at the moment:
pub fn part1(input: &str) -> usize {
    let grid = parse(input);
    let rows = grid.len() as i32;
    let cols = grid[0].len() as i32;
    let mut pos = find_guard(&grid);

pos is mutable as it will track the position of the guard over time.
We need a concept of direction, so let's define the possible directions, starting upwards (the direction the guard is initially facing) and going clockwise.

let directions = [
    Point::new(-1, 0),
    Point::new(0, 1),
    Point::new(1, 0),
    Point::new(0, -1),
];

Let's keep track of the current direction by indexing into directions.
let mut dir_index = 0;

Initially, I thought about just having a counter and incrementing it at each step to track the path length. But actually we are asked for the number of unique squares that the guard visits, and it is possible that she crosses her own path going a different direction. So instead, we must track the squares that have been visited in a HashSet and return its length at the end.
let mut visited = HashSet::new();
// ...
visited.len()

Now I'll go ahead and paste the rest of my solution:
'outer: loop {
    visited.insert(pos);
    loop {
        let new_pos = pos + directions[dir_index % directions.len()];
        if new_pos.row < 0 || new_pos.row >= rows || new_pos.col < 0 || new_pos.col >= cols {
            break 'outer;
        }
        if grid[new_pos.row as usize][new_pos.col as usize] != '#' {
            pos = new_pos;
            break;
        }
        dir_index += 1;
    }
}

The first thing is the outer infinite loop. An iteration of this represents a step on the path. A great Rust feature is that you can name a loop and reference it in break statements, and that is exactly what I have done here, so that I can break out of the outer loop from inside the inner one.
The current pos is added to visited.
The inner loop is in charge of incrementing the direction until a valid one is found. In each iteration, it calculates a potential new_pos by adding the current direction to the current position. If that new position is out of bounds, then our path is finished, and we break out of the outer loop. Otherwise, if the new_pos is valid (no obstacles), we update pos and break the inner loop. If it wasn't valid, we turn right by incrementing the direction index and go for another iteration of the inner loop.
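The labelled-break pattern is worth seeing in isolation. Here's a contrived standalone sketch (not part of the solution) showing that break 'outer exits both loops at once, while a plain break only exits the inner one:

```rust
// Count how many inner-loop iterations run before a labelled break
// bails out of both loops at once.
fn run_until_labelled_break() -> u32 {
    let mut steps = 0;
    'outer: for i in 0..10 {
        loop {
            steps += 1;
            if i == 2 {
                break 'outer; // exits BOTH loops, not just the inner one
            }
            break; // a plain break only exits the inner loop
        }
    }
    steps
}

fn main() {
    assert_eq!(run_until_labelled_break(), 3); // ran for i = 0, 1, 2
}
```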
And that's the solution for part 1! The full code listing can be found here.
Modifying Part 1
Now, for some reason, I decided that I didn't like the Point idea. Maybe it was performance or just complexity, I don't know, but I can't remember why I removed it. I will actually end up adding it back a bit later.
Anyway, here's what the code looks like once I removed it:
pub fn part1(input: &str) -> usize {
    let grid = parse(input);
    let rows = grid.len();
    let cols = grid[0].len();
    let (mut x, mut y) = find_guard(&grid);
    let mut di = 0;
    let mut visited = HashSet::new();
    'outer: loop {
        visited.insert((x, y));
        loop {
            let (dx, dy) = DIRS[di];
            let (tx, ty) = (x as isize + dx, y as isize + dy);
            if tx < 0 || tx >= rows as isize || ty < 0 || ty >= cols as isize {
                break 'outer;
            }
            if grid[tx as usize][ty as usize] != '#' {
                (x, y) = (tx as usize, ty as usize);
                break;
            }
            di += 1;
            di %= 4;
        }
    }
    visited.len()
}

GitHub link here.
Part 2
Let's do the obvious solution first — brute force. Try placing the extra obstacle in each empty position and check whether it creates a loop.
How will we detect a loop? If we ever end up in the same place and going in the same direction, then we are in a loop. So, again, we can use a HashSet, but this time we store the direction as well as the position.
Here's the full code:
pub fn part2(input: &str) -> usize {
    let mut grid = parse(input);
    let rows = grid.len();
    let cols = grid[0].len();
    let (gx, gy) = find_guard(&grid);
    let mut count = 0;
    for ox in 0..rows {
        for oy in 0..cols {
            if grid[ox][oy] != '.' {
                continue;
            }
            grid[ox][oy] = '#';
            let (mut x, mut y) = (gx, gy);
            let mut di = 0;
            let mut visited = HashSet::new();
            'outer: loop {
                if !visited.insert((x, y, di)) {
                    count += 1;
                    break;
                }
                loop {
                    let (dx, dy) = DIRS[di];
                    let (tx, ty) = (x as isize + dx, y as isize + dy);
                    if tx < 0 || tx >= rows as isize || ty < 0 || ty >= cols as isize {
                        break 'outer;
                    }
                    if grid[tx as usize][ty as usize] != '#' {
                        (x, y) = (tx as usize, ty as usize);
                        break;
                    }
                    di += 1;
                    di %= 4;
                }
            }
            grid[ox][oy] = '.';
        }
    }
    count
}

(GitHub link here)
Most of this is the same as part 1 (in fact, I should almost certainly factor out some of this logic, but that can wait). We just iterate over the whole grid. If the cell is empty (.), we place an obstacle there and check if it makes a loop, and we revert the cell at the end.
if !visited.insert((x, y, di)) {
    count += 1;
    break;
}

This part is different from before. insert returns true if its argument was not already contained in the set. So if it returns false, we are in a looped path, and we increment count and break the loop.
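The insert return value does all the loop detection for us. Here's a tiny standalone sketch of the idea (the helper and its inputs are made up for illustration):

```rust
use std::collections::HashSet;

// insert returns true when the value was newly added, false when it was
// already present. Revisiting a (row, col, direction) state means the
// guard is in a loop.
fn first_repeated_state(states: &[(i32, i32, usize)]) -> Option<(i32, i32, usize)> {
    let mut seen = HashSet::new();
    for &state in states {
        if !seen.insert(state) {
            return Some(state);
        }
    }
    None
}

fn main() {
    // The guard returns to (0, 0) facing direction 0: a loop.
    let path = [(0, 0, 0), (0, 1, 0), (0, 1, 1), (0, 0, 0)];
    assert_eq!(first_repeated_state(&path), Some((0, 0, 0)));
    // Same position twice but different directions: no loop detected.
    assert_eq!(first_repeated_state(&[(0, 0, 0), (0, 0, 1)]), None);
}
```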
Does this work? Yes. Is it fast? No.
❯ cargo run year2024 day06
Compiling advent-of-code v0.1.0 (/Users/will-lynas/dev/advent-of-code)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.93s
Running `target/debug/advent-of-code year2024 day06`
year2024 day06
Part 1: 4722
Part 2: 1602
🕐 55028.334 ms

55 seconds! We can definitely do better.
(FYI, I'm running this on an M1 Pro MacBook)
Let's quickly try a release build (instead of dev):
❯ cargo run --release -- year2024 day06
Finished `release` profile [optimized] target(s) in 0.08s
Running `target/release/advent-of-code year2024 day06`
year2024 day06
Part 1: 4722
Part 2: 1602
🕐 5133.125 ms

5 seconds now. Wow! I was not expecting a 10x speed-up. It would be interesting to investigate which part of the (unoptimised) code is responsible for that. This code has a bunch of casts (probably a red flag), but casts with as are actually unchecked in both modes; the bigger difference is likely that dev builds skip nearly all optimisation and add extra runtime checks such as integer overflow checks.
An easy optimisation
Thinking about it, if we choose a random place to put the extra obstacle, the guard will never hit it. In fact, the only time the extra obstacle makes a difference is when it is placed on the original path that we found in part 1. I'll shamelessly copy-paste the code from part 1 to find the original path, then only try out those positions.
pub fn part2(input: &str) -> usize {
    let mut grid = parse(input);
    let rows = grid.len();
    let cols = grid[0].len();
    let (gx, gy) = find_guard(&grid);
    let (mut x, mut y) = (gx, gy);
    let mut di = 0;
    let mut path = HashSet::new();
    'outer: loop {
        path.insert((x, y));
        loop {
            let (dx, dy) = DIRS[di];
            let (tx, ty) = (x as isize + dx, y as isize + dy);
            if tx < 0 || tx >= rows as isize || ty < 0 || ty >= cols as isize {
                break 'outer;
            }
            if grid[tx as usize][ty as usize] != '#' {
                (x, y) = (tx as usize, ty as usize);
                break;
            }
            di += 1;
            di %= 4;
        }
    }
    path.remove(&(gx, gy));
    let mut count = 0;
    for (ox, oy) in path {
        grid[ox][oy] = '#';
        let (mut x, mut y) = (gx, gy);
        let mut di = 0;
        let mut visited = HashSet::new();
        'outer: loop {
            if !visited.insert((x, y, di)) {
                count += 1;
                break;
            }
            loop {
                let (dx, dy) = DIRS[di];
                let (tx, ty) = (x as isize + dx, y as isize + dy);
                if tx < 0 || tx >= rows as isize || ty < 0 || ty >= cols as isize {
                    break 'outer;
                }
                if grid[tx as usize][ty as usize] != '#' {
                    (x, y) = (tx as usize, ty as usize);
                    break;
                }
                di += 1;
                di %= 4;
            }
        }
        grid[ox][oy] = '.';
    }
    count
}

There's nothing much to comment on here. We can get rid of the part that checks whether the cell we are trying to place the obstacle in is free — the original path doesn't have any obstacles on it. We do need to remember to remove the starting position from the path; we can't put an obstacle on top of the guard.
Let's see how fast it is now. I'll use release mode — I've learnt my lesson.
❯ cargo run --release -- year2024 day06
Compiling advent-of-code v0.1.0 (/Users/will-lynas/dev/advent-of-code)
Finished `release` profile [optimized] target(s) in 1.16s
Running `target/release/advent-of-code year2024 day06`
year2024 day06
Part 1: 4722
Part 2: 1602
🕐 1047.750 ms

1 second. That's not bad. But my goal is under 10ms, and I think that should be possible. Now that the running time is getting quite short, it's probably worth doing some proper benchmarking, rather than just running it once. I'll look into using cargo bench after I add some threading.
The code for this section is here.
Threading
Notice that the computations for each potential obstacle are independent of each other. That means we can run them in parallel.
As a proof of concept, let's split the positions-to-check into chunks of 10 and run each chunk in its own thread. Then all we have to do is sum the local counts returned from the threads.
let path_chunks: Vec<_> = path
    .into_iter()
    .collect::<Vec<_>>()
    .chunks(10)
    .map(|chunk| chunk.to_vec())
    .collect();

Here we use the chunks method to split path up into chunks of 10.
let handles: Vec<_> = path_chunks
    .into_iter()
    .map(|chunk| {
        let mut grid = grid.clone();
        thread::spawn(move || {
            let mut local_count = 0;
            for (ox, oy) in chunk {
                grid[ox][oy] = '#';
                let (mut x, mut y) = (gx, gy);
                let mut di = 0;
                let mut visited = HashSet::new();
                'outer: loop {
                    if !visited.insert((x, y, di)) {
                        local_count += 1;
                        break;
                    }
                    loop {
                        let (dx, dy) = DIRS[di];
                        let (tx, ty) = (x as isize + dx, y as isize + dy);
                        if tx < 0 || tx >= rows as isize || ty < 0 || ty >= cols as isize {
                            break 'outer;
                        }
                        if grid[tx as usize][ty as usize] != '#' {
                            (x, y) = (tx as usize, ty as usize);
                            break;
                        }
                        di += 1;
                        di %= 4;
                    }
                }
                grid[ox][oy] = '.';
            }
            local_count
        })
    })
    .collect();

Then for each chunk we spawn a thread, cloning the grid for each one.
handles
    .into_iter()
    .map(|handle| handle.join().unwrap())
    .sum()

Finally, we join all the handles and sum their return values.
The code can be found here.
The solution is pretty fast now.
❯ cargo run --release -- year2024 day06
Compiling advent-of-code v0.1.0 (/Users/will-lynas/dev/advent-of-code)
Finished `release` profile [optimized] target(s) in 0.99s
Running `target/release/advent-of-code year2024 day06`
year2024 day06
Part 1: 4722
Part 2: 1602
🕐 173.500 ms

This task is CPU bound, so there is no point making more threads than the number of logical cores on the hardware. If we were IO bound, then it would make sense to make many more threads.
On macOS, I can run the following command to get the number of logical cores:
sysctl -n hw.ncpu

The answer is 8.
There is also a function in Rust (available_parallelism from std::thread) which can find this. Using that, we can calculate the correct chunk size:
let path_vec: Vec<_> = path.into_iter().collect();
let threads: usize = available_parallelism().unwrap().into();
let chunk_size = ceiling_division(path_vec.len(), threads);

Where ceiling_division is defined as …
fn ceiling_division(a: usize, b: usize) -> usize {
    (a + b - 1) / b
}

In fact, clippy informs us of a built-in method for ceiling division, so let's use that instead.
let path_vec: Vec<_> = path.into_iter().collect();
let threads: usize = available_parallelism().unwrap().into();
let chunk_size = path_vec.len().div_ceil(threads);

The rest of the code remains the same.
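As a quick standalone sanity check that the two formulations agree (the numbers here are illustrative, not taken from the real path):

```rust
// The classic add-then-divide trick for ceiling division.
fn ceiling_division(a: usize, b: usize) -> usize {
    (a + b - 1) / b
}

fn main() {
    // 1602 work items split across 8 threads: ceil(1602 / 8) = 201 per chunk.
    assert_eq!(ceiling_division(1602, 8), 201);
    // The built-in method (stable since Rust 1.73) gives the same answer.
    assert_eq!(1602usize.div_ceil(8), 201);
    // An exact division needs no rounding up.
    assert_eq!(1600usize.div_ceil(8), 200);
}
```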
How much faster is this?
❯ cargo run --release -- year2024 day06
Compiling advent-of-code v0.1.0 (/Users/will-lynas/dev/advent-of-code)
Finished `release` profile [optimized] target(s) in 1.01s
Running `target/release/advent-of-code year2024 day06`
year2024 day06
Part 1: 4722
Part 2: 1602
🕐 177.958 ms

It isn't. Or at least the speed-up is negligible compared to the variance of single runs. It looks like the overhead of spinning up too many threads is small, so cutting down the number of threads doesn't do much. It is definitely time to benchmark this properly.
Benchmarking
The built-in #[bench] attribute is only available on nightly, so I'll upgrade my Advent of Code repo. To do this, I'll create a rust-toolchain.toml file in the root, with the following contents:
[toolchain]
channel = "nightly-2024-11-01"
components = [ "rustfmt", "clippy" ]

This sets the toolchain to nightly, and sets rustfmt and clippy to use nightly too.
Now we need to add this to the top of main.rs:
#![feature(test)]
extern crate test;

Finally, let's add the actual benchmark code and run it:
#[bench]
fn part2_bench(b: &mut Bencher) {
    let path = Path::new("input")
        .join("year2024")
        .join("day06")
        .with_extension("txt");
    let input = read_to_string(path).unwrap();
    b.iter(|| part2(&input));
}

❯ cargo bench part2_bench
Finished `bench` profile [optimized] target(s) in 0.01s
Running unittests src/main.rs (target/release/deps/advent_of_code-1b9bf3f93b46aa4d)
running 1 test
test year2024::day06::test::part2_bench ... bench: 180,060,837.50 ns/iter (+/- 12,198,291.68)
test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 14 filtered out; finished in 54.32s

180ms. Ok. Let's quickly benchmark the previous version, which had many more threads.
❯ cargo bench part2_bench
Compiling advent-of-code v0.1.0 (/Users/will-lynas/dev/advent-of-code)
Finished `bench` profile [optimized] target(s) in 0.68s
Running unittests src/main.rs (target/release/deps/advent_of_code-1b9bf3f93b46aa4d)
running 1 test
test year2024::day06::test::part2_bench ... bench: 222,999,204.20 ns/iter (+/- 63,755,465.85)
test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 14 filtered out; finished in 67.52s

223ms. So getting the correct number of threads did actually improve performance.
It's weird that the benchmark is slower than the single run version. I thought it might just be because nightly is slower, but it seems to be the same for single runs as it was before. Weird.
Adding back Point
I'm a bit stuck here. I think it's time to look at someone else's code in Rust.
This repo is excellent, and the solutions are very fast. Let's check their benchmark for part 2.
❯ cargo +nightly bench year2024::day06
Finished `bench` profile [optimized] target(s) in 0.00s
Running unittests src/lib.rs (target/release/deps/aoc-1f6196624e9dd1c0)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/main.rs (target/release/deps/aoc-360d97610256ab09)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running benches/benchmark.rs (target/release/deps/benchmark-9696f7b96bfe406a)
running 3 tests
test year2024::day06::parse_bench ... bench: 2,683.19 ns/iter (+/- 101.74)
test year2024::day06::part1_bench ... bench: 7,326.22 ns/iter (+/- 308.94)
test year2024::day06::part2_bench ... bench: 488,527.61 ns/iter (+/- 8,174.06)
test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 747 filtered out; finished in 5.19s

Half a millisecond. Wow.
Looking at the code, it looks like they do use a Point struct similar to what I was using. I guess the performance hit isn't bad. I'll put it back in (and at the same time steal some nice helper functions) and benchmark again.
Here's the code after that.
❯ cargo bench day06
Compiling advent-of-code v0.1.0 (/Users/will-lynas/dev/advent-of-code)
Finished `bench` profile [optimized] target(s) in 0.35s
Running unittests src/lib.rs (target/release/deps/advent_of_code-057d7e4886393af8)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running unittests src/main.rs (target/release/deps/advent_of_code-267d9ad5534556cf)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Running benches/benchmarks.rs (target/release/deps/benchmarks-d700b4b68bccb330)
running 2 tests
test year2024::day06::part1 ... bench: 332,971.36 ns/iter (+/- 23,970.91)
test year2024::day06::part2 ... bench: 169,348,824.90 ns/iter (+/- 9,003,324.45)

169ms. It actually got faster.
Let's try inlining new and add to see if that makes any difference. I would guess not, since the compiler probably inlines those functions anyway.
running 1 test
test year2024::day06::part2 ... bench: 170,545,791.70 ns/iter (+/- 7,754,877.47)
test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 11 filtered out; finished in 51.36s

171ms. Yeah, it doesn't seem to have changed.
At the moment, I have an array of the 4 possible directions, which I index into to get the current direction.
pub const DIRS: [Point; 4] = [UP, RIGHT, DOWN, LEFT];

But when I change that to instead store the current direction and mutate it in place using this …
#[inline]
pub fn rotate_clockwise(&mut self) {
    (self.x, self.y) = (-self.y, self.x);
}

… I somehow get an almost 2x speed-up to 93ms:
running 1 test
test year2024::day06::part2 ... bench: 92,767,133.30 ns/iter (+/- 6,106,290.76)

Not sure why that is, but I'll take it!
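To convince myself the in-place rotation really cycles through the four directions, here's a quick standalone check. Point is reduced to bare x/y fields for the sketch, with y growing downwards as in the grid:

```rust
#[derive(Debug, Copy, Clone, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

impl Point {
    // In a coordinate system where y grows downwards, a clockwise
    // rotation maps (x, y) to (-y, x).
    fn rotate_clockwise(&mut self) {
        (self.x, self.y) = (-self.y, self.x);
    }
}

fn main() {
    let up = Point { x: 0, y: -1 };
    let mut dir = up;
    let mut seen = Vec::new();
    for _ in 0..4 {
        dir.rotate_clockwise();
        seen.push(dir);
    }
    // UP → RIGHT → DOWN → LEFT → back to UP after four turns.
    assert_eq!(
        seen,
        [
            Point { x: 1, y: 0 },  // RIGHT
            Point { x: 0, y: 1 },  // DOWN
            Point { x: -1, y: 0 }, // LEFT
            up,
        ]
    );
}
```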
Grid Struct
The other repo also has a struct for storing the grid and associated methods. I'll do something similar.
#[derive(Clone)]
pub struct Grid {
    pub body: Vec<Vec<char>>,
}

The actual struct just holds the same data as before. We will need Clone when we do the threading part.
pub fn parse(input: &str) -> Grid {
    Grid {
        body: input
            .trim()
            .lines()
            .map(|line| line.chars().collect())
            .collect(),
    }
}

The parse function is the same too.
pub fn contains(&self, point: Point) -> bool {
    point.y >= 0
        && point.y < self.body.len() as i32
        && point.x >= 0
        && point.x < self.body[0].len() as i32
}

We can factor out the checks for whether a Point is within the bounds of the Grid.
pub fn find(&self, goal: &char) -> Option<Point> {
    for (i, line) in self.body.iter().enumerate() {
        for (j, c) in line.iter().enumerate() {
            if c == goal {
                return Some(Point::new(j as i32, i as i32));
            }
        }
    }
    None
}

And here's a slightly more generic version of the find_guard function from before.
One big quality of life feature is the ability to index Grid using a Point. That can be done by implementing two traits, one for mutable and the other for immutable access.
impl Index<Point> for Grid {
    type Output = char;

    fn index(&self, point: Point) -> &Self::Output {
        &self.body[point.y as usize][point.x as usize]
    }
}

impl IndexMut<Point> for Grid {
    fn index_mut(&mut self, point: Point) -> &mut Self::Output {
        &mut self.body[point.y as usize][point.x as usize]
    }
}

Then we can finally update the code to use this. I'll show part1 only, as part2 is still pretty long, but as always the code can be found here.
pub fn part1(grid: &Grid) -> usize {
    let mut pos = grid.find(&'^').unwrap();
    let mut dir = UP;
    let mut visited = HashSet::new();
    'outer: loop {
        visited.insert(pos);
        loop {
            let new_pos = pos + dir;
            if !grid.contains(new_pos) {
                break 'outer;
            }
            if grid[new_pos] != '#' {
                pos = new_pos;
                break;
            }
            dir.rotate_clockwise();
        }
    }
    visited.len()
}

It's looking a lot nicer now. Let's make sure we haven't taken any performance hits:
running 1 test
test year2024::day06::part2 ... bench: 94,096,337.50 ns/iter (+/- 6,532,897.53)

Nope. Everything looks good.
Improving Grid
I've chosen to implement Grid using char, but that is actually quite wasteful. All the text here is ASCII, so each cell fits in a single byte, while char takes 4 bytes; we can store cells as u8 instead. I'll quickly make Grid generic over its cell type — this code will probably be useful in other problems. Following the other repo, I'll only implement parse for u8 though.
#[derive(Clone)]
pub struct Grid<T> {
    pub body: Vec<Vec<T>>,
}

impl Grid<u8> {
    pub fn parse(input: &str) -> Self {
        Grid {
            body: input
                .trim()
                .lines()
                .map(|line| line.bytes().collect())
                .collect(),
        }
    }
}

impl<T> Grid<T> {
    pub fn contains(&self, point: Point) -> bool {
        point.y >= 0
            && point.y < self.body.len() as i32
            && point.x >= 0
            && point.x < self.body[0].len() as i32
    }
}

impl<T: PartialEq> Grid<T> {
    pub fn find(&self, goal: &T) -> Option<Point> {
        for (i, line) in self.body.iter().enumerate() {
            for (j, c) in line.iter().enumerate() {
                if c == goal {
                    return Some(Point::new(j as i32, i as i32));
                }
            }
        }
        None
    }
}

impl<T> Index<Point> for Grid<T> {
    type Output = T;

    fn index(&self, point: Point) -> &Self::Output {
        &self.body[point.y as usize][point.x as usize]
    }
}

impl<T> IndexMut<Point> for Grid<T> {
    fn index_mut(&mut self, point: Point) -> &mut Self::Output {
        &mut self.body[point.y as usize][point.x as usize]
    }
}

That's pretty straightforward, although the impl blocks get a bit fragmented because of the different trait bounds.
How's the performance?
running 1 test
test year2024::day06::part2 ... bench: 95,028,891.70 ns/iter (+/- 6,703,570.32)

It didn't really change :/
See the code here.
Improving Grid again
One other thing the other repo does differently is storing the grid as a single flat Vec (along with the width and height) rather than a 2D Vec. Let's try that.
running 1 test
test year2024::day06::part2 ... bench: 94,732,241.70 ns/iter (+/- 6,200,892.18)

No change again.
The code is listed here.
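For reference, here's a minimal sketch of the flat layout (the field and method names are my own, not the repo's): a single Vec indexed as row * width + col.

```rust
// One contiguous Vec plus explicit dimensions, instead of Vec<Vec<T>>.
// This keeps the whole grid in one allocation and removes a pointer chase
// per row access.
struct FlatGrid<T> {
    width: usize,
    height: usize,
    body: Vec<T>,
}

impl<T> FlatGrid<T> {
    fn get(&self, row: usize, col: usize) -> &T {
        assert!(row < self.height && col < self.width);
        &self.body[row * self.width + col]
    }
}

fn main() {
    // A 2x3 grid laid out row by row: [a b c / d e f].
    let grid = FlatGrid {
        width: 3,
        height: 2,
        body: vec![b'a', b'b', b'c', b'd', b'e', b'f'],
    };
    assert_eq!(*grid.get(0, 2), b'c');
    assert_eq!(*grid.get(1, 0), b'd');
}
```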
Algorithm Improvements
Again, looking at the other repo, some of my logic can be simplified. Let's look at part 1 first.
pub fn part1(grid: &Grid<u8>) -> usize {
    let mut pos = grid.find(&b'^').unwrap();
    let mut dir = UP;
    let mut visited = HashSet::new();
    while grid.contains(pos + dir) {
        if grid[pos + dir] == b'#' {
            dir.rotate_clockwise();
            continue;
        }
        visited.insert(pos);
        pos += dir;
    }
    visited.insert(pos);
    visited.len()
}

Nothing much to talk about. I did have to implement the following for Point …
impl AddAssign for Point {
    fn add_assign(&mut self, rhs: Self) {
        *self = *self + rhs;
    }
}

Part 2 is similar:
let handles: Vec<_> = path_chunks
    .into_iter()
    .map(|chunk| {
        let mut grid = grid.clone();
        thread::spawn(move || {
            let mut local_count = 0;
            for obstacle_pos in chunk {
                grid[obstacle_pos] = b'#';
                let mut pos = original_pos;
                let mut dir = UP;
                let mut visited = HashSet::new();
                while grid.contains(pos + dir) {
                    if grid[pos + dir] == b'#' {
                        dir.rotate_clockwise();
                        continue;
                    }
                    if !visited.insert((pos, dir)) {
                        local_count += 1;
                        break;
                    }
                    pos += dir;
                }
                grid[obstacle_pos] = b'.';
            }
            local_count
        })
    })
    .collect();

running 1 test
test year2024::day06::part2 ... bench: 98,957,466.80 ns/iter (+/- 6,896,216.22)

We actually lost a bit of performance, but I think the extra readability is worth it.
Code is here.
Hashing Algorithms
Rust's built-in hashing algorithm is SipHash 1-3, a keyed hash designed to resist hash-flooding denial-of-service attacks. That robustness comes with a performance cost and provides no benefit to us here. So let's switch to a faster non-cryptographic hashing function, like gxhash.
running 1 test
test year2024::day06::part2 ... bench: 69,413,212.50 ns/iter (+/- 5,707,128.93)

69ms! An easy win.
The code is here (although almost nothing has changed).
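gxhash is an external crate, but the std mechanism it plugs into is worth seeing: HashSet is generic over a BuildHasher, so any Hasher implementation can replace SipHash. Here's a standalone sketch with a toy FNV-1a hasher standing in for gxhash (for illustration only, not a recommendation):

```rust
use std::collections::HashSet;
use std::hash::{BuildHasherDefault, Hasher};

// A minimal FNV-1a hasher. Fast hashers like gxhash implement this same
// Hasher trait and are swapped in exactly the same way.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV offset basis
    }
}

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }

    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }
}

// The set behaves identically; only the hashing algorithm changes.
type FastSet<T> = HashSet<T, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut visited: FastSet<(i32, i32)> = FastSet::default();
    assert!(visited.insert((1, 2)));
    assert!(!visited.insert((1, 2))); // duplicates still detected as usual
    assert_eq!(visited.len(), 1);
}
```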
Conclusions
That's where I'm going to leave this problem. There exist some much faster algorithms for this problem, but I'm pretty satisfied just having optimised this one. 55 seconds to 69ms is quite an improvement!