I don’t know about you, but when I see a problem that needs some basic
numeric operations like min, max, addition and multiplication, I think to
myself: “this is a great opportunity to pull out sed
”.
Oh, just me then?
sed
is a line-oriented text processor: you give it a sequence of commands,
and it applies those commands to each line of the input.
The s
ubstitution command is the most well-known, and many folks will only have
ever used sed 's/regex/replacement/g'
on the command line, opting for other
tools for more complex problems.
But sed
has plenty more functionality to offer!
The problem input
We have a game where we’re pulling marbles from a bag, and we’re given a record of what happened for these games. The input format is:
Game 1: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green
Game 2: 1 blue, 2 green; 3 green, 4 blue, 1 red; 1 green, 1 blue
Game 3: 8 green, 6 blue, 20 red; 5 blue, 4 red, 13 green; 5 green, 1 red
Game 4: 1 green, 3 red, 6 blue; 3 green, 6 red; 3 green, 15 blue, 14 red
Game 5: 6 red, 1 blue, 3 green; 2 blue, 1 red, 2 green
It turns out we don’t need to care about the commas and semicolons: neither of the parts have any kind of dependency on the sequence or conjunction of cubes.
Part 1
To work with numbers in sed
, we’re going to convert our numbers between
numeral systems. The three forms we will have are:
- Decimal: 43 (the coefficients of 4×10¹ + 3×10⁰)
- Tally: BBBBAAA (B = 10¹, A = 10⁰, add together the values)
- Unary: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (count the symbols)
In our input, the largest number is three digits (Game 100), so we can convert from decimal to tally format like so:
# Convert all numbers to tally format.
s/9([0-9]{2})/C8\1/g
s/8([0-9]{2})/C7\1/g
s/7([0-9]{2})/C6\1/g
s/6([0-9]{2})/C5\1/g
s/5([0-9]{2})/C4\1/g
s/4([0-9]{2})/C3\1/g
s/3([0-9]{2})/C2\1/g
s/2([0-9]{2})/C1\1/g
s/1([0-9]{2})/C\1/g
s/9([0-9])/B8\1/g
s/8([0-9])/B7\1/g
s/7([0-9])/B6\1/g
s/6([0-9])/B5\1/g
s/5([0-9])/B4\1/g
s/4([0-9])/B3\1/g
s/3([0-9])/B2\1/g
s/2([0-9])/B1\1/g
s/1([0-9])/B\1/g
s/9/A8/g
s/8/A7/g
s/7/A6/g
s/6/A5/g
s/5/A4/g
s/4/A3/g
s/3/A2/g
s/2/A1/g
s/1/A/g
s/0//g
The first line turns any 9xx decimal number into C8xx, which then gets turned by the second line into CC7xx, and so on, until the coefficient of 10² is entirely in tally format. The next ten lines do the same for the coefficient of 10¹, and the last ten lines for 10⁰.
Our example now looks like:
Game A: AAA blue, AAAA red; A red, AA green, AAAAAA blue; AA green
Game AA: A blue, AA green; AAA green, AAAA blue, A red; A green, A blue
Game AAA: AAAAAAAA green, AAAAAA blue, BB red; AAAAA blue, AAAA red, BAAA green; AAAAA green, A red
Game AAAA: A green, AAA red, AAAAAA blue; AAA green, AAAAAA red; AAA green, BAAAAA blue, BAAAA red
Game AAAAA: AAAAAA red, A blue, AAA green; AA blue, A red, AA green
We can then turn this into unary format with:
# Convert tally format to unary.
s/C/BBBBBBBBBB/g
s/B/AAAAAAAAAA/g
Which turns our example into:
Game A: AAA blue, AAAA red; A red, AA green, AAAAAA blue; AA green
Game AA: A blue, AA green; AAA green, AAAA blue, A red; A green, A blue
Game AAA: AAAAAAAA green, AAAAAA blue, AAAAAAAAAAAAAAAAAAAA red; AAAAA blue, AAAA red, AAAAAAAAAAAAA green; AAAAA green, A red
Game AAAA: A green, AAA red, AAAAAA blue; AAA green, AAAAAA red; AAA green, AAAAAAAAAAAAAAA blue, AAAAAAAAAAAAAA red
Game AAAAA: AAAAAA red, A blue, AAA green; AA blue, A red, AA green
Now we need to discard any game which has more than 12 red cubes, 13 green cubes
or 14 blue cubes. Consider the red cubes: if we see "AAAAAAAAAAAAA red"
(that’s 13 A
s) somewhere in the line, then it means that one of the draws had
at least 13 red cubes. So, we can discard the games with:
# 13+ red cubes -> impossible
s/^.*AAAAAAAAAAAAA red.*$//
# 14+ green cubes -> impossible
s/^.*AAAAAAAAAAAAAA green.*$//
# 15+ blue cubes -> impossible
s/^.*AAAAAAAAAAAAAAA blue.*$//
Typically, you’d discard lines using the d
command (e.g. /AAAAAAAAAAAAA red/d
),
but I couldn’t figure out how to make that work with the next part, so we’re just
emptying the line.
Our example is now:
Game A: AAA blue, AAAA red; A red, AA green, AAAAAA blue; AA green
Game AA: A blue, AA green; AAA green, AAAA blue, A red; A green, A blue
Game AAAAA: AAAAAA red, A blue, AAA green; AA blue, A red, AA green
We next need only the game ids, so we filter out everything else:
# Get just the ids.
s/^Game (A+):.*$/\1/
Which gives us:
A
AA
AAAAA
And then we need to merge all the lines together to add our numbers, which obviously is just:
# Merge the lines together.
H
$!d
x
s/\n//g
Obviously? Not in the slightest, so let’s break this down.
Remember how sed
processes text line-by-line?
In order to do multi-line operations, we need to make use of the hold buffer.
The H
command appends the current line to the hold buffer, which is great,
we can now accumulate all our lines into one place.
But we only want to do anything with what we put into the hold buffer once we
reach the ends, so what $!d
says is “if it’s not (!
) the last line ($
),
then d
elete the line (and go back to the start of the program to process
the next line)”.
On the last line1 however, we get to continue with x
, which (to skip some
detail) says that our hold buffer (of all the lines we accumulated) is now
the line we’re processing. This “line” actually contains multiple lines with
newline characters between, which the s/\n//g
removes to give us a single
line.
Anyhow, this mess does what we need, and we get:
AAAAAAAA
We then just need to turn this back into a decimal number, which we do by first turning it into the tally format:
# Convert the tally back to a number.
# We have 100 input lines, so at worst we can have 1 + 2 + ... + 100 = 5050
s/AAAAAAAAAA/B/g
s/BBBBBBBBBB/C/g
s/CCCCCCCCCC/D/g
And then decimal:
# Insert zeros.
s/D([^CD]|$)/D0\1/
s/C([^BC]|$)/C0\1/
s/B([^AB]|$)/B0\1/
# Convert back to digits.
s/([ABCD])(\1{8})/9/g
s/([ABCD])(\1{7})/8/g
s/([ABCD])(\1{6})/7/g
s/([ABCD])(\1{5})/6/g
s/([ABCD])(\1{4})/5/g
s/([ABCD])(\1{3})/4/g
s/([ABCD])(\1{2})/3/g
s/([ABCD])(\1{1})/2/g
s/[ABCD]/1/g
Part 2
For the second part, we don’t care about the game ids, and the commas and semicolons are going to get in the way, so we start by removing them:
# Remove unnecessary parts.
s/^.*://
s/[,;]//g
Which turns our example into:
3 blue 4 red 1 red 2 green 6 blue 2 green
1 blue 2 green 3 green 4 blue 1 red 1 green 1 blue
8 green 6 blue 20 red 5 blue 4 red 13 green 5 green 1 red
1 green 3 red 6 blue 3 green 6 red 3 green 15 blue 14 red
6 red 1 blue 3 green 2 blue 1 red 2 green
We then convert our numbers to unary format as before (though this time we only need to do two digits), and then turn the colours into a single-letter suffix on our numbers:
# (decimal->tally->unary from above)
# Turn the colour into a suffix.
s/ red/r/g
s/ green/g/g
s/ blue/b/g
Our example is now:
AAAb AAAAr Ar AAg AAAAAAb AAg
Ab AAg AAAg AAAAb Ar Ag Ab
AAAAAAAAg AAAAAAb AAAAAAAAAAAAAAAAAAAAr AAAAAb AAAAr AAAAAAAAAAAAAg AAAAAg Ar
Ag AAAr AAAAAAb AAAg AAAAAAr AAAg AAAAAAAAAAAAAAAb AAAAAAAAAAAAAAr
AAAAAAr Ab AAAg AAb Ar AAg
In order to find the minimum number of cubes for any game, we need to find the
maximum of each colour. We can do this by repeatedly removing any number where
there is a larger number in the same line (e.g. in the first line, we should
remove AAAb
because there is also AAAAAAb
which is bigger).
The implementation:
# Find the minimum number of cubes of each colour required.
:min1
s/ (A+[rgb])(.*)\1/\2\1/g
tmin1
:min2
s/(A+[rgb])(.*) \1/\1\2/g
tmin2
There are two new features shown here: :min1
defines a label named min1
,
and tmin1
says to jump to min1
if an s
command has succeeded since the
last input line was read or a conditional branch (which includes t
) was taken.
Combined, this just says “keep running the s
command in between until it
doesn’t do anything”.
So what does the s
command do? It matches a space, then a number with a
colour-suffix (which is the first capture group \1
), then anything (the
second capture group \2
), then a repetition of the first capture group.
Consider the first line of our example, where we can find this match:
(AAAb)( AAAAr Ar AAg AAA)AAAb
. This finds an example where the first
capture group is a number which is less than or equal to a number (of
the same colour) appearing later in the line.
This gets substituted with \2\1
, which would be
( AAAAr Ar AAg AAA)(AAAb)
(without the parentheses), which removes the
left number.
Unsurprisingly, min2
does the same thing, but for the cases where the
larger number appears earlier in the line.
Once these have both run to completion, we’ll be left with one number per
colour, though in no particular order.
Here’s what the result looks like:
AAAAr AAAAAAb AAg
AAAg AAAAb Ar
AAAAAAb AAAAAAAAAAAAAAAAAAAAr AAAAAAAAAAAAAg
AAAg AAAAAAAAAAAAAAAb AAAAAAAAAAAAAAr
AAAAAAr AAAg AAb
We don’t need the order though, we just need to multiply them together, which we’ll do with:
# Discard colour suffixes.
s/[rgb]//g
# Multiply the first two numbers together.
s/^/=/
:mul1
s/= (A+) AA/\1= \1 A/
tmul1
s/(A*)= (A+) A / \1\2 /
# Multiply the third number in.
s/^/=/
:mul2
s/= (A+) AA/\1= \1 A/
tmul2
s/(A*)= (A+) A$/\1\2/
Consider the first line of our example, " AAAAr AAAAAAb AAg"
. We start by
discarding the colour suffixes and adding a =
at the start of the line,
giving "= AAAA AAAAAA AA"
.
The mul1
loop then performs naïve multiplication of AAAA
and AAAAAA
,
by adding AAAA
(4) to a total AAAAAA
(6) times.
Here’s each substitution (parentheses indicate capture groups, square
brackets indicate the entire match):
[= (AAAA) AA]AAAA AA
→[AAAA= AAAA A]AAAA AA
AAAA[= AAAA AA]AAA AA
→AAAA[AAAA= AAAA A]AAA AA
AAAAAAAA[= AAAA AA]AA AA
→AAAAAAAA[AAAA= AAAA A]AA AA
AAAAAAAAAAAA[= AAAA AA]A AA
→AAAAAAAAAAAA[AAAA= AAAA A]A AA
AAAAAAAAAAAAAAAA[= AAAA AA] AA
→AAAAAAAAAAAAAAAA[AAAA= AAAA A] AA
And then post-loop, we make the substitution:
[(AAAAAAAAAAAAAAAAAAAA)= AAAA A ]AA
→[AAAAAAAAAAAAAAAAAAAAAAAA] AA
Which leaves us with 24×2 as the multiplication for mul2
, which operates
almost exactly the same.
Once we’ve done that, it’s just a matter of merging the lines together to add all the numbers, then converting back to the tally format and then back to decimal. Simple.
As always, the code is in my repository.
-
This is why I couldn’t use the
d
elete command earlier: if the last line of the input was an impossible game, we’d delete it and the program would terminate without getting to the processing of the hold buffer. Perhaps there’s a nicer solution? I’d love to know. ↩