Synchronized Music on micro:bits

It’s been a while since I last posted. So, uh, sorry about that. I’m going to make it up to y’all with this post. It’s in two parts: the video demonstrates the project in action, and the rest of this blog post details all the technical details. I’d recommend starting with the video to see the cool demo, and if you make it to the end, read the rest of the blog post.

Introduction

For many years now, I’ve tutored at the National Computer Science School, which teaches computer science and programming to Australian high school kids. The school is divided into two parts: a web stream, which builds a website using Python, and an embedded stream, which (nowadays) uses MicroPython on the micro:bit.

Last year I tutored the web stream, and so it was only during the closing ceremony while watching videos demonstrating all the embedded students’ projects that I started wondering about the potential of the micro:bit, and whether I could build a mesh network so they could play music synchronized across a large area. So when work started getting quieter over the Christmas / New Year break, and this year’s NCSS started approaching, I decided to buy a couple of micro:bits and start hacking.

Breaking it down

I broke down the project into a number of distinct components:

• network protocol: for sending structured data as micro:bit radio packets.
• time sync: getting all micro:bits on the network to share a common time.
• triggers: getting all micro:bits to do an action at a given time.
• music: making the micro:bits play notes on a piezo buzzer.

Writing micro:bits all the time is too hard, so I’m going to refer to them from here on as nodes.

Time Sync

Time is hard. It’s a little easier in this application, because we don’t care what the actual wall-clock time is, just that all the nodes have the same time.

The first step in figuring out how to synchronize the nodes was to head off to Wikipedia and do some reading on the way I already knew computers synchronized time, Network Time Protocol (NTP). After very little reading, it seemed fairly obvious that NTP was far too complicated for this application: I didn’t need to deal with leap seconds, and polling multiple servers and doing filtering on the answers seemed like too much effort.

Fortunately it took little effort to find the Precision Time Protocol (PTP). PTP is a really nice simple protocol, though the Wikipedia page does have much more detail than is really needed. It’s designed to work on multiple network segments (hey, that sounds kinda like the network topology we have), and on each segment, there’s a master which all the other nodes synchronize to. Each node periodically pings (called Delay_Req and Delay_Resp, but I like ping better) the master to approximate their Round Trip Time (RTT), and the master periodically sends out a Sync message with its current time. Add in some maths, and hey presto, you have synchronized time.

In order to adapt the algorithm for a mesh network, there were a few changes I needed to make. PTP has a best master clock algorithm for selecting the master, but since my nodes might drop in and out of network connectivity all the time, my system needed to continuously adapt which nodes were sending timestamps to synchronize to, based on which nodes could talk to them. I decided to term those nodes thought leaders (partially because I really disliked the "master" terminology, partially because I like being stupid 😛).

Network Protocol

All nodes in my network are assigned an id and level. Given the scale of my network, both of these are unsigned bytes (i.e. 0–255). The id is chosen randomly at startup. Astute readers will note this falls victim to the Birthday Problem, so after 20 nodes there’s a 50% chance two nodes will share an id. This turns out not to actually be a problem in practice: the presence of duplicate ids on the network is only an issue if another node can see both of them (cough and the nodes are sufficiently unreliable that they restart occasionally anyway cough).

Levels: One node is designated as the "root" (by pressing buttons on the micro:bit), and is assigned level 0. All other nodes start with level 31 (chosen because it can be easily displayed in binary on the micro:bit LEDs). The level is an estimate of how close the node is to the root, and nodes will only synchronize with nodes closer to the root than it. As nodes do so, they decrease their own level (but never more than one more than the level of the node it’s synchronizing with). If the node hasn’t managed to synchronize for a few seconds, its level increases by one, ensuring that nodes always choose good nodes to synchronize with.

Here’s an example network:

The level 1 and 4 nodes can both synchronize from the level 0 node (since it’s less than 1 and 4). The level 4 doesn’t synchronize from the level 2 node in this example, just because it can’t see it over the radio network (but it would if it could). As the level 4 node becomes more in sync, it will reduce to a level 3 node, then a level 2 node, and finally a level 1 node—at which point it would stop synchronizing from the other level 1 node.

Pings: All nodes periodically send PING_REQUEST messages on the network, which contain:

• req_node: the id of the node requesting the ping
• req_level: the node’s level
• ping_id: a unique (16 bit) id for this ping request
• votes: a list of node ids this node would like to sync from.

The votes list is just the ids of nodes which the node has seen PING_REQUEST messages from, where req_level was less than its own: this may be out-of-date, but if so, the node can just ignore the SYNC message later (and it won't send that node id next time).

Any node which sees a PING_REQUEST message from another node responds back with a PING_RESPONSE message, which contains:

• req_node: copied from the PING_REQUEST
• resp_node: the id of the node responding to the ping
• resp_level: the node’s level
• ping_id: copied from the PING_REQUEST
• req_end_timestamp: the time at which the PING_REQUEST was received, according to the responding node’s clock

Syncs: Periodically, nodes use the votes it has seen from other nodes to see if it’s the best node to send a SYNC message. If it is, it does, which contains:

• node: the node sending the SYNC
• level: the node’s level
• timestamp: the node’s timestamp at time of sending
• triggers: a list of trigger ids and timestamps (sent as deltas relative to timestamp)

When a node receives a PING from a node with a lower level, it uses the same formula from PTP to adjust its own clock. Assuming:

• $$T_1$$ is the time the node sent a PING_REQUEST to the node sending a SYNC
• $$T'_1$$ is the req_end_timestamp from the corresponding PING_RESPONSE
• $$T_2$$ is the timestamp in the SYNC message
• $$T'_2$$ is the time the SYNC message was received

Then the offset $$õ$$ between the two nodes’ clocks is:

\begin{equation*} õ = \frac{T'_1 - T_1 - T'_2 + T_2}{2} \end{equation*}

In practice, I use exponential smoothing on this process, which reduces the impact of outliers. Earlier I mentioned that nodes reduce their level as they become "more in sync": nodes actually reduce their level every time they do an offset adjustment where the adjustment is small enough ($$|õ| < \varepsilon$$, where $$\varepsilon =$$ 10ms).

Side note: Originally, I intended to use MicroPython on the micro:bit to implement this project. However, after just trying to implement something to decode the network protocol messages into in-memory data structures, I ran out of memory using MicroPython. I briefly toyed with the idea of having some code to generate the MicroPython code (so I could write sensible high-level code, and have it spit out stupid code), but that was far too much effort, so I switched to using the micro:bit C++ API.

Simulation

There’s a lot of detail in the previous section, without much explanation of why I designed it that way. This section isn’t going to help much, but it will explain the process a little better. When I started, I realised that attempting to develop the time synchronization protocol on the actual micro:bits would be very slow and tedious, so I started by writing a Python simulation.

The simulation let me quickly iterate on the design of the protocol, testing various number of nodes with different network connection topologies, and meant that when the network didn’t converge to a common time, I could debug the algorithm by adding things into my Python code, rather than messing around with micro:bit serial cables.

Unfortunately, I didn’t save all the diagrams along the way, but here’s the last one I generated, which shows 13 nodes all converging at different times to the same time (within an accuracy of 20ms).

The 0.1ms line in the diagram is a lie: it should be 0, but that doesn’t work so well with the logarithmic y-axis. The coloured dots on the lines indicate the level of the node at that point in time. Well. That’s what my notes claim, at least.

Triggers

So far, we’ve managed to get a mesh network of nodes synchronized to the same time, but we haven’t managed to get them to do anything useful yet. This is where triggers come in.

A trigger is just an event with an id (saying what you’d like to happen), and a timestamp in the future (when you’d like it to happen). Every SYNC message contains a list of triggers: the root node can schedule triggers, and every node in the network propagates on that list (while the triggers’ timestamps are still in the future) to all the other nodes.

In order to get triggers into the network, the root node listens on the serial port for a six-character hex number: the first two characters are a trigger id, and the remaining four are in how many milliseconds from now to schedule the trigger (for example, 2a0fa0 schedules trigger #42 to happen in 4 seconds).

Just before a trigger is scheduled to happen, the node checks if it’s reasonably in sync (the last adjustment $$õ < \varepsilon$$): if not, it skips the trigger (on the assumption that not playing some music is better than playing out of time).

Music

When a trigger fires, I want to play a segment (roughly five seconds long) of music. This means that if the node gets out of sync, it’ll only be silent for five seconds. Also, if the time synchronization is off by a noticable amount, it’ll only sound weird for five seconds.

Since I want the micro:bits to play music, I need some way to input music into the program. I started with Lilypond, an excellent music engraving program that I've used before (it’s kinda like LaTeX for music). As an example for this section, I’ve written Beethoven’s Ode to Joy in Lilypond:

\version "2.18.2"

\include "articulate.ly"

music = \new Staff \relative c'' {
\tempo 4=160
\time 4/4
\key c \major
e4 e f g |
g f e d |
c c d e |
e4. d8 d2 |
\bar "|."
}

\score {
\articulate \music
\midi {}
}

\score {
\music
\layout {}
}


This Lilypond file results in this PDF output:

It also generates a MIDI file (in case you’re wondering: the \articulate command just makes the MIDI output sound a lot nicer), which I can conveniently read using in Python using the Mido library, and thus generate some C++ arrays representing the data. These look like this:

typedef struct {
uint16_t period_us;
uint16_t duration_ms;
} music_event_t;

typedef struct {
size_t start_event;
size_t n_events;
} music_segment_t;

typedef struct {
const music_event_t* events;
size_t n_events;
const music_segment_t* segments;
size_t n_segments;
}

const music_event_t _song_events[] = {
{.period_us = 1517, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1517, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1432, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1276, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1276, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1432, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1517, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1703, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1911, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1911, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1703, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1517, .duration_ms = 328},
{.period_us = 0, .duration_ms = 47},
{.period_us = 1517, .duration_ms = 492},
{.period_us = 0, .duration_ms = 70},
{.period_us = 1703, .duration_ms = 164},
{.period_us = 0, .duration_ms = 23},
{.period_us = 1703, .duration_ms = 656},
};

const music_segment_t _song_segments[] = {
{.start_event = 0, .n_events = 26},
{.start_event = 26, .n_events = 3},
};

const music_data_t song_data = {
.events = _song_events,
.n_events = sizeof(_song_events) /
sizeof(_song_events[0]),
.segments = _song_segments,
.n_segments = sizeof(_song_segments) /
sizeof(_song_segments[0]),
};


The resulting song_data has two segments, which index into the events array. The intention there was that if music has repeating sections, the segments could reference the same events, saving program space: as it turned out, I had plenty of program space and finding repeating sections is hard, so this was a bit of a premature optimization.

Then all that remained was to hook up trigger n to play segment n of the music, using PWM on the piezo buzzer I’d attached to pin 0 of my micro:bits.

Reliability

Having written all this code, I found that on the micro:bits, the program would occasionally get stuck: either the CPU would freeze up entirely, or the radio module would get into a bad state and not receive / transmit messages, or other weirdness would happen.

Instead of trying to fix these, I used the watchdog timer of the nRF51 (the CPU which runs the program): this lets me configure up to 8 "reload requests", and a period. If my application code doesn’t trigger all the enabled reload requests within the period, the CPU will reset itself.

I configured the period to two seconds, and set up two reload requests: one in the main loop that I use for display (as a "the program has just stopped" check), and one inside the time sync logic, which triggers the reload request iff the time has synced (again, $$|õ| < \varepsilon$$) within the last 30 seconds. This second check is really useful: if the network gets into a weird state where none of the nodes can sync, the nodes will all be obviously displaying this by resetting themselves every 30 seconds.

The other huge reliability concern was the root node in the network: if that dies, all the other nodes lose their sync. As a way to reduce this impact, I made it so any node could become the root node (by holding down some buttons): that way, if I noticed the root node die, I could promote a standby node, and since it already had almost the same time, the other nodes wouldn’t lose their synchronization.

Epilogue

So while this project was really cool and fun to build, I didn’t really have a use for it until NCSS started. We run a programming competition at the camp, which this year was Hitchhiker’s Guide to the Galaxy themed. I did the theming for the scoreboard, and had a little intro animation (the words “Don’t Panic” zooming in over a background of starts), and had the micro:bits play Journey of a Sorcerer (the theme music for HG2G) during that.

I think it went reasonably well, though the big unforeseen issue was that the piezo buzzers on most of the micro:bits weren’t as loud as the ones I had been testing with, so the effect was somewhat, uh, muted. 😛

If I was to do more on this project, there’s a few things I would improve:

• The time sync algorithm doesn’t actually need the SYNC packets: it could be done as part of the PING_RESPONSE packets, which would simplify it somewhat.
• Send the music data across the network: compiling and flashing all the nodes every time I want to play a different song is very tedious.
• Relatedly, move the song data to flash storage: that way the music data from the network would be persisted across reboots.
• Use something other than piezo buzzers. I was watching the Wintergatan marble machine (well, actually the videos for the construction of its replacement), and it’s made me really want to do something where the micro:bit triggers physical actions to make noise (maybe plucking a violin string)?
• Support multiple channels of MIDI. There’s actually very little stopping the micro:bits from playing polyphonic sounds: if I had compiled the nodes with different song data (but matching segment lengths), they could already do it pretty easily.

Source code: is in this Mercurial repository. It’s open source, under the Apache 2.0 license.