"Smartifying" my Hi-Fi system
By Ricardo
This was the topic of a talk at GambiConf 2024! Once the full video is up, I’ll update the links here. Meanwhile, you can check the stream recording, screen recording, the slides or even the code!
Background story
I have a “dumb” “hi-fi system” that I use as speakers for my desktop computer. The speakers are pretty good considering what I paid for them many years ago, and work pretty well. I am not, however, using all of the system’s features, such as USB and FM radio - and that’s fine. But I wonder if I can do better.
You see, I use Spotify as a streaming service. Whenever I’m working (from home), I’m not using the hi-fi’s speakers, as they are connected to my desktop computer. That can be easily fixed by having some kind of switching (or mixing) device that would allow me to connect the hi-fi to my employer’s laptop. But honestly I don’t care about that: all I want is to play Spotify using those speakers. Not only that, if possible, I want to use prev/next song buttons to change songs on the streaming. Simple as that.
Then I started to wonder: what would happen if I had a USB drive that I would connect to the hi-fi and, whatever song it played, it would actually play an Internet stream? Would that even be possible? Well, I’m glad to say that yes, it is possible - it’s just messy.
The Challenge
Based on my very basic knowledge of how USB mass storage devices work, I know that whenever a block/sector/cluster/whatever is requested by the host, the device can delay returning that data. This happens all the time on mechanical drives: the drive first needs to seek to the correct position and then read the data. This takes time. Therefore, the answers are asynchronous, which is important.
My idea: create a USB device that presents itself with a sound file on it. Whenever a player wants to play that file, it would receive different data on every read, as the device would actually be returning a stream of data from somewhere else.
The original idea also used multiple sound files, as, based on the positions of each file, I could also detect navigation, such as going back a song or playing the next one: you just need to detect the host reading the first block of each file. But that is a challenge for another day (although very much possible!).
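To make the navigation idea a bit more concrete, here is a minimal sketch of how a first-block detector might look. Everything in it is hypothetical (the class name, the offsets, the assumption that tracks are laid out in playlist order); it is not part of the published code, and a real player may also touch first blocks while scanning the drive, which this sketch ignores.

```python
class NavigationDetector:
    """Guess track changes from reads that hit the first block of a file."""

    def __init__(self, first_block_offsets):
        # Hypothetical layout: byte offset of each track's first block,
        # listed in playlist order.
        self.starts = list(first_block_offsets)
        self.current = None  # index of the track we think is playing

    def on_read(self, offset):
        """Return 'next', 'previous', or None for a read at this offset."""
        if offset not in self.starts:
            return None  # an ordinary read in the middle of a file
        index = self.starts.index(offset)
        previous, self.current = self.current, index
        if previous is None or index == previous:
            return None  # first track seen, or the same track restarted
        # No wrap-around handling: last -> first is reported as 'previous'.
        return "next" if index > previous else "previous"
```

The returned string could then be forwarded to whatever controls the stream source.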
About not taking the hard way
I have played with V-USB and manual USB device configuration in the past (and again in this project). My first idea was to create a custom device that, once connected, would present itself as the controllable mass storage device. I would then intercept reads and writes to control the replies to the host, giving the proper bytes for the stream it was reading.
This is not easy though. You need to deeply understand the USB protocol (endpoints, transfers, etc) as well as understand how mass storage devices are presented to the OS. You’ll need to translate USB calls to disk calls, then those calls to filesystem positions, and finally those to file positions.
Is it doable? Yes. Do I want to? No. Hell no.
I really enjoy playing with very complicated stuff, such as USB communication. I am, however, on vacation and don’t want to spend 10, 12, 14 hours in front of my computer trying to debug the USB device implementation I wrote. I might do it in the future (to create a smaller version of this project), but it’s not my goal at this moment at all.
Is there an easy way?
Have you heard of the Raspberry Pi Zero’s USB Gadget mode? It’s a pretty cool feature embedded in the RPi’s CPU that allows a USB port to be configured to behave as a device. This is also sometimes called USB OTG, although I’m not sure this name is correct. This gadget mode in the kernel allows you to make the RPi behave as a USB device instead of a host, meaning it can present itself as anything: a serial port, a network card, or even a mass storage device! Cool, right?
This would fit my needs perfectly: a way of making the USB communication transparent (so I don’t have to implement it manually) and I would only have to deal with the filesystem stuff. This would allow me to start and finish this project in a single day!
Ricardo from the future: boy oh boy. I was so wrong on this one. It has been 4 days already :)
As with everything else so far, there’s a catch: the gadget mode won’t allow me to fine-tune the data I/O. Not only that, it requires a disk image to work with, and such image is then loaded by the kernel module (g_mass_storage) and fed into the USB port. Also, both the kernel (AFAIK) and the host will cache data, which is, as you can imagine, bad. I’m assuming my hi-fi won’t cache data (this was later proved to be correct).
Let’s go deeper into this though. The kernel module documentation states that you should not modify this file while loaded as changes might not be detected:
“Beware that if a file is used as a backing storage, it may not be modified by any other process. This is because the host assumes the data does not change without its knowledge. It may be read, but (if the logical unit is writable) due to buffering on the host side, the contents are not well defined.”
I tested this and it’s true. It changes, but once the host OS reads the filesystem, it’s cached, so you are screwed. File data is also cached, so yeah, it’s a no-go. And trust me when I say this: at least with Windows, this is a no-go at all. Probably the hi-fi won’t cache too much stuff, but still.
“But Ricardo, didn’t the kernel docs say this already?” - Hey, just because they say you shouldn’t do it, it doesn’t mean you can’t do it.
So.. is the gadget mode a no-go? Maybe. Let’s keep it on the side for now and focus on a different part of the project: the filesystem. I started to wonder: is there any way of creating a custom filesystem where you can control the reads and writes?
Blowing some fuses
Have you ever heard of FUSE? FUSE stands for Filesystem in Userspace and is an interface for writing filesystems in Linux (and Unix it seems). Not only that, it works in userspace, meaning that the “driver” you write won’t be loaded into the kernel, but runs as a standard user process. It’s a pretty cool thing.
You know what else is cool? FUSE allows you to control read and write functions: you just get the handle, offset, size/buffer, and you decide what you want to do with it: fail, delay, return, whatever. You are in control of the filesystem. This is perfect for what I want to do.
I actually played with this for hours and was able to reproduce what I want. I created a filesystem that has an MP3 file in it (a fake one). Once you load it into ffplay, it will “hang” while the filesystem waits for data from a socket. I then piped StarFM Berlin into it and, sure enough, the supposedly “local” MP3 started playing an Internet stream. Stop the pipe and the player hangs. Restart it and there it goes again. Perfect!
There is, however, a catch (get used to this, you’ll hear this a lot). I can’t mix this with g_mass_storage. This means I can’t create a custom filesystem and simply map it into the module, as the module requires a disk image, not an existing filesystem. I can’t fake a block device or something to even make it work. Damn it, this would be so easy.
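The "hanging" read can be sketched roughly like this. It is only an illustration, not the published code: the class and method names are made up, and the `read(path, size, offset, fh)` signature merely mirrors the shape of the read callback in Python FUSE bindings such as fusepy (which binding the original uses is an assumption here). The FUSE mounting itself is left out so the logic stays self-contained.

```python
import queue


class StreamFile:
    """Read handler for one fake file whose bytes come from a queue."""

    def __init__(self):
        self.chunks = queue.Queue()  # filled by a socket thread elsewhere
        self.leftover = b""          # bytes received but not yet returned

    def feed(self, data):
        """Called by whoever receives the stream (e.g. a socket thread)."""
        self.chunks.put(data)

    def read(self, path, size, offset, fh):
        # The offset is ignored on purpose: whatever position the player
        # asks for, it gets the next bytes of the stream.
        out = self.leftover
        while len(out) < size:
            out += self.chunks.get()  # blocks until more data arrives
        self.leftover = out[size:]
        return out[:size]
```

If nothing calls `feed`, `read` simply never returns, which is exactly the "player hangs" behavior described above.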
The not-so-easy way
Ok, so this is where we stand right now:
- We have a way of creating a controllable filesystem that allows us to return whatever we want when we read a file.
- We have a way of providing a disk image containing a FAT filesystem as a USB mass storage device.
Can we mix those two? I mean, the gadget module for USB mass storage won’t accept a FUSE filesystem directly without any modification - plus the FUSE filesystem doesn’t have disk information, as it is not a block device. But what if we loaded the disk image inside the FUSE filesystem and hijacked the reads at the positions of the files we want to control? Would this be possible?
Ironically, yes. I even made a drawing because it’s a mess to understand this.
So, this is what I want to do: whenever the FUSE filesystem gets a read at the exact position of our stream file, it replies with something else. Obviously the player will want more data, so we’ll keep it coming while we’re still within the bounds of that file. Once out of it, we’ll just pass through the original data.
To make this work, we need a few things first:
- A way of loading the disk image into our custom filesystem.
- A way of finding the exact position of our stream file and its bounds (as well as handling fragmentation).
You might be wondering: why do we need a real file within the filesystem if we’re not gonna return its data? Well, that’s only to make the FAT table work. Whether the player treats an empty read as EOF or respects the FAT clusters is a whole different story.
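Since the file's position and fragmentation are both recorded in the FAT, finding them boils down to following a cluster chain. Here is a minimal sketch of that walk, assuming a FAT32 table (4-byte little-endian entries, upper 4 bits reserved) already loaded into memory; the function name is hypothetical, and a FAT16 image would need 2-byte entries and a different end-of-chain marker.

```python
import struct

FAT32_MASK = 0x0FFFFFFF  # the top 4 bits of each entry are reserved
FAT32_END = 0x0FFFFFF8   # values at or above this mark the end of a chain


def cluster_chain(fat, first_cluster):
    """Follow a FAT32 chain from first_cluster; return the cluster numbers."""
    chain = []
    cluster = first_cluster
    while 2 <= cluster < FAT32_END:
        if len(chain) > len(fat) // 4:
            raise ValueError("cluster chain loops; FAT looks corrupt")
        chain.append(cluster)
        # Entry N of the table holds the number of the cluster after N.
        (entry,) = struct.unpack_from("<I", fat, cluster * 4)
        cluster = entry & FAT32_MASK
    return chain
```

A fragmented file simply shows up as a chain with gaps in it, for example clusters 2, 3 and then 5.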
The implementation
Beware: crappy coding ahead! :) No, really, the code is not clean at all. Honestly I don’t care (and you shouldn’t as well), as this is just a proof-of-concept! Please do not refer to this code as any kind of guide.
The code is here:
The implementation is essentially what I said before: it’s a custom filesystem using FUSE that will intercept all reads for offsets of the file. It does this by doing the following:
- Read the storage file (with an offset to bypass the MBR and other disk info) as a FAT filesystem and figure out all clusters for the “needle file” (the stream one).
- Convert those clusters into file offsets for the storage file, so that we know the exact position within the real file itself. That means that if reading cluster 1234 will give you offset 349393 in the file, I want to know that position.
- Simplify these offsets to make it faster to check them. For example, if we have to hijack the offsets 1000 to 2000 and 2000 to 3000, we can simply hijack 1000 to 3000. This is done for performance.
- Start the FUSE filesystem with a single file within: the storage one. At this point we also start listening on port 3123 for data to write into a thread-safe FIFO buffer we created.
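The second and third steps above can be sketched as follows. This is an illustration under assumptions, not the published code: `data_start` (the byte offset of cluster 2 inside the storage file, partition offset included) and `cluster_size` are hypothetical parameters that would come from the FAT boot sector.

```python
def cluster_to_range(cluster, data_start, cluster_size):
    """Map a FAT cluster number to a (start, end) byte range, end exclusive."""
    # FAT numbers its data clusters starting at 2.
    start = data_start + (cluster - 2) * cluster_size
    return (start, start + cluster_size)


def merge_ranges(ranges):
    """Merge touching or overlapping (start, end) ranges into fewer ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Sorting throws away the order in which the clusters appear in the file, which is fine here: the merged list is only used to answer "is this offset hijacked?", not to reconstruct the file.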
At this point the program is running and the filesystem is mounted. If you list files on it, you’ll see only the storage file, and this is correct. If you map this into a loop device using losetup and mount it locally, you’ll see the contents of that image. Here is the tricky part though:
Any read outside the monitored offsets will seek and return the original file contents. Any read within the monitored offsets will wait and return data from our buffer, which is populated by the socket. This means such read will hang until it has data. Originally I’ve designed this to return a silence file for 100ms, but this was introducing way too much lag.
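That dispatch could look something like the sketch below. The names are hypothetical and `next_stream_bytes` stands in for whatever pulls data out of the FIFO buffer; it also assumes a read never straddles the edge of a monitored range, which a real implementation would have to handle.

```python
from bisect import bisect_right


def in_ranges(ranges, offset):
    """ranges: sorted, non-overlapping (start, end) pairs, end exclusive."""
    # Find the last range whose start is <= offset, then check its end.
    i = bisect_right(ranges, (offset, float("inf"))) - 1
    return i >= 0 and ranges[i][0] <= offset < ranges[i][1]


def handle_read(image, ranges, next_stream_bytes, size, offset):
    """Serve one read: stream data inside monitored ranges, image otherwise."""
    if in_ranges(ranges, offset):
        return next_stream_bytes(size)  # may block until data is available
    image.seek(offset)
    return image.read(size)
```

Using a binary search over the merged ranges keeps the check cheap, which matters because every single read goes through it.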
So… does it work? Yes, but of course! Here’s a video (without audio because of how I recorded it and due to copyright reasons) so you can enjoy it:
Quick small details: you might have noticed I’ve mapped the loop0 with offset 1048576. That’s the offset of the FAT filesystem in the disk image. Plus, if you look closely, you’ll see I’ve mounted the filesystem with ro,sync: this is because 1) I don’t want to make any changes on it, and 2) I want it to do direct I/O and not cache stuff. Caching things on this project is bad.
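For the curious, 1048576 is 2048 sectors of 512 bytes, a very common start for the first partition. A small sketch of how that number could be read from the image instead of hardcoded, assuming a classic MBR partition table and 512-byte sectors (the function name is made up):

```python
import struct

SECTOR_SIZE = 512          # assumed sector size
PARTITION_TABLE = 446      # the MBR partition table starts at byte 446
LBA_START_FIELD = 8        # start LBA sits 8 bytes into a 16-byte entry


def first_partition_offset(mbr):
    """Return the byte offset of the first partition from a 512-byte MBR."""
    if mbr[510:512] != b"\x55\xaa":
        raise ValueError("missing MBR boot signature")
    (lba_start,) = struct.unpack_from("<I", mbr, PARTITION_TABLE + LBA_START_FIELD)
    return lba_start * SECTOR_SIZE
```

Feeding it the first 512 bytes of the storage file should give the value to pass as the loop device offset, provided the image really uses an MBR.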
Ok, that’s cool and everything… but how about loading it into g_mass_storage? Does it work? Does it do the same thing when I load this thing into a Raspberry Pi Zero? Yes, it does.
Here’s a quick demo for you:
Yep, that’s it. It works and, somehow, it works great!
Next steps
You might be wondering: but Ricardo, how about Spotify? Well, it’s not that complicated: you just need to load spotifyd or another similar project and convert the stream into an MP3 one, so you can patch it into the file. The reason why I’m using MP3s here is because it’s the only thing my hi-fi will load. Originally I was planning to use a WAV file for simplicity, but that didn’t work out. Also, it seems MP3s can be streamed by concatenating them - although I think it might not be recommended.
My main next steps, in no particular order, would be:
- Find a way to make it smaller. Maybe V-USB? Maybe streaming USB data over the network, some kind of USB/IP mixed with V-USB? Who knows, but making it the size of a flash drive would be amazing.
- Performance is slow, mostly due to the RPi Zero I’m using. Even at 1GHz it sometimes outputs crappy, choppy audio. This is due to the FUSE being a bit too slow in Python. I need to fix that, but trust me, the version I published is way better than the original one!
- Detecting track navigation: if you go to the previous or next tracks, emit a signal to somewhere. This would allow triggering a track change on the stream source, such as Spotify.
But until then… enjoy some StarFM Berlin!