15 Dec 2017

[Blog] MistServer's internals in detail

Hey everyone 👋! It's Jaron again, here to explain some more of MistServer's internals 🔧⚙. Last time I explained DTSC, our internal media format. This time I will talk about how MistServer is split up over multiple executables, and how exactly DTSC (and other means) are used to communicate between these parts. As a bonus: what you can do by manually running the various MistServer executables.

Extremely modular

When we first designed MistServer, one of our main goals was extreme crash resistance and general resilience against any type of problem or attack. To accomplish this, each "active" stream (active here meaning that it is either being received or sent) is maintained by a single "input" process (think of an input as an origin or source for the data). Each connection is maintained by a single "output" process (think of these as sinks for the data). Finally, there is the "controller" process, which monitors and controls all of the above and provides the MistServer API as a single point of control.

This very fragmented setup, where each task is handled by not just a separate thread but a completely separate process, was chosen because a crash in a single thread can still affect the other threads in that same process. A crash in a process, however, will almost never affect the stability of another process.
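To make the isolation argument concrete, here is a minimal sketch (plain Python, nothing MistServer-specific; `run_isolated` is a made-up helper name). The child process aborts hard, and the parent simply observes a non-zero exit status and carries on.

```python
import subprocess
import sys

def run_isolated(code: str) -> int:
    """Run a Python snippet in a separate process and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode

# The child kills itself with abort(), much like a hard crash would.
status = run_isolated("import os; os.abort()")

# The parent is untouched: it only sees a non-zero exit status.
print("child exit status:", status)
```

Had the same abort happened in a thread, the whole process (and every other thread in it) would have gone down with it.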

Processes and threads are almost the same thing in Linux anyway, and the kernel is very good about re-using static memory allocations, so the extra overhead is small. It is still enough to make this a bad design for a generic network-based server, but with media specifically the amount of data per connection tends to be large enough that the extra stability is most definitely worth the slightly higher resource usage. In most cases the bandwidth is saturated before the other resources are used up anyway.

The flow of control is rather loosely defined on purpose: "inputs" are started as needed, directly by the "output" that wants to receive media data. The outputs are spawned from listening processes that wait for connections on sockets, and these listeners are started by the controller. Every output, as well as every live input, reports statistics and health data to the controller.

The media data itself is made available in DTSC format through shared memory pages, which can freely be read by all processes of the same system user. Metadata on the media data is provided through a custom binary structure, which is written to only by the input and read from by the outputs. This structure is locked only while being written to (roughly once per second), and the many readers do not have to lock the structure to read simultaneously. A similar method is used to report back to the controller.
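As a rough illustration of the "one writer, many readers" idea, here is a sketch using Python's standard `multiprocessing.shared_memory` module. This is not MistServer's code and the page layout is not DTSC: `publish`, `read_page`, and the page name are made up for the example.

```python
import os
from multiprocessing import shared_memory

def publish(name: str, payload: bytes) -> shared_memory.SharedMemory:
    """Writer side: create a named page and copy the payload into it."""
    page = shared_memory.SharedMemory(name=name, create=True, size=len(payload))
    page.buf[:len(payload)] = payload
    return page

def read_page(name: str, length: int) -> bytes:
    """Reader side: attach to the existing page by name and copy data out."""
    page = shared_memory.SharedMemory(name=name)
    try:
        return bytes(page.buf[:length])
    finally:
        page.close()  # detach only; the writer owns the page

page_name = f"demo_page_{os.getpid()}"  # hypothetical name, unique per run
writer = publish(page_name, b"media-bytes")
seen = read_page(page_name, len(b"media-bytes"))
writer.close()
writer.unlink()  # the writer is responsible for removing the page
```

Any process running as the same user could call `read_page` with the same name, which is the property the paragraph above relies on.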

On a side note: we're actually working on making this entire process fully lock-free, through a special "Reliable Access" shared memory structure we've devised. This structure accomplishes simultaneous writes and reads safely, without needing to lock anything. We're hoping to release this significant system-wide speed boost in early 2018. More on this in a future blog post!
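This post does not describe how the "Reliable Access" structure works, so the following is explicitly not that design. It is a sketch of one well-known technique for letting a single writer update data while readers never take a lock: a sequence counter, often called a seqlock. The class and field names are made up, and a real implementation on shared memory would also need atomic operations and memory barriers, which plain Python does not express.

```python
import struct

HEADER = struct.Struct("<Q")  # 64-bit sequence counter stored at offset 0

class SeqlockRecord:
    """Fixed-size record guarded by a sequence counter instead of a lock."""

    def __init__(self, size: int):
        self.size = size
        self.buf = bytearray(HEADER.size + size)

    def _seq(self) -> int:
        return HEADER.unpack_from(self.buf, 0)[0]

    def write(self, payload: bytes) -> None:
        assert len(payload) == self.size
        seq = self._seq()
        HEADER.pack_into(self.buf, 0, seq + 1)  # odd: write in progress
        self.buf[HEADER.size:] = payload
        HEADER.pack_into(self.buf, 0, seq + 2)  # even: record is stable

    def read(self, max_tries: int = 1000) -> bytes:
        for _ in range(max_tries):
            before = self._seq()
            data = bytes(self.buf[HEADER.size:])
            after = self._seq()
            # Accept the copy only if no write started or finished meanwhile.
            if before == after and before % 2 == 0:
                return data
        raise RuntimeError("record never became stable")

record = SeqlockRecord(4)
record.write(b"abcd")
snapshot = record.read()
```

Readers never block the writer; at worst they retry their copy when a write overlapped with it.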

Error recovery

Now, this becomes especially interesting when any of MistServer's processes crash or fail in some other way. Since MistServer was written with dependability in mind, you may never have experienced this. So let me walk you through what happens if something goes wrong:

Should any of the outputs crash or fail, the single connection it was maintaining will be severed. Nobody else will be affected. The controller knows that the process has crashed, and when, because it stops reporting back at regular intervals. If the process is frozen or stuck in some kind of loop, the controller will forcibly kill the process to ensure the stability of the rest of the system. There is no data to clean up, since all data used by outputs is in shared structures maintained by the corresponding input.
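The "stops reporting back" check boils down to comparing timestamps. A minimal sketch, assuming the controller keeps a table of last-report times per process (the table, the timeout value, and the name `find_stale` are all hypothetical):

```python
def find_stale(last_report: dict, now: float, timeout: float) -> list:
    """Return the process ids whose last report is older than `timeout` seconds."""
    return sorted(pid for pid, seen in last_report.items() if now - seen > timeout)

# pid -> time of last report, in seconds
reports = {101: 10.0, 102: 14.5, 103: 3.0}
stale = find_stale(reports, now=15.0, timeout=5.0)
print(stale)  # [103]
```

A process that shows up in this list while still being alive is the "frozen or stuck" case, where a forced kill (for example `os.kill(pid, signal.SIGKILL)`) would follow.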

Should any of the inputs crash or fail, the dedicated "angel process" that watches over each input will take notice. Since inputs maintain the shared memory structure, there is a potential to leak a lot of memory should these processes suddenly disappear without doing proper cleanup. The angel process will clean up all memory left behind by the input, and then restart the input. The outputs never even notice the input has stopped and restarted, and will just take slightly longer to load while the input is recovered. No connections are severed at all in this case (unless the stream in question was a live stream, since the timing information is lost during cleanup).
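The angel pattern itself is small. Below is a sketch under stated assumptions: `angel`, `cleanup`, and `max_restarts` are made-up names, and the real angel process may decide differently about when to give up.

```python
import subprocess

def angel(cmd, cleanup, max_restarts=3):
    """Run cmd; after each abnormal exit, clean up and start it again.

    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        status = subprocess.run(cmd).returncode
        if status == 0:
            return restarts  # clean shutdown: nothing to recover
        cleanup()            # release whatever the dead child left behind
        if restarts == max_restarts:
            raise RuntimeError("child keeps failing, giving up")
        restarts += 1
```

Here `cleanup` stands in for removing the shared memory pages the crashed input can no longer remove itself.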

Should the controller itself crash, this too has an angel process watching over it for the same reason that the inputs do. The controller maintains several structures that contain state information, as well as the structures that the inputs and outputs use to report back. These structures are all known beforehand, which lets us do a neat trick: instead of cleaning up the structures, the newly started replacement controller loads its state information from the existing structures. This allows the controller to literally pick up where the previous one left off, without any of the inputs or outputs even noticing what happened.
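The "attach if it already exists, otherwise create" trick can be sketched with Python's `multiprocessing.shared_memory`. The function name and the layout are hypothetical, and the sketch assumes Linux-style shared memory that outlives the process that created it.

```python
from multiprocessing import shared_memory

def open_state(name: str, size: int):
    """Attach to existing state if present, otherwise create it.

    Returns (segment, recovered), where recovered is True when a previous
    owner's state was found and re-used.
    """
    try:
        return shared_memory.SharedMemory(name=name), True
    except FileNotFoundError:
        fresh = shared_memory.SharedMemory(name=name, create=True, size=size)
        return fresh, False
```

A replacement controller that gets `recovered == True` would read its counters and tables straight out of the segment instead of starting from zero. This only works because the layout is known beforehand, as described above.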

Especially cool is that the above behaviour also allows for rolling updates. The MistServer binaries can be replaced by new versions, and the controller told to restart itself. Any new connections will use the newly installed binaries while old connections keep using the old ones. The same holds true for the input processes. Eventually, the whole server will be updated, without ever dropping a single connection in the process.
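Part of why this works is ordinary POSIX file behaviour: when a file is replaced by renaming a new one over it, whoever already had the old file open keeps seeing the old contents. The sketch below demonstrates that with plain data files. It assumes the binaries are replaced by rename rather than overwritten in place, which this post does not specify, and all file names are made up.

```python
import os
import tempfile

def replace_atomically(path: str, new_content: bytes) -> None:
    """Write new content to a temp file, then rename it over `path`."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(new_content)
    os.replace(tmp, path)

workdir = tempfile.mkdtemp()
binary = os.path.join(workdir, "server.bin")  # hypothetical file name
with open(binary, "wb") as f:
    f.write(b"version 1")

old_user = open(binary, "rb")            # stands in for a running process
replace_atomically(binary, b"version 2")
with open(binary, "rb") as new_user:     # stands in for a newly started process
    new_view = new_user.read()
old_view = old_user.read()               # still reads the old file's data
old_user.close()
```

On Linux, `old_view` holds the old contents and `new_view` the new ones, mirroring old connections staying on the old binary while new connections pick up the update.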

Playing with the binaries more directly

Because of the modular nature of MistServer, nothing is stopping you from running some inputs or outputs manually alongside what is automatically run. Here are some useful examples:

Making an output write to file. Some of MistServer's outputs are able to write directly to a file (the same formats that we support for recording). For example, you can run MistOutFLV to write any stream to an FLV file as follows: MistOutFLV -s STREAMNAME OUTPUT_FILE.flv. This will write the stream STREAMNAME to the newly created file OUTPUT_FILE.flv. Normally MistServer requires recordings to specify the full output path, but when running the output manually this is not a requirement.

Piping an output into another application. Some of MistServer's outputs are able to write directly to a pipe. For example, you can run MistOutHTTPTS to write any stream to a pipe in TS format as follows: MistOutHTTPTS -s STREAMNAME -. This will write the stream STREAMNAME to stdout. A wide variety of applications is able to process TS over a pipe; so many, in fact, that we included a special "ts-exec:" output which allows you to do exactly this, with full support for scheduling as well as auto-start when streams become active.

Piping another application into an input. Some of MistServer's inputs are able to read directly from a pipe. For example, you can run MistInTS to read any stream from a pipe in TS format as follows: MistInTS -s STREAMNAME -. This will read stdin into stream STREAMNAME. A wide variety of applications is able to output TS over a pipe; so many, in fact, that we included a special "ts-exec:" input which allows you to do exactly this, with full support for auto-start and auto-stop as viewers come and go.
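If you would rather wire up such a pipe from a script than from a shell, a small helper like the one below does the job. It is a generic sketch: `pipe` is a made-up name, and `some_ts_consumer` / `some_ts_producer` in the usage comments are placeholders for whatever application you want on the other end.

```python
import subprocess

def pipe(producer_cmd, consumer_cmd):
    """Connect the producer's stdout to the consumer's stdin.

    Returns (producer_exit_code, consumer_exit_code).
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout)
    # Drop our copy of the pipe so the producer notices if the consumer exits.
    producer.stdout.close()
    return producer.wait(), consumer.wait()

# Output into another application (commands as given in the text above):
#   pipe(["MistOutHTTPTS", "-s", "STREAMNAME", "-"], ["some_ts_consumer"])
# Another application into an input:
#   pipe(["some_ts_producer"], ["MistInTS", "-s", "STREAMNAME", "-"])
```

The data flows directly between the two processes; the script only starts them and waits for both to finish.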

Debugging and finding errors/mistakes. Is a specific output or input giving you trouble, but the logs in MistServer's controller are not accurate enough or contain too many unrelated messages from other processes? Just run the relevant input or output manually at a higher debug level, and the messages will print to your console instead of being collected by the controller.

Keeping an input active while testing. In MistServer, some inputs can be set to "Always on" in the configuration, which keeps them permanently on, even without active viewers. Sometimes you want to force this behaviour for testing purposes. Running an input manually for an unconfigured stream will trigger this behaviour until it is manually shut down using a standard kill signal (e.g. Ctrl+C in the terminal).

Checking outputs (MistServer or other sources) for problems. MistServer comes with a collection of "Analysers" which allow you to debug and pretty-print various protocols and file formats. These are especially useful for developers. Explaining these in detail is beyond the scope of this blog post (more on these analysers in a future post!), but you can run any of them with the --help command-line parameter to get an overview of supported options and modes.

Querying live stream health. The MistAnalyserDTSC analyser can be used (at least under Linux) to find out more about a live buffer's health in JSON format. To request this data, simply run it as follows: MistAnalyserDTSC -D 1 /dev/shm/MstSTRMstreamname (replace streamname with the name of your stream).
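For monitoring scripts, the same command can be wrapped and its output parsed. This sketch assumes the analyser prints a single JSON document on stdout at this debug level, which is an assumption on my part here, and it makes no claims about which fields the JSON contains. `stream_health` and its `run` parameter are made-up names; `run` exists only so the command runner can be swapped out.

```python
import json
import subprocess

def stream_health(stream: str, run=subprocess.run):
    """Run MistAnalyserDTSC on a live buffer and parse its JSON output."""
    cmd = ["MistAnalyserDTSC", "-D", "1", f"/dev/shm/MstSTRM{stream}"]
    result = run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Calling `stream_health("streamname")` on a machine with MistServer installed and the stream active would return the parsed data; if the analyser exits with an error, `check=True` raises `subprocess.CalledProcessError` instead.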

In closing

Keep in mind that all of this information is accurate for the current stable version of MistServer (2.13) as well as most of the older versions. In the near future we'll be updating the internal formats (see "On a side note" above in the section "Extremely modular"), which will make some of this information obsolete. There will be another blog post when that happens!

This post just gave a very high-level overview of Mist's internals and how the various applications connect together to form the complete software package. The analysers in particular deserve more attention.

I hope you have a better idea of MistServer's inner workings now. Any questions? As always feel free to contact us! See you next time! The next blog post 💬 will be by Balder on how to stream using MistServer and Wirecast.