Multicore Usage

Hello,

I was wondering how the MOD devices handle the distribution of the audio modules through multiple CPU cores. Is this handled automatically by the ALSA/JACK framework or is there some smart distribution logic on a higher layer?

Although the audio chain is largely sequential, it can still contain parallel branches that reach a common end node. I imagine there should be a way to optimize the distribution based on a sort of “dependency graph” of the audio plugin connections.

If we take a simple example of a single-channel audio chain with multiple plugins in series, running on the MOD Duo (dual core):

Input_1 -> PlugIn_0 -> PlugIn_1 -> ... -> PlugIn_N -> Output_1

Is it possible that this chain is split in two, with the first half processed on CPU0 and the second half on CPU1? If so, is it correct to assume that the latency will roughly double, due to the serial buffering needed between the two CPUs? More generally: is the same latency guaranteed regardless of the number of cores?

Cheers,

/edwillys


We are using JACK2, which has built-in SMP for its audio graph.
Each audio plugin is its own JACK client, created by mod-host. Plugin audio, MIDI and CV ports are created as actual JACK ports.

We could, in theory, split the audio graph to take more advantage of SMP when there are many plugins in series, but yes, that would double the latency.
One lesson learned from trying to optimize for multi-core is that the easiest option is usually the best one too.

The cores that are not fully active for audio processing are not really useless, since there is a lot of other stuff to do in the system, including reading/writing Control Chain, MIDI and CV and handling the webserver.

The current system has a fixed latency. Only DSP/CPU load changes.
There are plugins that add latency, but that is the exception rather than the rule. Most plugins get data and process that data without any added latency. Everything is contained within 1 audio block cycle.
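To put rough numbers on the "1 audio block cycle" point, here is a minimal arithmetic sketch. The 128-frame block and 48 kHz sample rate are assumed values for illustration, not confirmed device settings, and `block_latency_ms` is a hypothetical helper. It only covers the buffering contribution of the block size, not converter or driver latency.

```python
def block_latency_ms(frames_per_block, sample_rate_hz, pipeline_stages=1):
    """Buffering latency in milliseconds.

    pipeline_stages=1 models the whole graph finishing inside one block cycle.
    pipeline_stages=2 models a series chain split across two cores, where the
    second half works on the block the first half produced one cycle earlier.
    """
    return 1000.0 * frames_per_block * pipeline_stages / sample_rate_hz


# Assumed settings, for illustration only.
FRAMES = 128
RATE = 48000

single = block_latency_ms(FRAMES, RATE)
split = block_latency_ms(FRAMES, RATE, pipeline_stages=2)

print(f"one block cycle: {single:.2f} ms")
print(f"chain split across two cores: {split:.2f} ms")
```

With these assumed values, one cycle is about 2.67 ms and the split chain about 5.33 ms, which is the doubling mentioned above.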


Thank you for the detailed reply.

That is of course true. However, the minimum configuration for the MOD is 2 cores. Unless we’d have some CPU problems due to non-audio system tasks, I think it is safe to say that increasing the number of cores wouldn’t help if your audio chain is entirely in series.

I found some other hint on the following link: https://github.com/jackaudio/jackaudio.github.com/wiki/Q_difference_jack1_jack2

SMP support is not always as valuable as you would think. If your applications are chained INPUT --> A --> B --> C --> OUTPUT, then it will not be able to utilize multiple processors. However, if your applications are independently generating audio to the OUTPUT, that is, when “parallel” sub-graphs exist in the global graph, then they can be.

What is not clear to me is the definition of parallel sub-graph in this case. Would parallel branches of a 1 channel audio chain be optimized?

         --FX2--
        /       \
IN--FX1--       --FX4--FX5--OUT

Or only when the audio chain deals with more channels?
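To make the "parallel sub-graph" idea concrete, here is a small sketch that groups the nodes of a connection graph into stages, where nodes in the same stage do not depend on each other. This only illustrates the dependency-graph idea and is not JACK2's actual scheduling code. `parallel_stages` is a hypothetical helper, and FX3 is an assumed plugin on the second branch (the diagram above does not name one).

```python
from collections import defaultdict


def parallel_stages(edges):
    """Group the nodes of a directed acyclic graph into stages.

    Nodes that land in the same stage have no dependency on each other,
    so they could in principle be processed at the same time.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    # Start with the nodes that wait on nothing (the inputs).
    ready = sorted(n for n in nodes if indegree[n] == 0)
    stages = []
    while ready:
        stages.append(ready)
        next_ready = []
        for node in ready:
            for nxt in successors[node]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:  # all of its inputs are done
                    next_ready.append(nxt)
        ready = sorted(next_ready)
    return stages


# Pure series chain, as in the quoted wiki text.
series = [("IN", "A"), ("A", "B"), ("B", "C"), ("C", "OUT")]

# Single-channel chain with two branches; FX3 is an assumed second branch.
branched = [("IN", "FX1"), ("FX1", "FX2"), ("FX1", "FX3"),
            ("FX2", "FX4"), ("FX3", "FX4"), ("FX4", "FX5"), ("FX5", "OUT")]

for name, graph in (("series", series), ("branched", branched)):
    print(name, parallel_stages(graph))
```

In the series case every stage holds exactly one node, so there is nothing to spread over cores. In the branched case only the stage holding FX2 and FX3 has two independent nodes, even though the chain is a single channel.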

More concretely, I tried to do some investigation with a current pedalboard of mine:

On the web interface, the CPU load shows about 66%. However, top shows the following:

I am a bit confused, as the jackd service shows CPU consumption at ~90% and the memory consumption also doesn’t match. Also, it doesn’t show any core distribution (/0 or /1). Is there a better way to check how much of each core the pedalboard is taking?

JACK DSP load is not the same as system CPU load.
The percentage you see in the web GUI is the DSP load: the share of the time available in each audio cycle that the audio thread spends on processing. Whatever remains is the headroom left for more processing.

If you have a plugin that does a sleep in the audio thread, it will consume no actual system CPU cycles, but it will simply waste all the time available to do audio processing.
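The sleeping-plugin case can be reproduced in a few lines. This is a rough sketch in plain Python, not JACK code: `measure_cycle` and `sleepy_plugin` are hypothetical names, and the 10 ms budget and 5 ms sleep are arbitrary values chosen for illustration. It compares wall-clock time (what a DSP-style load sees) with the CPU time the process actually consumed.

```python
import time


def measure_cycle(process, budget_s):
    """Run one 'audio cycle' and return (dsp_style_load, cpu_load).

    dsp_style_load: wall-clock time used, as a fraction of the cycle budget.
    cpu_load: CPU time actually consumed, as a fraction of the same budget.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # does not count time spent sleeping
    process()
    wall_used = time.perf_counter() - wall_start
    cpu_used = time.process_time() - cpu_start
    return wall_used / budget_s, cpu_used / budget_s


def sleepy_plugin():
    # Stands in for a plugin that blocks inside the audio callback.
    time.sleep(0.005)


BUDGET_S = 0.010  # assumed time available per cycle

dsp_load, cpu_load = measure_cycle(sleepy_plugin, BUDGET_S)
print(f"DSP-style load: {dsp_load:.0%}, CPU time used: {cpu_load:.0%}")
```

The sleep eats roughly half of the cycle budget while using almost no CPU time, which is why the two kinds of percentage do not have to agree.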

The parallel branches are optimized in a simple way, but it is hard to tell exactly how without looking deep into the code.
Also worth noting that the kernel takes care of starting/scheduling threads, JACK does not pin threads to CPU cores or anything like that.

Do you think tricks like CPU binding and core affinity would lower latency?

Not by itself, but in theory it could help reduce xruns.
The kernel schedules tasks pretty well though, so it is rarely needed.
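For anyone who wants to experiment with affinity anyway, here is a minimal sketch using Python's `os.sched_setaffinity`, which is only available on Linux. It pins the calling Python process, not jackd or a plugin, and `pin_to_one_core` is a hypothetical helper name. It is meant to show the mechanism, not a recommended setup.

```python
import os


def pin_to_one_core():
    """Pin the current process to a single core it is already allowed to use.

    Returns the new affinity set, or None where the call is unavailable
    (os.sched_setaffinity exists on Linux only).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # pid 0 means the calling process
    target = {min(allowed)}             # lowest-numbered allowed core
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


print("affinity after pinning:", pin_to_one_core())
```

After the call the kernel will only schedule that process on the chosen core. As noted above, this does not lower latency by itself.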

The audio backend for MOD (JACK2) is multi-threaded, so these things don’t apply to it.