MOD Duo Latency Measurement

edwillys · June 1, 2020, 6:03pm

I meant complete system latency. With MODEP, they launch jack in sync mode (-S), which brings the round-trip latency down to about 5.33ms for 128 samples per block @48kHz. Besides, there you can also set the block length to 64 to reduce it further.

I find this sentence a bit misleading. Although 128 / 48k is indeed equal to ~2.67ms, we have 3 times that in the jack configuration.

8ms is definitely noticeable, and one has to keep in mind that it comes on top of everything else (WiFi system, speaker-to-ear distance, etc.). So much so that it was the reason why I did the measurements in the first place.

This is a fair point, though I’d say it applies especially to the MOD Duo. For the MOD X and the Dwarf, the CPU performance is much improved. One solution could be to leave it up to the user to select the block length, as is already possible with MODEP (down to 64) and with MOD, though only between 128 and 256. One major improvement would be the ability to configure jackd in sync mode, thus going down to ~5.33ms. Not sure how much easier or harder that is than finding a way to set the block length to 64…

redcloud · June 1, 2020, 6:11pm

I’m wondering where the bottleneck is: on the HW side or on the SW side?

Klaustrophil · June 9, 2020, 11:08pm

My use case is using the Duo X as a send effect in my synth setup. When I apply the effects to something timing-critical like drums (and this is one of the main reasons I bought it), every millisecond of delay is more likely to be audible.

At least with MODEP on the RasPi 4, I’m only using ~30% of the CPU, and I’m already happy with my rather wastefully designed board. Using up to 50% CPU (to leave 20% spare, dunno if this makes sense) to reduce the latency further sounds like a great idea to me.

falkTX · June 10, 2020, 9:23am

Alright, I can do some tests to see if 64 frames is feasible with the Duo X.
It needs to be 64 or 128, a power of 2, because a few plugins that do FFT require it.
At least disabling the JACK2 async mode will already help.
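To make the power-of-2 constraint concrete, here is a minimal sketch that checks a few candidate block sizes and prints how long one block lasts at 48kHz. The helper names `is_power_of_two` and `block_ms` are hypothetical illustrations, not part of JACK or the MOD software.

```python
SAMPLE_RATE = 48_000  # Hz, the rate the MOD units run at


def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two, as radix-2 FFT code usually expects."""
    return n > 0 and (n & (n - 1)) == 0


def block_ms(frames: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of one block (period) in milliseconds."""
    return 1000.0 * frames / sample_rate


for frames in (32, 48, 64, 96, 128, 256):
    status = "power of 2" if is_power_of_two(frames) else "NOT a power of 2"
    print(f"{frames:4d} frames: {block_ms(frames):.3f} ms per block ({status})")
```

Sizes like 48 or 96 would give finer latency steps, but they fail the power-of-2 check, which is why the choice comes down to 64 or 128.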

maxman · June 10, 2020, 1:57pm

I like the idea of going down a few ms… would love to try 64 frames… using the MOD X, I have plenty of power left most of the time…

falkTX · June 14, 2020, 3:49pm

So, I ran a few tests with the Duo X (initial model), using the jack_iodelay tool to measure the physical out-to-in latency.
This measures the full input and output latency combined.

Current defaults (128 frames, 2 periods per buffer, async mode):

   406.723 frames      8.473 ms total roundtrip latency
        extra loopback latency: 22 frames

Safer sync mode (128 frames, 3 periods per buffer, sync mode):

   405.718 frames      8.452 ms total roundtrip latency
        extra loopback latency: 21 frames

Defaults with sync mode (128 frames, 2 periods per buffer, sync mode):

   277.724 frames      5.786 ms total roundtrip latency
        extra loopback latency: 21 frames

Safer 64 frames mode (64 frames, 3 periods per buffer, async mode):

   277.723 frames      5.786 ms total roundtrip latency
        extra loopback latency: 21 frames

Safer 64 frames sync mode (64 frames, 3 periods per buffer, sync mode):

   214.723 frames      4.473 ms total roundtrip latency
        extra loopback latency: 22 frames

Defaults with 64 frames mode (64 frames, 2 periods per buffer, async mode):
failed, cannot run

Defaults with 64 frames sync mode (64 frames, 2 periods per buffer, sync mode):
failed, cannot run
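The five successful runs line up with a simple model: buffering contributes one period of latency per period in the buffer, plus one more period in async mode, and everything else shows up as a roughly constant remainder. The sketch below checks that model against the numbers above; the model is a simplification assumed for illustration, and the measured values are the only data it uses.

```python
SAMPLE_RATE = 48_000  # Hz

# (setup, frames per period, periods per buffer, async mode?, measured frames)
# The measured round-trip values are the jack_iodelay results listed above.
MEASUREMENTS = [
    ("128 frames, 2 periods, async", 128, 2, True, 406.723),
    ("128 frames, 3 periods, sync", 128, 3, False, 405.718),
    ("128 frames, 2 periods, sync", 128, 2, False, 277.724),
    ("64 frames, 3 periods, async", 64, 3, True, 277.723),
    ("64 frames, 3 periods, sync", 64, 3, False, 214.723),
]


def buffering_frames(frames: int, periods: int, is_async: bool) -> int:
    """Round-trip latency caused by buffering, in frames (simplified model):
    one period per period in the buffer, plus one more in async mode."""
    return frames * (periods + (1 if is_async else 0))


def frames_to_ms(frames: float, sample_rate: int = SAMPLE_RATE) -> float:
    return 1000.0 * frames / sample_rate


for setup, frames, periods, is_async, measured in MEASUREMENTS:
    model = buffering_frames(frames, periods, is_async)
    remainder = measured - model  # latency the buffering model does not cover
    print(f"{setup:30s} model {frames_to_ms(model):.3f} ms, "
          f"measured {frames_to_ms(measured):.3f} ms, "
          f"remainder {remainder:.1f} frames")
```

Every row leaves a remainder of about 22 frames (roughly 0.45 to 0.47ms), regardless of block size or mode, which suggests it comes from outside the buffering.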

So, unlike the Duo, the Duo X is able to run at 64 frames quite okay (as long as 3 periods per buffer is also enabled).
Doing so reduces the latency by around 2.7ms.

Instead of reducing the buffer size, sync mode can be used to reduce latency without impacting CPU.
Doing so reduces the latency by only ~0.02ms with 3 periods per buffer (the extra period offsets the one saved by leaving async mode), or by 2.7ms with 2 periods per buffer (the same amount as the 64 frames reduction).

Combining both a lower buffer size and sync mode reduces latency by 4ms (this needs 3 periods per buffer though, otherwise the audio/I2S cannot cope with it).

PS: For those wondering what “sync/async mode” is…
Basically, the audio engine we use (JACK2) uses an async audio model by default, where the audio renders to a non-active buffer, and plugins that were able to finish rendering on time get their buffer copied into the real/active one. This is to prevent misbehaving plugins from causing audio glitches; the audio from such plugins will just not be used. So on a parallel chain of plugins, audio keeps running except for the chain that includes the bad plugin.
The latency added by this async mode is the same as one audio period.
When using sync mode, the plugins render directly into the active audio buffer. This has lower latency, but makes xruns much more noticeable (one bad plugin can ruin the entire audio graph, even if disconnected).
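A toy model can make the difference between the two modes concrete. This is a minimal sketch under simplifying assumptions, not how JACK2 is implemented: each hypothetical "plugin" function stands in for a whole parallel chain and returns `None` when it misses the cycle deadline.

```python
from typing import Callable, List, Optional

Block = List[float]
# Hypothetical plugin signature: (cycle index, input block) -> output block,
# or None if the plugin did not finish before the cycle deadline.
Plugin = Callable[[int, Block], Optional[Block]]


def mix(blocks: List[Block], size: int) -> Block:
    """Sum the outputs of several parallel chains sample by sample."""
    out = [0.0] * size
    for block in blocks:
        for i, x in enumerate(block):
            out[i] += x
    return out


def run_sync(inputs: List[Block], plugins: List[Plugin]) -> List[Optional[Block]]:
    """Sync mode: render straight into the active buffer of the same cycle.
    One late plugin turns the whole cycle into an xrun (None)."""
    outputs: List[Optional[Block]] = []
    for n, block in enumerate(inputs):
        results = [p(n, block) for p in plugins]
        if any(r is None for r in results):
            outputs.append(None)
        else:
            outputs.append(mix(results, len(block)))
    return outputs


def run_async(inputs: List[Block], plugins: List[Plugin]) -> List[Optional[Block]]:
    """Async mode: render into a non-active buffer and play it one cycle later.
    Late plugins are left out of the mix; the rest keeps playing."""
    size = len(inputs[0])
    outputs: List[Optional[Block]] = [[0.0] * size]  # one period of extra latency
    for n, block in enumerate(inputs[:-1]):
        finished = [r for r in (p(n, block) for p in plugins) if r is not None]
        outputs.append(mix(finished, size))
    return outputs


def gain(n: int, block: Block) -> Optional[Block]:
    return [0.5 * x for x in block]


def flaky(n: int, block: Block) -> Optional[Block]:
    return None if n == 1 else list(block)  # misses the deadline on cycle 1


inputs = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
print(run_sync(inputs, [gain, flaky]))   # [[1.5, 1.5], None, [4.5, 4.5]]
print(run_async(inputs, [gain, flaky]))  # [[0.0, 0.0], [1.5, 1.5], [1.0, 1.0]]
```

With a plugin that misses cycle 1, sync mode loses that whole cycle, while async mode plays everything one cycle late and drops only the late plugin’s contribution.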

maxman · June 14, 2020, 4:32pm

Thanks for these insights. Always highly interesting!

edwillys · June 14, 2020, 6:09pm

This is very interesting, and good news!
Should we expect this to become a user option in the UI in a future release? How does it look for the MOD Dwarf?

redcloud · June 14, 2020, 8:22pm

Yeah, is it possible to test it on the Dwarf?

falkTX · June 14, 2020, 8:45pm

We are still evaluating things for the Dwarf. It could turn out to be the same situation as with the Duo X, but it is a bit early to tell.

redcloud · June 14, 2020, 9:04pm

Cool! IMHO you should try to keep the overall latency under 3ms; it would be a killer feature for live situations. If the bottleneck is JACK, would it be possible to investigate alternatives to replace it?

falkTX · June 15, 2020, 6:02am

2.7ms is already the block latency for 128 frames at 48kHz; going below 3ms of actual in-out physical latency means running at 32 frames or less.
It is just not worth it for the CPU.
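To see where the sub-3ms threshold falls, the sketch below tabulates round-trip latency for a few block sizes in sync mode. It assumes 48kHz and adds a fixed ~0.47ms for the non-buffering part (the value measured on the Duo X in this thread); both the helper and that constant are illustrative assumptions.

```python
SAMPLE_RATE = 48_000  # Hz
HW_LATENCY_MS = 0.47  # assumed non-buffering latency, as measured on the Duo X


def roundtrip_ms(frames: int, periods: int, is_async: bool = False) -> float:
    """Estimated in-out latency: buffered periods (+1 in async mode) plus hardware."""
    total_periods = periods + (1 if is_async else 0)
    return 1000.0 * frames * total_periods / SAMPLE_RATE + HW_LATENCY_MS


for frames in (128, 64, 32, 16):
    for periods in (2, 3):
        ms = roundtrip_ms(frames, periods)
        note = "  <- under 3 ms" if ms < 3.0 else ""
        print(f"{frames:3d} frames x {periods} periods (sync): {ms:.2f} ms{note}")
```

Under these assumptions only 32 frames and below land under 3ms; 64 frames with 2 periods comes out at about 3.14ms, and on the Duo X 64 frames already needed 3 periods to run at all.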

redcloud · June 15, 2020, 6:37am

So which components are responsible for the rest of the latency?

falkTX · June 15, 2020, 1:27pm

The analog circuitry.
EDIT: correction, it is not really all “analog” stuff, but something outside the control of our software.
@Jan can probably explain.

Let’s take the 64 frames sync case, which has 4.473ms total latency.
64 / 48kHz = 1.333ms.
Running with 3 periods per buffer means 1.333ms x 3 = 4ms.

So the analog circuitry latency on the Duo X is 4.473ms - 4ms = 0.47ms.
I believe this is lower on the Duo, around 0.2ms.

Anyway, 0.5ms of latency is almost negligible.
The math checks out; there is no way around how this works.
What I see happening regularly, though, is applications showing their block latency, which misleads users.
At 64 frames with a 48kHz sample rate, the block latency is 1.333ms, which many applications will just show as-is. But the actual total/physical latency is at least double that (3x for certain audio cards), plus some from the hardware (USB audio cards are usually the worst case in terms of added latency).

redcloud · June 15, 2020, 6:04pm

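Is this “block latency” outside of jack2? Does one of your last benchmarks here
Current defaults (128 frames, 2 periods per buffer, async mode):
   406.723 frames      8.473 ms total roundtrip latency
        extra loopback latency: 22 frames
include this 2.6ms latency? Otherwise the figures are somehow wrong.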

edwillys · June 15, 2020, 6:42pm

jack itself doesn’t add latency. It is an interface to the low-level driver, which connects to the CODEC chip. In jack we can set the number of samples per block, which in this example is 128.

I assume the reason for doing block processing instead of per-sample processing is clear. The rule of thumb is: the more samples per block, the fewer pre/post-amble operations, context switches, interrupts, etc.
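The trade-off can be illustrated with a simple cost model: each block pays a fixed overhead (interrupt, context switch, plugin call setup) plus a cost per frame. The sketch below uses made-up costs purely to show the trend; it is not a measurement of any MOD device.

```python
SAMPLE_RATE = 48_000  # Hz


def cpu_load(frames: int, per_block_us: float, per_frame_us: float,
             sample_rate: int = SAMPLE_RATE) -> float:
    """Fraction of real time spent processing under a simple cost model:
    a fixed overhead per block plus a cost proportional to the block length."""
    work_us = per_block_us + frames * per_frame_us
    budget_us = 1e6 * frames / sample_rate  # wall-clock time one block lasts
    return work_us / budget_us


# Hypothetical costs, chosen only to illustrate the trend.
PER_BLOCK_US = 200.0
PER_FRAME_US = 5.0

for frames in (256, 128, 64, 32, 16):
    load = cpu_load(frames, PER_BLOCK_US, PER_FRAME_US)
    print(f"{frames:3d} frames: {100 * load:5.1f}% CPU")
```

With these hypothetical numbers, the per-frame work stays at 24% of real time, while the fixed per-block share doubles every time the block size is halved, which is why very small blocks get expensive quickly.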

There are also many reasons to work with double buffering, a.k.a. ping-pong buffering, and this is called sync mode in the jack world. Async mode means we’re adding another extra buffer (also called frame), which I hadn’t seen before this example and which @falkTX explained above. As far as I can tell, the math above is correct.

I am not familiar with the jack_iodelay tool, but it seems to average over many frames. In the end it shouldn’t really matter, because the system should be deterministic, and there shouldn’t be any difference in latency between frames. Also, there seems to be a slight difference between async with 2 buffers and sync with 3 buffers, which I don’t completely grasp. This difference of ~0.02ms is negligible, however.

On top of this whole buffering thing comes the CODEC latency, normally around 1ms and in this case even less, plus the latency of the analog components, on the order of tens of nanoseconds.

falkTX · June 15, 2020, 7:09pm

redcloud:

The block latency for MOD units is 2.6ms, because it runs at 48kHz rate with 128 buffer size.

Is this “block latency” outside of jack2? Does one of your last benchmarks here
Current defaults (128 frames, 2 periods per buffer, async mode):
   406.723 frames      8.473 ms total roundtrip latency
        extra loopback latency: 22 frames
include this 2.6ms latency? Otherwise the figures are somehow wrong.

Yes, it includes the block latency 3 times: 2 for the number of audio periods (the minimum is always 2 as far as I know), and 1 extra for the async mode.

So with this we get 2.67ms x 3 = 8ms (128/48kHz is actually closer to 2.7ms than it is to 2.6ms).
With the codec’s ~0.47ms latency, the final result is ~8.5ms.
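The arithmetic can be written as one formula. As a sketch, with N frames per period, P periods per buffer, a = 1 in async mode and 0 in sync mode, sample rate f_s, and t_hw for the codec/hardware part (~0.47ms on the Duo X, per the numbers in this thread):

```latex
t_{\mathrm{rt}} = \frac{N\,(P + a)}{f_s} + t_{\mathrm{hw}}
= \frac{128 \cdot (2 + 1)}{48\,000\ \mathrm{Hz}} + 0.47\ \mathrm{ms}
= 8.00\ \mathrm{ms} + 0.47\ \mathrm{ms} \approx 8.47\ \mathrm{ms}
```

This matches the measured 8.473ms for the current defaults, and the same formula gives 64 x 3 / 48kHz + 0.47ms ≈ 4.47ms for the 64 frames sync case.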

redcloud · June 15, 2020, 7:18pm

Thanks! So the bottleneck is the driver? Would it be possible to use a customized one for ultra-low latency?

edwillys · June 15, 2020, 8:17pm

It is certainly possible. Bela did it, as was already mentioned earlier in this thread. I wouldn’t expect it to be an easy task, nor this solution to be easily scalable to other MOD devices. It is all a matter of whether the gain is worth the effort. For my ears, sub-5ms is enough, though pushing harder for lower latency would allow further chaining of devices (which is not my use case with the MOD pedals anyway).

redcloud · June 15, 2020, 8:27pm

To me, giving up the Control Chain option would be a good tradeoff for <3ms latency.