MOD Duo Latency Measurement

redcloud · June 15, 2020, 6:37am

So which components are responsible for the rest of the latency?

falkTX · June 15, 2020, 1:27pm

The analog circuitry.
EDIT: correction, it is not really all “analog” stuff, but something out of the control of our software.
@Jan can probably explain.

Let’s take the 64 frames sync case, which has 4.473ms total latency.
64 / 48kHz = 1.333ms;
Running with 3 periods per buffer means 1.333 x 3 = 4ms

So the analog circuitry latency on the Duo X is 0.47ms.
I believe this is lower on the Duo, around 0.2ms.

Anyway, 0.5ms latency is almost negligible.
The math checks out, there is no way around how this works.
What I see happen regularly, though, is that applications show only their block latency, which misleads users.
At 64 frames with a 48kHz sample rate, the block latency is 1.333ms, which many applications will just show as-is. But the actual total/physical latency is at least double that, 3x for certain audio cards, plus some extra from the hardware (USB audio cards are usually the worst case in terms of added latency).
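
To make the arithmetic above concrete, here is a minimal Python sketch. The function names are made up for illustration, and the 3-block count and the 4.473ms measurement are simply the figures quoted in this post, not values read from a device.

```python
def block_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Duration of one audio block (period) in milliseconds."""
    return frames / sample_rate_hz * 1000.0


def buffered_latency_ms(frames: int, sample_rate_hz: int, blocks: int) -> float:
    """Latency contributed by buffering alone: `blocks` full blocks."""
    return blocks * block_latency_ms(frames, sample_rate_hz)


# Figures from the 64-frame sync case quoted above.
FRAMES = 64
SAMPLE_RATE_HZ = 48_000
BLOCKS = 3                 # 3 periods per buffer
MEASURED_TOTAL_MS = 4.473  # measured total roundtrip latency

block_ms = block_latency_ms(FRAMES, SAMPLE_RATE_HZ)
buffered_ms = buffered_latency_ms(FRAMES, SAMPLE_RATE_HZ, BLOCKS)
# Whatever is left over is outside the software's control (codec, hardware).
residual_ms = MEASURED_TOTAL_MS - buffered_ms

print(f"block latency:    {block_ms:.3f} ms")
print(f"buffered latency: {buffered_ms:.3f} ms")
print(f"residual latency: {residual_ms:.3f} ms")
```

The block latency is what many applications display; the buffered latency plus the residual is what a roundtrip measurement actually sees.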

redcloud · June 15, 2020, 6:04pm

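This “block latency” is outside of jack2? Does one of your last benchmarks here

Current defaults (128 frames, 2 periods per buffer, async mode):
   406.723 frames      8.473 ms total roundtrip latency
    extra loopback latency: 22 frames

include this 2.6ms latency? Otherwise the figures are wrong somehow.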

edwillys · June 15, 2020, 6:42pm

jack itself doesn’t add latency. It is an interface to the low-level driver, which connects to the CODEC chip. In jack we can set the number of samples per block, which in this example is 128.

I assume the reason for doing block processing instead of per-sample processing is clear. The rule of thumb is: the more samples per block, the fewer pre/post-amble operations, context switches, interrupts, etc.
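
A rough way to see this rule of thumb is to count how often the audio callback has to run at each block size. The sketch below assumes a 48kHz sample rate and exactly one callback per block; the function names are illustrative, and it says nothing about the real per-callback overhead, which depends on the system.

```python
SAMPLE_RATE_HZ = 48_000  # assumed sample rate


def callbacks_per_second(frames_per_block: int) -> float:
    """How many times per second a block must be processed."""
    return SAMPLE_RATE_HZ / frames_per_block


def deadline_ms(frames_per_block: int) -> float:
    """Time available to process one block before the next one is due."""
    return frames_per_block / SAMPLE_RATE_HZ * 1000.0


for frames in (32, 64, 128, 256):
    print(
        f"{frames:4d} frames: "
        f"{callbacks_per_second(frames):7.1f} callbacks/s, "
        f"{deadline_ms(frames):.3f} ms per block"
    )
```

Halving the block size doubles the number of wake-ups per second and halves the time available for each one, so any fixed per-block cost is paid twice as often.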

There are also many reasons to work with double buffering, a.k.a. ping-pong buffering, and this is called sync mode in the jack world. Async mode means we’re adding one extra buffer (also called a frame), which I hadn’t seen before this example and which was explained by @falkTX above. As far as I can tell, the math above is correct.

I am not familiar with the jack_iodelay tool, but it seems to be doing some averaging over many frames. In the end it shouldn’t really matter, because the system should be deterministic and there shouldn’t be any difference in latency between frames. Also, there seems to be a slight difference between async with 2 buffers and sync with 3 buffers, which I don’t completely grasp. This difference of ~0.02ms is, however, negligible.

What comes on top of this whole buffering thing is the CODEC latency, normally around 1ms and in this case even less, plus the latency of the analog components, on the order of tens of nanoseconds.

falkTX · June 15, 2020, 7:09pm

redcloud:

The block latency for MOD units is 2.6ms, because it runs at 48kHz rate with 128 buffer size.

This “block latency” is outside of jack2? Does one of your last benchmark here
Current defaults (128 frames, 2 periods per buffer, async mode):
   406.723 frames      8.473 ms total roundtrip latency
    extra loopback latency: 22 frames
include this 2.6ms latency? Otherwise figures are wrong somehow.

Yes, it includes the block latency 3 times. 2 for the number of audio periods (minimum is always 2 as far as I know), and then 1 extra for the async mode.

So with this we get 2.7ms x 3 = 8.1ms (128 / 48kHz is 2.667ms, so it is closer to 2.7ms than to 2.6ms).
With the codec 0.4ms latency, the final result is 8.5ms.
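
The same breakdown can be done in frames, using the benchmark figures quoted above. This is only a sketch: the constant and function names are illustrative, and the split into "2 periods + 1 async block" is the explanation given in this post rather than something the tool reports.

```python
SAMPLE_RATE_HZ = 48_000
FRAMES_PER_BLOCK = 128
PERIODS = 2              # periods per buffer
ASYNC_EXTRA_BLOCKS = 1   # one extra block for async mode
TOTAL_FRAMES = 406.723   # total roundtrip latency from the benchmark


def frames_to_ms(frames: float) -> float:
    """Convert a frame count to milliseconds at SAMPLE_RATE_HZ."""
    return frames / SAMPLE_RATE_HZ * 1000.0


# Frames accounted for by buffering alone.
buffered_frames = (PERIODS + ASYNC_EXTRA_BLOCKS) * FRAMES_PER_BLOCK
# Frames left over, attributed here to the codec and other hardware.
extra_frames = TOTAL_FRAMES - buffered_frames

print(f"total:    {TOTAL_FRAMES:.3f} frames = {frames_to_ms(TOTAL_FRAMES):.3f} ms")
print(f"buffered: {buffered_frames} frames = {frames_to_ms(buffered_frames):.3f} ms")
print(f"extra:    {extra_frames:.3f} frames = {frames_to_ms(extra_frames):.3f} ms")
```

The leftover of about 22.7 frames lines up with the "extra loopback latency: 22 frames" line in the benchmark output.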

redcloud · June 15, 2020, 7:18pm

Thanks! So the bottleneck is the driver? Would it be possible to use a customized one for ultra-low latency?

edwillys · June 15, 2020, 8:17pm

It is for sure possible. Bela did it, as was already mentioned earlier in this thread. I wouldn’t expect it to be an easy task, nor would I expect this solution to scale easily to further MOD devices. It is all a matter of whether the gain is worth the effort. For my ears sub-5ms is enough, though pushing harder for lower latency would allow further chaining of devices (which is not my use case with the MOD pedals anyway).

redcloud · June 15, 2020, 8:27pm

To me, giving up the control chain possibility would be a good trade-off for <3ms latency.

redcloud · June 15, 2020, 8:37pm

Quoting Bela FAQ

So you are running Pd and you claim less than one millisecond latency? How is that possible, given Pd’s minimum buffer size is 64 samples?

For two reasons. First, we are not running the Pd program itself, and second, we are not using the Linux ALSA drivers. Pd patches are compiled into C code using the Heavy Audio Tools. The C code produced is highly optimised and is automatically wrapped into our C++ API. This bypasses the whole Linux kernel (and ALSA) and allows it to run with buffer sizes as small as 2 audio samples, giving a roundtrip latency below 1ms for audio (because the ADC/DAC have some built-in latency) and below 100us for analog (whose converters are faster).

unbracketed · June 16, 2020, 3:07am

Also mentioned above, the Bela isn’t very comparable to the MOD devices beyond a very broad sense of being a programmable audio unit. There are notable differences in the technical design choices, the end-user experience, the out-of-the-box practical applications, and the target audiences.

In an academic sense, it’s nice to know what a lower bar might be for digital audio units but everything in hardware and software is about trade-offs and the MOD architecture is focused on building a product that fulfills their vision, which is still heavily based on the use of open source software across the whole stack. This makes a wealth of knowledge and tooling available from the vast Linux ecosystem as compared to other more exotic solutions. There will be more users who have useful knowledge about inner workings and can bring their experience and perspectives into the mix. The toolsets allow for easier building of things like the web interface and continued regular development. This in turn lowers the bar for use as relatively non-technical users are able to grasp the interface and assemble complicated pedalboards with minimal introduction.

It’s OK to conclude that if you need guaranteed sub-4ms latency then these devices won’t be a good choice for you. Many users are already happy, productive, and performing with their Duos on >8ms latency (some users even double the default rate in settings). Users have reported using their Duos in acoustic, orchestral, ambient, rock, etc scenarios with success. Nobody is wrong in the end, just that people have different physical acoustical abilities and tolerances, different use cases with varying latency tolerances and different expectations.

redcloud · June 16, 2020, 7:36am

I’m just trying to understand what the real limitations are and whether they can be overcome. When I think of the next Dwarf, I think of a machine designed for live situations, used in most cases by guitarists or bass players. In this scenario, the “competitors” certainly work below the notorious 8ms (I know for sure that the Fractal roundtrip is 2ms; I personally tried the Hotone Ampero and Mooer GE300 and they are definitely below 6-8ms, you can feel it under your fingers).

I think (or rather I hope) that using a customized kernel instead of a standard one, where the only customized part is the audio driver, would not affect the other features that are not strictly related to signal processing (UX, plugin development, etc.). Consider also that using wireless devices for I/O would increase the latency further (with the Hotone Ampero + 2 wireless devices I think the latency is around 8ms, which is just acceptable; in an A/B test the difference is evident). So I would say that an effort could be made to improve this aspect (if there are margins of improvement not dictated by the hardware used, and at this point I understand that this is not the case with Dwarf and Duo X). It would be an improvement to brag about! I hope the last working test, which gave 4.473ms total latency, would work well on the Dwarf too; that could still be acceptable.

falkTX · June 16, 2020, 9:13am

But might very well end up being the case.
The compromise is that lower latency = more CPU load = fewer plugins that can be loaded overall.

@unbracketed please correct me if I am wrong, but in Bela’s case only very specific sounds can be loaded, right? Can it handle more than 1 PD file? And use externals? (The problem with PD always seems to be the externals, haha.)

redcloud · June 16, 2020, 9:33am

From a guitarist’s point of view, loading no more than 6-7 plugins could be enough. If that limitation made it possible to reach 2-3ms latency, it would be great. I know the Duo X can currently load a lot of heavy plugins simultaneously, but I suspect that in most real use cases this just leaves CPU power unused. If low latency could be obtained by sacrificing CPU power and selected via the UI settings, it would be a huge feature! I mean something like “Check this flag for low latency (no more than x% CPU usage is allowed or no more than X plugins are allowed)”.

falkTX · June 16, 2020, 9:57am

We understand that, thanks for the continued feedback.

As I said before, this was something that was never too viable for the Duo, so we did not research it much.
For the Duo X and Dwarf, it makes sense to give some options.
We will do some tests and discuss this in the team.

Klaustrophil · June 16, 2020, 10:35pm

I highly appreciate the use of well-known technology like ALSA and jack. It is a lot of fun hacking around in the system, and that makes it a LOT more valuable. Since computing power constantly grows, I’m sure that we could halve the latency in a few years.

Klaustrophil · July 16, 2020, 11:22pm

Is there anything else I should do except for setting the buffer of jack to 64 for testing it by myself on the Duo X?

falkTX · July 17, 2020, 2:34am

You can change /usr/bin/mod-jackd, as that is the script that starts up jack.
The “-S” option param is already there, set if reading a file, but since it did not work so well on the Duo it also enables the extra ALSA period buffer when doing so.
I think if you look at the file you will understand quickly.

aspiers · August 1, 2020, 6:31pm

I’ve just upgraded my Duo to the latest 1.9.1 with mainline kernel - is this likely to affect latency positively or negatively? Thanks.

falkTX · August 1, 2020, 8:27pm

Negatively, by around 0.4ms I think.
There is something the mainline kernel is doing regarding I2S that adds a tiny bit of delay somewhere, but I could not find out where or how exactly.

aspiers · August 2, 2020, 11:13am

I think we can live with that