Wednesday, March 28, 2012

Problems with the Xrun

So... Once again I'm not working on my thesis. I should be. I always should be.

But I had some time to record a bit last weekend and it got discouraging. I'm on a tight schedule (got to finish before the babies wake up from napping or my wife gets home), and every take has the annoying pops of xruns.

If you don't know, xruns are buffer underruns or overruns: the software and hardware don't stay in sync, so there's a gap in the flow of data. The gap (I presume) gets filled with zeros, so the complex waveform suddenly drops to zero for several samples, and the sharp cutoff and return of the data produces high-frequency content. Most people know them as the annoying pops in digital audio.

But it's getting to the point that I need to spend some time configuring JACK to remove these. Then I can focus my time on recording (when I have time, which is borrowed anyway). Sharpen my saw, you know? I aimlessly went about guessing settings before, since there's no concrete information; every system is so different.

For the most part there are three knobs you can turn to influence jack's performance:
  1. Frames/Period
  2. Sample Rate
  3. Periods/Buffer
Frames/Period is the number of "frames" of analog information converted to a digital number in each period. A frame is the same as a sample, as far as I know. Sample rate is samples of data per second. Periods/Buffer, I believe, is the number of sections the input and output buffers are cut into. The strategy JACK takes is to slice up the buffer, hand one slice to the hardware and one to the software, then swap. This way the data can get processed and sent out in as few as two interrupts.
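A rough latency estimate falls out of these three knobs (this is my own back-of-the-envelope sketch, not something qjackctl reports): latency is about Frames/Period times Periods/Buffer divided by Sample Rate. A quick check in the terminal:

```shell
# Rough output latency = frames/period * periods/buffer / sample rate
# (example settings, not a recommendation)
frames=256
periods=2
rate=48000
latency_ms=$(awk -v f=$frames -v p=$periods -v r=$rate \
    'BEGIN { printf "%.2f", f * p / r * 1000 }')
echo "$latency_ms ms"    # 256*2/48000 -> 10.67 ms
```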

The problem arises when a program has too many instructions to perform on the data before the interrupt comes to swap the buffers. Xrun. So just add more frames per period. Since a CPU is much faster than the sampling rate, doubling the frames from 128 to 256 doubles the amount of time available for processing. True, it can simultaneously double the number of instructions to be performed, but since your processor runs 2,000,000,000 cycles in a second (~2 GHz, though probably limited a bit by bus speed) and your samples arrive at 48,000 per second, you can see that the scaling is in your favor. So why don't we just use 4096 Frames/Period? Well, the problem is that if you give the software too much time, it finishes its work and the data sits there until the hardware is ready. This means latency. Thus there is a tradeoff between latency and xruns.
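To put numbers on "the scaling is in your favor," using the same ~2 GHz and 48,000 samples/s figures as above, here's the cycle budget a program gets per period:

```shell
# CPU cycles available per period = clock rate * (frames / sample rate)
clock_hz=2000000000   # ~2 GHz
frames=128
rate=48000
budget=$(awk -v c=$clock_hz -v f=$frames -v r=$rate \
    'BEGIN { printf "%.0f", c * f / r }')
echo "$budget cycles to process 128 frames"   # about 5.3 million
```

Even at the smallest common period size, that's millions of cycles to play with, which is why doubling the period helps so much.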

Latency is the time from when you send your input (guitar strum, key press, drum strike, etc.) to when you hear the output. Every 10ms of latency corresponds to being ~10 feet further from an analog source, so a little bit isn't bad. In my experience 10ms is tolerable, 20ms is noticeable, and 40ms is bothersome to the point that I can't use it. Some people (with better rhythm) are sensitive to even 3ms of latency.
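The feet-of-distance equivalence comes from the speed of sound, roughly 343 m/s; a quick sanity check:

```shell
# Acoustic distance equivalent of a given latency
# distance = speed of sound (~343 m/s) * latency in seconds
latency_ms=10
dist=$(awk -v ms=$latency_ms \
    'BEGIN { m = 343 * ms / 1000; printf "%.1f m (%.1f ft)", m, m * 3.281 }')
echo "$dist"    # 10 ms of latency is roughly 11 feet from the source
```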

Anyway, so we have this tradeoff. Where shall I put myself on the spectrum? Well, xruns can be tolerable for some things. If a few xruns pop up while you're doing something GUI-intensive between songs of a live performance, it's not so bad, but in a recording environment they are NEVER OK. They get recorded to disk. They ruin takes of good performances. For me, not being a great player on some instruments, I usually only get it right once. So far I've just said, well, I'll live with the xrun, but it ruins the level of professionalism.

Problem: I have xruns.
Hypothesis: JACK settings exist that would give tolerable levels of latency and no xruns.
Experimental Design: Try different JACK settings while doing processor-intensive tasks.
Data: Take the number of xruns in a given amount of time for each setting.

First I was using Ardour 2.8, playing the song I was recording last weekend, but it was only playback of eight tracks or so at most, so I thought I'd better come up with a more intense test. I used Rakarrack and tried to put together the most expensive effects preset possible.
While Rakarrack is made to be as efficient as possible, there's no getting around two convolution processes plus a pitch shifter and an arpeggiator... you get the idea. I would hook it up, turn on the effect, leave it on for about five seconds, then count the number of xruns.
I'm running Xubuntu 11.10 with the generic kernel (I'll explore RT kernels later), using the built-in Intel sound card of my Lenovo T400: 2.4 GHz Core 2 Duo with 2 GB of RAM. Here's what I found:

Sample Rate: 44.1 kHz
[table: xruns by Frames/Period (rows) vs. Periods/Buffer (columns)]
Curse these non-monospace fonts; I can't make a decent table. LaTeX!
I know the data is incomplete, but it's fairly linear, so it's not too much work to interpolate in your head. The data is in the format xruns(notifications). I'm not sure why qjackctl reports them separately, since I'll get pops for either. Even though most of these runs had no "official" xruns, the notifications are a problem. Anyway:

The lowest latency with no xruns is 20ms, at a sample rate of 96 kHz, 2 Periods/Buffer, and 1024 Frames/Period. This suggests I need to do something more drastic to get the performance I need, such as a realtime kernel, more RAM, or perhaps a different sound card.

I had set Periods/Buffer to 6 or so before, during my random guessing, thinking it would let me use a much lower Frames/Period without too much additional latency, but the data shows that increasing Periods/Buffer doesn't help and can make xruns worse in some cases. I carried the experiment out until I found the threshold where no xruns occurred; at every sampling rate the threshold was independent of Periods/Buffer. There may be some advantage to a higher Periods/Buffer if you have lots of client applications looking at the data, but I don't think so. So leave it at 2. You can experiment with more, though. Experiments are healthy and fun.

Future Work:
Run these tests using my Presonus Firebox firewire interface (sound card), try switching my graphics card to use its discrete memory to open up more RAM, and try Abogani's lowlatency and RT kernels.


I realised a few weeks later that my CPU frequency governor was causing havoc. Basically, the governor saves electrical power by running your processor at a lower clock frequency, which slows the transistor transitions but requires less energy for the same number of computations. This is great for working on my thesis, because the governor runs the clock at 800 MHz when I'm just typing and I never even notice; then when I run a crazy intense Octave (MATLAB) script with 30,000 runs for a Monte Carlo analysis, it kicks the clock up to the full 2.4 GHz and the calculations finish in a few hours. There is a governor for each CPU. While this is great for most tasks, it is inappropriate for audio work: by the time the computer recognizes that the load is large and switches to a more appropriate frequency, the delay has already caused xruns. It's much better to just leave it on full power all the time.
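You can check what each core is currently doing before and after changing it (these sysfs paths are standard on Linux, though exactly what's available depends on your kernel and driver):

```shell
# Show the current frequency governor on every core
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
  gov=$(cat "$cpu/cpufreq/scaling_governor" 2>/dev/null || echo "unknown")
  echo "$cpu: $gov"
done
```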

Now that you understand the theory, here's some application. In the terminal:

cpufreqselector -c 0 -g performance
cpufreqselector -c 1 -g performance

I do it for each of my two cores. This significantly reduces my xruns. When I'm ready to relinquish the computer back to my wife, I run

cpufreqselector -c 0 -g ondemand
cpufreqselector -c 1 -g ondemand
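If you have more cores than I do, here's a sketch that loops over them instead of typing one line per core (it assumes cpufreqselector is installed, as above; swap in "performance" or "ondemand" as needed):

```shell
# Set every core's governor in one go
last=$(( $(nproc) - 1 ))
for core in $(seq 0 $last); do
  cpufreqselector -c "$core" -g ondemand
done
```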

If I had time I'd run these same scientific tests with the new governor, but alas, time is money and I've got none.
