Sunday 4 December 2011

Android - high performance audio - how to do it

I've just been through a very interesting period of work, sorting-out a high-performance audio interface for the Android port of Mixtikl. I've learned quite a few things - here are the highlights.

Firstly, target Android 2.3 or later. This allows you to use OpenSL ES, which is the only realistic approach for low-latency audio on Android. The API allows you to deliver audio with pretty low latency, and totally avoids any problems of garbage collection blocking you. This of course assumes that all your audio code is written in C++!

As you're using OpenSL ES, and assuming you have some very heavy audio DSP going-on (like in Mixtikl!), you'll need to use a separate Posix thread to keep your audio callbacks pumped-up. Basically, if your OpenSL ES audio block callbacks take much time at all, then your audio will break-up. So, use a worker thread to keep sufficient audio pre-processed and ready to be picked-up and delivered via your callbacks!
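A minimal sketch of that arrangement, under stated assumptions: `SampleRing` and `runDemo` are hypothetical names for illustration, not Mixtikl code, and the "callback" here is just a loop standing in for the OpenSL ES buffer queue callback. The idea is that the worker thread does the heavy DSP and writes into a lock-free ring, while the callback only copies samples out and never waits.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Lock-free single-producer/single-consumer ring of 16-bit samples.
class SampleRing {
public:
    explicit SampleRing(std::size_t capacity) : buf_(capacity), head_(0), tail_(0) {
        assert(capacity > 1);  // one slot is always left empty
    }

    // Worker thread side: copies in as much as fits, returns the count accepted.
    std::size_t write(const int16_t* src, std::size_t n) {
        const std::size_t size = buf_.size();
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        const std::size_t space = size - 1 - (head + size - tail) % size;
        n = std::min(n, space);
        for (std::size_t i = 0; i < n; ++i) buf_[(head + i) % size] = src[i];
        head_.store((head + n) % size, std::memory_order_release);
        return n;
    }

    // Callback side: never waits. Missing samples are replaced by silence.
    std::size_t read(int16_t* dst, std::size_t n) {
        const std::size_t size = buf_.size();
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        const std::size_t got = std::min(n, (head + size - tail) % size);
        for (std::size_t i = 0; i < got; ++i) dst[i] = buf_[(tail + i) % size];
        for (std::size_t i = got; i < n; ++i) dst[i] = 0;
        tail_.store((tail + got) % size, std::memory_order_release);
        return got;
    }

private:
    std::vector<int16_t> buf_;
    std::atomic<std::size_t> head_;  // next slot to write
    std::atomic<std::size_t> tail_;  // next slot to read
};

// Hypothetical demo: a worker thread produces a known pattern while the
// calling thread plays the part of the audio callback and checks the order.
bool runDemo(std::size_t total) {
    SampleRing ring(4096);
    std::thread worker([&ring, total] {
        std::vector<int16_t> block(256);
        std::size_t produced = 0;
        while (produced < total) {
            const std::size_t n = std::min(block.size(), total - produced);
            for (std::size_t i = 0; i < n; ++i)
                block[i] = static_cast<int16_t>((produced + i) % 1000);
            std::size_t done = 0;
            while (done < n) {
                const std::size_t w = ring.write(block.data() + done, n - done);
                done += w;
                if (w == 0) std::this_thread::yield();  // ring full: let the reader run
            }
            produced += n;
        }
    });

    std::vector<int16_t> out(128);
    std::size_t consumed = 0;
    bool inOrder = true;
    while (consumed < total) {
        const std::size_t want = std::min(out.size(), total - consumed);
        const std::size_t got = ring.read(out.data(), want);
        for (std::size_t i = 0; i < got; ++i)
            if (out[i] != static_cast<int16_t>((consumed + i) % 1000)) inOrder = false;
        consumed += got;
        if (got == 0) std::this_thread::yield();
    }
    worker.join();
    return inOrder;
}
```

Calling `runDemo(20000)` should return `true` if every sample arrives in order. In a real app the ring would need to be large enough to cover the worker's worst-case stall, which is exactly the latency trade-off discussed in the comments.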

Finally, and believe me this is important (!): make sure you target armeabi-v7a, and that you use the right compiler flags to generate 32-bit ARM (ARMv7) code. If, on the other hand, you use the default settings, you'll generate Thumb code for armeabi - and your code will run staggeringly slower (!!), and audio break-up is inevitable if you're doing anything serious. So: don't bother targeting armeabi devices... and Thumb code is a no-no.
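One way to confirm what you are actually building is to ask the compiler. The sketch below relies on macros that GCC and Clang predefine (`__thumb__`, `__arm__`, `__ARM_ARCH_7A__`); the function names are hypothetical. With ndk-build, the settings usually involved are `APP_ABI := armeabi-v7a` in Application.mk and `LOCAL_ARM_MODE := arm` in Android.mk, though that is an assumption about the setup rather than something stated in this post.

```cpp
#include <cassert>
#include <cstring>

// Reports which instruction set this file was compiled for.
const char* instructionSet() {
#if defined(__thumb__)
    return "thumb";    // Thumb code: the case this post warns against
#elif defined(__arm__)
    return "arm";      // 32-bit ARM instructions
#else
    return "non-arm";  // e.g. a desktop or emulator host build
#endif
}

// True when the compiler is targeting ARMv7-A, as used by armeabi-v7a.
bool targetsArmV7a() {
#if defined(__ARM_ARCH_7A__)
    return true;
#else
    return false;
#endif
}
```

Logging both values once at start-up is a cheap sanity check; for audio-critical code you would hope to see "arm" together with `true`.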

Follow my advice, and you can create sophisticated audio software. Don't follow it, and you're going to find things tough! ;)

I should note that support for armeabi-v7a *emulator* targets arrived in the Android 4.0 SDK... which makes things a lot easier as well... something worth knowing!

Finally... here is a useful resource on the background to OpenSL ES on Android...:
http://mobilepearls.com/labs/native-android-api/opensles/index.html

13 comments:

Steve Broumley said...

Hi Pete,
I'm currently converting my iphone game "HiHowAreYou" to Android and for faster frame rate (OpenAL mixer is slow on iphone) I wrote my own audio engine/mixer which simply continuously fills in a stereo 16 bit, 44 kHz buffer that gets submitted to the hardware audio for playback.

Now the iphone has a very low level, low latency audio unit system and for android the closest interface is the simple buffer queue in OpenSL. I got everything up and running on android in a short time, but no matter what settings I use for the number of simple buffers or their size, the audio latency on the kindle fire is always about 0.5 seconds which is very apparent when starting sound effects etc.

Can I ask how many channels/format/rate you have for your output mix object and how many simple buffers/their size are you using in opensl for your app? I've tried all kinds of different values but the latency always seems to be around 0.5 seconds :-/

Also, currently I'm filling in each buffer in the opensl queue callback which isn't a problem since my mixer is very fast arm asm. I don't think another thread would help the problem.

cheers!,
Steve.

Steve Broumley said...

For your reference, I also posted my current Android OpenSL setup here:

http://diaryofagamesprogrammer.blogspot.com/2012/02/android-opensl-audio-latency-woes.html

thanks for any help!
-Steve.

Pete Cole said...

Hi Steve,

Thanks for your interest, and sorry to hear of your troubles. :)

Yes, the latency on the Kindle Fire is the worst I've (yet) seen on Android. That said, it is a lovely device, very well suited to generative audio mixing (c.f. my product Mixtikl!) where latency isn't such a big deal (except in the Mixtikl Visualiser tap mode, of course!).

The latency level you're talking about sounds about right :(. For Mixtikl on Android, we actually have to push this out much further, as our audio synthesis and FX engine is *very* busy and I've found that the system can break-up under heavy load with complex mixes and lots of DSP work going on (I'd guess that the audio thread isn't given a high-enough priority in the Kindle Fire!).

To account for this, we have introduced to Mixtikl user-adjustable settings levels for both block size in sample frames, and number of blocks to buffer.

The worst-case (Kindle Fire!) requires us to set this high "out of the box", but on most other devices (AFAIK) the user can "tweak" these settings to run a *lot* lower (e.g. my test Android 2.3 phone which is an Acer Liquid Metal).

I guess you'll have seen elsewhere on my blog my thoughts on the deficiencies of OpenSL ES, with respect to querying the device for sustainable buffer sizes etc.; the OpenSL ES design is seriously flawed in my view and we're all stuffed by it! FWIW, I supplied my feedback on this to Google, I dare say they've ignored my comments though! ;)

I'll dig in to the code and try to give you some more info, probably later today. That said, my general tip would be to run at as low a sample rate as you can get away with, with mono if you can... no point in wasting CPU cycles where you don't have to!

Best wishes,

Pete

Steve Broumley said...

Thanks for your reply Pete!
I've tried everything to try and get the latency down - but I found it doesn't matter how many queue buffers, or their size, or what sample rate, bit depth (8 or 16), channel count (stereo or mono) that I make the mix output, there's always about a 0.5 sec latency! It's very frustrating. My cpu mixing is super fast and is not a bottleneck btw.

Today I tried using the file descriptor opensl method which reduces latency once a sound is loaded/cached, but it comes with a bunch of other problems such as hitches when sounds are decompressed after loading, and also there's a limit on how many sounds can play at once before you run out of opensl objects so they have to be dynamically allocated and loaded which causes hitches!
I may just have to wait until the Kindle Fire OS engineers address this issue! Do you know who I should contact about this? (I'm very new to android/kindle).

Thanks for your help
-Steve.

Pete Cole said...

Hi Steve,

I just double-checked:
Block Size Sample Frames = 1024
Block Count = 10

Each sample frame = 2 samples (Left/Right) 16-bits per sample

Sample rate = 44100

Running with anything less can cause break-up on the Kindle Fire when using very heavy loading!
This is despite some very neat threading etc., as you'd expect of the Mixtikl audio engine! ;)

Yes, this is one for the Kindle Engineers to solve; I'm not sure that they've thought much yet about the needs of games and multimedia developers. I really don't know who to contact there - not that they'd listen to me anyhow. :-D

If you know somebody there, and you think they'd value my input, then do please feel free to point them my way as I am *certain* they need help from people who really understand this domain...!

Hoping this all helps, and best wishes,

Pete
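To put the settings quoted above into perspective, here is a small worked calculation. It is only the arithmetic for the application-side queue (1024 frames per block, 10 blocks, 44100 Hz, stereo 16-bit); whatever delay the device adds internally comes on top, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Delay in milliseconds contributed by a full queue of audio blocks:
// every queued frame has to play out before a newly mixed frame is heard.
double bufferLatencyMs(int framesPerBlock, int blockCount, int sampleRateHz) {
    assert(sampleRateHz > 0);
    return 1000.0 * framesPerBlock * blockCount / sampleRateHz;
}

// Memory in bytes occupied by that queue.
long bufferBytes(int framesPerBlock, int blockCount, int channels, int bytesPerSample) {
    return static_cast<long>(framesPerBlock) * blockCount * channels * bytesPerSample;
}
```

`bufferLatencyMs(1024, 10, 44100)` comes to roughly 232 ms, and `bufferBytes(1024, 10, 2, 2)` to 40960 bytes. Halving either the block size or the block count halves the queueing delay, which is why making both user-adjustable is useful on devices that can cope with less.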

neeraj said...

Hi Pete,

I am currently working on a voip solution where the android audio latency is a big issue. Just wanted to ask if you have been able to get a latency of less than 180 ms (that's what I get on ICS)?

@Steve - I don't know if I am out of my depth here. But I am getting latencies of around 180 ms on ICS & 230 ms on Gingerbread using AudioTrack in the streaming mode. Btw OpenSL is built 'atop' AudioTrack so it's not helping me with the latency. Have you tried using AudioTrack in the static mode?

Pete Cole said...

Hi neeraj,

Those figures sound pretty typical!

If you want to stream audio from Java, you have to use AudioTrack in streaming mode.

If you want to stream audio from C/C++, you can use either AudioTrack (via JNI coupling) or OpenSL. OpenSL avoids Java altogether (not being based on AudioTrack), so should avoid Java-specific issues e.g. the garbage collector getting in the way. I'd recommend use of OpenSL on Android 2.3 and later, with AudioTrack as a fall-back on earlier platforms.

Pete

Unknown said...

Hello Pete,

I found your blog and it seems that you're really an expert in embedded audio. Could you perhaps give me some guidance in the following problem?

I want to write a full-duplex implementation of audio for Android, to be used in VoIP (yes, I know, latency...).

Now I have a two-threaded model, one thread for uplink, the other for downlink.

On many devices, including the relatively powerful HTC One X, the code behaves strangely...

1) Recording is A-ok, especially with larger buffers (say, 140 ms).

2) Playing is A-ok even with smaller buffers.

but at the moment when I try to run them simultaneously, even with a simple PCM loopback (which does not process the data heavily), both of them go south.

Callbacks from the recorder stop occurring regularly (every 140 ms), but instead pop fairly irregularly, on average every 440 ms. Same with the player.

What has possibly gone wrong inside?

Rebuilding the code for ARM7 did not alleviate the situation.

Unknown said...

BTW My name is Marian Kechlibar and I am from Prague. I do not know why it displays Unknown on my previous comment.

Pete Cole said...

Hi,

You don't mention which API you are using, but you should look to use OpenSL ES, with a dedicated audio thread, as otherwise audio break-up is pretty inevitable.

Try dumping-out the received audio (as you are passed it) to a plain file, and open-up as a raw audio file with an audio tool; make sure there aren't any discontinuities in the audio. If there are, then you're probably pushing your device too hard - drop the sample rate.

I would suggest dropping the audio I/O sample rates right down, as low as your device will go, and see if the problem still exists.

Make sure your UI isn't doing any heavy processing, that might be stealing cycles from the audio threads.

Operate a ring buffer, with built-in latency, to hold-on to sufficient input audio data, and read-out from this with a delay; to compensate for not enough time being allocated to your audio threads.

Best wishes,

Pete
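The "ring buffer with built-in latency" suggestion can be sketched roughly as follows. This is an illustrative assumption about what such a buffer might look like, not code from Mixtikl: `DelayedRing` is a hypothetical name, it is single-threaded for clarity (a real recorder/player pair would need synchronisation), and the pre-fill threshold is the deliberate delay.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// FIFO that withholds output until `prefill` samples have been stored,
// so the reader always runs a fixed cushion behind the writer.
class DelayedRing {
public:
    DelayedRing(std::size_t capacity, std::size_t prefill)
        : buf_(capacity), prefill_(prefill), head_(0), tail_(0), count_(0), primed_(false) {
        assert(capacity > 0 && prefill <= capacity);
    }

    // Recorder side: stores as many samples as fit, returns how many were kept.
    std::size_t push(const int16_t* src, std::size_t n) {
        std::size_t stored = 0;
        while (stored < n && count_ < buf_.size()) {
            buf_[head_] = src[stored++];
            head_ = (head_ + 1) % buf_.size();
            ++count_;
        }
        if (count_ >= prefill_) primed_ = true;  // cushion is now in place
        return stored;
    }

    // Player side: always fills n samples, using silence while the cushion
    // is still building. Returns how many real samples were delivered.
    std::size_t pop(int16_t* dst, std::size_t n) {
        std::size_t real = 0;
        for (std::size_t i = 0; i < n; ++i) {
            if (primed_ && count_ > 0) {
                dst[i] = buf_[tail_];
                tail_ = (tail_ + 1) % buf_.size();
                --count_;
                ++real;
            } else {
                dst[i] = 0;
            }
        }
        if (count_ == 0) primed_ = false;  // ran dry: rebuild the cushion first
        return real;
    }

private:
    std::vector<int16_t> buf_;
    std::size_t prefill_, head_, tail_, count_;
    bool primed_;
};
```

For example, with `DelayedRing ring(16, 4)`, popping before four samples have been pushed yields silence; after that, samples come out in the order they went in. A larger pre-fill tolerates more irregular callbacks at the cost of more delay.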

Unknown said...

Hi Pete,

I've been trying to create an app which produces a specific frequency depending upon the touch coordinates on the screen.

When I hold it steady the sound generated is fine. But when I slide to change the frequency I start hearing "pop/crack/noise" along with the change in frequency.

I do know that

the pop sound is caused when the waveform suddenly moves from Zero to high value and vice versa.

Or

When the Audio Playback finishes and the next set of buffer to play is not ready (in streaming mode).

I checked the following things

The PCM data generation is faster than the Playback rate. So there should not be any case of playback finishing and the Buffer not being ready.

The only thing I feel is happening is that playback is getting blocked when the touch happens and the values are being updated.

Given below are the details of the app.

I am using OpenSL ES for PCM data generation and playback.

Using a GLSurfaceView to capture (x,y) coords, which are sent to the C++ side.

Please advise.

Pete Cole said...

I would suggest that you first treble-check that your synthesis algorithm is correct.

Write your generated audio data to a file as raw 16-bit PCM, and open-up the file with an appropriate audio editor (as raw 16-bit PCM data; make sure you specify the correct sample rate and channel count), and verify that the generated audio is entirely smooth/continuous.

Only when you're 100% sure that your generated audio is correct, should you start looking at the audio interface code.

Pete
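The check described above can be sketched like this. Everything here is a hypothetical stand-in: `makeSweep` plays the role of your own synthesis code (it uses a running phase so that a frequency change does not make the waveform jump), `writeRawPcm` dumps headerless 16-bit samples in the machine's native byte order, and `maxStep` is a crude numeric complement to inspecting the file by eye in an audio editor.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Mono 16-bit sine sweep from startHz to endHz using a phase accumulator.
std::vector<int16_t> makeSweep(double startHz, double endHz, int sampleRate, int frames) {
    assert(sampleRate > 0 && frames > 0);
    const double twoPi = 6.283185307179586;
    std::vector<int16_t> out(static_cast<std::size_t>(frames));
    double phase = 0.0;
    for (int i = 0; i < frames; ++i) {
        const double hz = startHz + (endHz - startHz) * i / frames;
        out[static_cast<std::size_t>(i)] =
            static_cast<int16_t>(std::lround(0.8 * 32767.0 * std::sin(phase)));
        phase += twoPi * hz / sampleRate;  // advance by the current frequency
        if (phase >= twoPi) phase -= twoPi;
    }
    return out;
}

// Writes the samples as raw PCM with no header. Returns true on success.
bool writeRawPcm(const char* path, const std::vector<int16_t>& samples) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::size_t n = std::fwrite(samples.data(), sizeof(int16_t), samples.size(), f);
    const bool closed = (std::fclose(f) == 0);
    return closed && n == samples.size();
}

// Largest jump between neighbouring samples. A click shows up as a value far
// above what the highest frequency in the signal could produce.
int maxStep(const std::vector<int16_t>& s) {
    int worst = 0;
    for (std::size_t i = 1; i < s.size(); ++i) {
        const int d = std::abs(static_cast<int>(s[i]) - static_cast<int>(s[i - 1]));
        if (d > worst) worst = d;
    }
    return worst;
}
```

For instance, `writeRawPcm("sweep.raw", makeSweep(220, 880, 44100, 44100))` produces a one-second file to import as mono, 16-bit, 44100 Hz. If `maxStep` on your own output is much larger than on a clean sweep over the same frequency range, the discontinuity is in the synthesis rather than in the audio interface code.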

Unknown said...

Hi Pete:

Great post, helped me a lot. Thanks.

armeabi-v7a, what a great hint! :)