For the last few weeks I have been working hard on optimizing our 3D engine, JellyTouch. I have tried virtually all of the recommended methods for improving rendering performance and found that some of them work well, some provide a small benefit and some don’t work at all. I think it is a good idea to share my results and perhaps get some feedback. The tests were performed on one of the most popular models, the iPhone 3G 16GB.
This works well:
Triangle stripping. This classic technique, often dismissed as obsolete nowadays, still does its job. It showed a dramatic performance boost in our tests, especially for large meshes. Try my triangle stripping utility to see how it works for you.
Using smaller data types for geometry (for example, 16-bit shorts instead of 32-bit floats for vertex positions). This reduces the memory footprint of 3D objects and also improves performance significantly.
Using glDrawElements instead of glDrawArrays. It is indeed faster because vertices shared between triangles no longer have to be duplicated. Repeated indices are much cheaper because they are smaller than full vertices.
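The last two points can be sketched together. This is only an illustration under stated assumptions, not JellyTouch code: the vertex structs, the function names and the [-1, 1] coordinate range are all hypothetical. It turns a plain triangle list into unique vertices plus 16-bit indices, and shows one way to shrink float positions to shorts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical position-only vertex stored as floats (12 bytes). */
typedef struct { float x, y, z; } VertexF;

/* The same vertex with 16-bit components (6 bytes), usable as GL_SHORT. */
typedef struct { int16_t x, y, z; } VertexS;

/* Map a coordinate assumed to lie in [-1, 1] to a signed 16-bit value.
 * The matching scale (1/32767) would be folded into the model matrix. */
int16_t quantize_unit(float v)
{
    if (v > 1.0f)  v = 1.0f;
    if (v < -1.0f) v = -1.0f;
    /* Round to nearest by adding half a step before truncating. */
    return (int16_t)(v * 32767.0f + (v >= 0.0f ? 0.5f : -0.5f));
}

VertexS quantize_vertex(VertexF v)
{
    VertexS s = { quantize_unit(v.x), quantize_unit(v.y), quantize_unit(v.z) };
    return s;
}

static int same_vertex(const VertexF *a, const VertexF *b)
{
    return a->x == b->x && a->y == b->y && a->z == b->z;
}

/* Convert a non-indexed triangle list (3 vertices per triangle) into
 * unique vertices plus indices. Both output arrays need room for
 * `count` entries, and `count` must not exceed 65536 because the
 * indices are 16-bit. Returns the number of unique vertices. */
size_t build_index_buffer(const VertexF *in, size_t count,
                          VertexF *out_vertices, uint16_t *out_indices)
{
    size_t unique = 0;
    for (size_t i = 0; i < count; ++i) {
        size_t j = 0;
        /* Linear search keeps the sketch short; a hash table scales better. */
        while (j < unique && !same_vertex(&out_vertices[j], &in[i]))
            ++j;
        if (j == unique)
            out_vertices[unique++] = in[i];
        out_indices[i] = (uint16_t)j;
    }
    return unique;
}
```

For a quad given as two triangles (six float vertices, 72 bytes), this yields four unique vertices plus six 16-bit indices: 60 bytes with float positions, 36 bytes once the positions are quantized. The index array is what would be passed to glDrawElements with GL_UNSIGNED_SHORT.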
This works but not so well:
Using smaller textures. This saves memory but doesn’t speed things up much; I noticed only a small speed gain.
Using V3_N3_T2 interleaving (position, then normal, then texture coordinates for each vertex). In my tests it turned out to be slightly (approx. 2%) faster than other types of interleaving.
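For readers unfamiliar with the term, here is a minimal sketch of what a V3_N3_T2 layout could look like. It assumes float components and made-up names (VertexV3N3T2, interleave_v3n3t2); the post does not say which component types the engine actually uses.

```c
#include <stddef.h>

/* One interleaved vertex: 3 position, 3 normal, 2 texcoord floats. */
typedef struct {
    float position[3];  /* V3 */
    float normal[3];    /* N3 */
    float texcoord[2];  /* T2 */
} VertexV3N3T2;

enum {
    kStride         = sizeof(VertexV3N3T2),
    kPositionOffset = offsetof(VertexV3N3T2, position),
    kNormalOffset   = offsetof(VertexV3N3T2, normal),
    kTexCoordOffset = offsetof(VertexV3N3T2, texcoord)
};

/* Pack three separate attribute arrays into one interleaved array.
 * `out` must have room for `n` vertices. */
void interleave_v3n3t2(const float *positions, const float *normals,
                       const float *texcoords, size_t n, VertexV3N3T2 *out)
{
    for (size_t i = 0; i < n; ++i) {
        for (int k = 0; k < 3; ++k) {
            out[i].position[k] = positions[3 * i + k];
            out[i].normal[k]   = normals[3 * i + k];
        }
        for (int k = 0; k < 2; ++k)
            out[i].texcoord[k] = texcoords[2 * i + k];
    }
}

/* With OpenGL ES 1.1 the array would then be bound roughly like this,
 * where `base` is a (const char *) pointing at the first vertex:
 *
 *   glVertexPointer(3, GL_FLOAT, kStride, base + kPositionOffset);
 *   glNormalPointer(GL_FLOAT, kStride, base + kNormalOffset);
 *   glTexCoordPointer(2, GL_FLOAT, kStride, base + kTexCoordOffset);
 */
```

With float components the stride is 32 bytes, so all attributes of a vertex sit next to each other in memory instead of in three separate arrays.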
This doesn’t work:
PVRTC texture compression. This method seems to be intended to reduce texture size rather than to improve rendering performance. The FPS was even lower than with plain PNG textures.
Using VBOs. This actually works, but only on the iPhone 3GS and later devices. The 3G and earlier generations don’t seem to benefit from it, so VBOs are probably not really implemented there even though the API supports them.


I also noticed no speed improvement with VBOs. I actually saw my friend’s Gen 1 iPhone 3G 16Gig crash when trying to use VBOs, while the simulator and the iPod Touch 2nd Gen have no issues. I used your tutorial to double-check my code, and it’s sound.
I like your tips for speed. I’ll try a triangle-strip model and see if it improves the FPS on my friend’s Gen 1 iPhone; it gets 8 FPS, which didn’t seem right to me.
eSpecialized, if you are using dozens or hundreds of objects, there is another thing that will boost your rendering speed: minimizing draw calls. Group objects by material and render each group with a single call.
Our test used a few big objects, so it is not mentioned in the post, but it works very well.
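The grouping step might look roughly like the sketch below. The RenderItem struct and the function name are hypothetical, and the sketch only covers sorting and counting; actually drawing a group in one call also requires its geometry to be merged into a single buffer.

```c
#include <stdlib.h>

/* Hypothetical render queue entry. */
typedef struct {
    int material_id;  /* texture/state set shared by a group */
    int mesh_id;      /* which object to draw */
} RenderItem;

static int by_material(const void *a, const void *b)
{
    int x = ((const RenderItem *)a)->material_id;
    int y = ((const RenderItem *)b)->material_id;
    return (x > y) - (x < y);
}

/* Sort the queue so items with the same material are adjacent, then
 * return how many material groups (and so how many state changes and
 * potential batched draw calls) remain. */
size_t sort_and_count_batches(RenderItem *items, size_t n)
{
    if (n == 0)
        return 0;
    qsort(items, n, sizeof *items, by_material);
    size_t batches = 1;
    for (size_t i = 1; i < n; ++i)
        if (items[i].material_id != items[i - 1].material_id)
            ++batches;
    return batches;
}
```

After sorting, material state only has to be set once per group rather than once per object.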
Thanks, Sergei, for posting this summary. Very helpful. I’ve implemented some of the items on your list with similar results. Wish I had found this earlier.
Hi,
Thanks for your article. I’m also trying to speed up render times on the 3G, and I’ve noticed when timing glDrawElements() that it takes an awful lot of CPU time (500 indexed triangles take about 1.5ms of *CPU* time, and it’s proportional to the number of triangles drawn). Generally speaking, all my gl* calls seem to be blocking calls; even glClear seems to take 1.5ms of CPU time to clear the color and depth buffers.
Normally a draw call should do little more than issue a GPU draw command that’ll get picked up by the GPU at a later stage, so it should take the same length of time whether I issue 10 or 1000 triangles to be drawn. I understand the GPU taking this much time, but why would this consume significant CPU time unless a good chunk of this is actually implemented in software on the CPU? Is this consistent with your experience, or am I doing something wrong? I know it’s not the case, but it just feels as though my CPU and GPU are not decoupled at all on the 3G. It’s also frustrating because I want to step into these calls to learn more, even if it’s the assembly, but I’m unable to in the debugger.
Cheers,
-Rob.
Hi Rob,
glDrawElements/glDrawArrays is generally slow because the actual draw call is where all the deferred work gets done (GL state validation, etc.). If you modify GL state extensively between your draw calls, this may involve some CPU workload. Also, you mention the iPhone 3G, so another reason I can think of is its “fake” implementation of VBOs. Whether or not I use them on the 3G, there seems to be no performance difference. That may mean they don’t physically exist and the geometry stays in main memory, so the CPU has to transfer the geometry to the GPU on every draw call.
My overall impression from testing the iPhone 3G is that this geometry transfer by the CPU is one of the main bottlenecks. Try your code on a newer device if you can to see if there’s any difference.
OK, so I quickly tried on a 3GS with exactly the same code and the difference was enormous. Issuing the draw calls dropped from 5.7ms to about 0.1ms! This is as I would have expected to see on the 3G, so I’m starting to think there’s a lot more work going on in the draw calls than simply transferring data! From these timings it seems as though the CPU is blocking while the GPU completes each operation, which would explain a 1.5-2ms CPU duration for a full screen fill, which drops proportionally when I reduce screen coverage. When I get to spend more time with the 3GS I’ll take a closer look.
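CPU timings like the ones quoted in this thread can be taken with a small helper along these lines. It is a generic sketch built on the standard C clock() function, not the code Rob used; cpu_ms and busy_work are hypothetical names, and busy_work merely stands in for a function that issues the GL calls being measured.

```c
#include <time.h>

typedef void (*TimedFn)(void *user);

/* Average processor time, in milliseconds, consumed by one call to
 * `fn`. clock() reports CPU time used by the program, which is the
 * quantity of interest when asking how much work a call does on the
 * CPU rather than on the GPU. */
double cpu_ms(TimedFn fn, void *user, int repeats)
{
    clock_t start = clock();
    for (int i = 0; i < repeats; ++i)
        fn(user);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / repeats;
}

/* Stand-in workload: a real test would issue glClear or
 * glDrawElements here instead of bumping a counter. */
void busy_work(void *user)
{
    ++*(int *)user;
}
```

Averaging over many repeats helps because clock() is often too coarse to resolve a single call that takes around a millisecond.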