Performance evaluation of macroblock-level parallelization of H.264 decoding on a cc-NUMA multiprocessor architecture
Authors
Alvarez, Mauricio
Ramirez, Alex
Valero, Mateo
Azevedo, Arnaldo
Meenderinck, Cor
Juurlink, Ben
Content type
Journal article
Document language
Spanish
Publication date
2009
Abstract
This paper studies the performance scalability of a macroblock-level (MB-level) parallelization of the H.264 decoder for High Definition (HD) applications on a multiprocessor architecture. We implemented this parallelization on a cache-coherent Non-Uniform Memory Access (cc-NUMA) shared-memory multiprocessor (SMP) and compared the results with the theoretical expectations. The study evaluates three scheduling techniques: static, dynamic, and dynamic with tail-submit. Dynamic scheduling with the tail-submit optimization performs best, reaching a maximum speedup of 9.5 with 24 processors. A detailed profiling analysis showed that thread synchronization is one of the factors limiting scalability. The paper also evaluates the impact of using blocking synchronization APIs such as POSIX threads and the POSIX real-time extensions. The results show that macroblock-level parallelism, as a very fine-grained form of Thread-Level Parallelism (TLP), is strongly affected by the thread synchronization overhead these APIs generate. Other synchronization methods, possibly with hardware support, are required to make MB-level parallelization more scalable.
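The dynamic scheduling with tail-submit mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each macroblock depends on its left, up-left, up and up-right neighbours, and that tail-submit means a worker that has just finished a macroblock continues directly with one newly ready macroblock instead of passing it through the shared task queue. Neither detail is stated in the abstract. The names `decode_frame` and `dependents` are hypothetical, the decoding work is a placeholder, and the frame is assumed to contain at least one macroblock.

```python
import queue
import threading


def decode_frame(width_mb, height_mb, num_workers=4, tail_submit=True):
    """Visit every macroblock (MB) once, respecting the assumed dependencies.

    Returns (completion_order, queue_submissions).
    """
    def dependents(x, y):
        # MBs that wait on (x, y) under the ASSUMED pattern in which each MB
        # needs its left, up-left, up and up-right neighbours.
        candidates = [(x + 1, y), (x + 1, y + 1), (x, y + 1), (x - 1, y + 1)]
        return [(i, j) for i, j in candidates
                if 0 <= i < width_mb and 0 <= j < height_mb]

    all_mbs = [(x, y) for y in range(height_mb) for x in range(width_mb)]
    # Number of unfinished dependencies per MB.
    remaining = {mb: 0 for mb in all_mbs}
    for mb in all_mbs:
        for dep in dependents(*mb):
            remaining[dep] += 1

    tasks = queue.Queue()      # shared task queue (blocking synchronization)
    lock = threading.Lock()    # protects `remaining`, `order`, `submissions`
    order = []                 # completion order, for inspection
    submissions = [1]          # counts queue submissions; (0, 0) is the first

    def worker():
        while True:
            mb = tasks.get()
            if mb is None:     # sentinel: the frame is finished
                return
            while mb is not None:
                # A real decoder would decode the macroblock here.
                with lock:
                    order.append(mb)
                    ready = []
                    for dep in dependents(*mb):
                        remaining[dep] -= 1
                        if remaining[dep] == 0:
                            ready.append(dep)
                    finished = len(order) == len(all_mbs)
                    # Tail-submit (assumed meaning): keep one ready MB for
                    # this worker and bypass the queue for it.
                    mb = ready.pop() if (tail_submit and ready) else None
                    submissions[0] += len(ready)
                for r in ready:
                    tasks.put(r)
                if finished:
                    for _ in range(num_workers):
                        tasks.put(None)

    tasks.put((0, 0))          # only the top-left MB has no dependencies
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order, submissions[0]
```

Calling `decode_frame(6, 4, num_workers=3, tail_submit=False)` sends every macroblock through the queue, so the submission count equals the number of macroblocks. With `tail_submit=True` the count is lower, which is the kind of synchronization saving the abstract attributes to the optimization. The sketch says nothing about the speedups measured in the paper.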