Performance evaluation of macroblock-level parallelization of H.264 decoding on a cc-NUMA multiprocessor architecture

Authors

Alvarez, Mauricio
Ramirez, Alex
Valero, Mateo
Azevedo, Arnaldo
Meenderinck, Cor
Juurlink, Ben

Content type

Journal article

Document language

Spanish

Publication date

2009

Abstract

This paper studies the performance scalability of a macroblock-level (MB-level) parallelization of the H.264 decoder for High Definition (HD) applications on a multiprocessor architecture. We implemented this parallelization on a cache-coherent Non-Uniform Memory Access (cc-NUMA) shared-memory multiprocessor (SMP) and compared the results with theoretical expectations. The study evaluates three scheduling techniques: static, dynamic, and dynamic with tail-submit. Dynamic scheduling with the tail-submit optimization performs best, reaching a maximum speedup of 9.5 with 24 processors. A detailed profiling analysis shows that thread synchronization is one of the factors limiting scalability. The paper also evaluates the impact of blocking synchronization APIs such as POSIX threads and the POSIX real-time extensions. The results show that macroblock-level parallelism, as a very fine-grained form of Thread-Level Parallelism (TLP), is strongly affected by the synchronization overhead these APIs generate. Other synchronization methods, possibly with hardware support, are required to make MB-level parallelization more scalable.
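To give a rough sense of what "theoretical expectations" can mean for MB-level parallelism, the following Python sketch computes an idealized schedule. It is not taken from the paper. It assumes the commonly described 2D-wave ordering for H.264, in which a macroblock can start once its left and upper-right neighbours are done, a unit decode time for every macroblock, zero synchronization cost, and a 1920x1088 frame (120x68 macroblocks). The function names are hypothetical. The last lines relate the reported speedup of 9.5 on 24 processors to a parallel efficiency.

```python
from collections import Counter


def wavefront_schedule(mb_cols, mb_rows):
    """Earliest time slot of each macroblock, assuming unit decode time.

    Assumed dependencies (not stated in the abstract): the left neighbour
    and the upper-right neighbour; the last column falls back to the
    macroblock directly above it.
    """
    slot = [[0] * mb_cols for _ in range(mb_rows)]
    for y in range(mb_rows):
        for x in range(mb_cols):
            deps = []
            if x > 0:
                deps.append(slot[y][x - 1])
            if y > 0:
                deps.append(slot[y - 1][min(x + 1, mb_cols - 1)])
            # A macroblock with no dependencies starts in slot 0.
            slot[y][x] = 1 + max(deps, default=-1)
    return slot


def summarize(mb_cols, mb_rows):
    """Steps, peak concurrency, and ideal speedup of the wavefront."""
    slot = wavefront_schedule(mb_cols, mb_rows)
    per_slot = Counter(s for row in slot for s in row)
    steps = max(per_slot) + 1
    return {
        "steps": steps,
        "max_parallel": max(per_slot.values()),
        "ideal_speedup": (mb_cols * mb_rows) / steps,
    }


def parallel_efficiency(speedup, processors):
    """Fraction of the ideal linear speedup that was achieved."""
    return speedup / processors


if __name__ == "__main__":
    # Assumed frame size: 1920x1088 pixels, i.e. 120x68 macroblocks.
    print(summarize(120, 68))
    # Figures from the abstract: speedup 9.5 on 24 processors.
    print(round(parallel_efficiency(9.5, 24), 3))
```

Under these assumptions the frame needs 254 time steps with at most 60 macroblocks in flight, an ideal speedup of about 32, while the reported 9.5 on 24 processors corresponds to an efficiency of roughly 0.40. The gap is consistent with the abstract's point that synchronization overhead, which this sketch ignores, limits scalability.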
