Approximate and timing-speculative hardware design for high-performance and energy-efficient video processing

Paim, Guilherme Pereira

dc.contributor.advisor	Bampi, Sergio	pt_BR
dc.contributor.author	Paim, Guilherme Pereira	pt_BR
dc.date.accessioned	2021-05-28T04:27:04Z	pt_BR
dc.date.issued	2021	pt_BR
dc.identifier.uri	http://hdl.handle.net/10183/221703	pt_BR
dc.description.abstract	Since the end of transistor scaling in 2-D appeared on the horizon, innovative circuit design paradigms have been on the rise to go beyond the well-established and ultraconservative exact computing. Many compute-intensive applications – such as video processing – exhibit an intrinsic error resilience and do not necessarily require perfect accuracy in their numerical operations. Approximate computing (AxC) is emerging as a design alternative to improve the performance and energy-efficiency requirements for many applications by trading its intrinsic error tolerance with algorithm and circuit efficiency. Exact computing also imposes a worst-case timing to the conventional design of hardware accelerators to ensure reliability, leading to an efficiency loss. Conversely, the timing-speculative (TS) hardware design paradigm allows increasing the frequency or decreasing the voltage beyond the limits determined by static timing analysis (STA), thereby narrowing pessimistic safety margins that conventional design methods implement to prevent hardware timing errors. Timing errors should be evaluated by an accurate gate-level simulation, but a significant gap remains: How these timing errors propagate from the underlying hardware all the way up to the entire algorithm behavior, where they just may degrade the performance and quality of service of the application at stake? This thesis tackles this issue by developing and demonstrating a cross-layer framework capable of performing investigations of both AxC (i.e., from approximate arithmetic operators, approximate synthesis, gate-level pruning) and TS hardware design (i.e., from voltage over-scaling, frequency over-clocking, temperature rising, and device aging). The cross-layer framework can simulate both timing errors and logic errors at the gate-level by crossing them dynamically, linking the hardware result with the algorithm-level, and vice versa during the evolution of the application’s runtime. Existing frameworks perform investigations of AxC and TS techniques at circuit-level (i.e., at the output of the accelerator) agnostic to the ultimate impact at the application level (i.e., where the impact is truly manifested), leading to less optimization. Unlike state of the art, the framework proposed offers a holistic approach to assessing the tradeoff of AxC and TS techniques at the application-level. This framework maximizes energy efficiency and performance by identifying the maximum approximation levels at the application level to fulfill the required good enough quality. This thesis evaluates the framework with an 8-way SAD (Sum of Absolute Differences) hardware accelerator operating into an HEVC encoder as a case study. Application-level results showed that the SAD based on the approximate adders achieve savings of up to 45% of energy/operation with an increase of only 1.9% in BD-BR. On the other hand, VOS (Voltage Over-Scaling) applied to the SAD generates savings of up to 16.5% in energy/operation with around 6% of increase in BD-BR. The framework also reveals that the boost of about 6.96% (at 50°) to 17.41% (at 75° with 10- Y aging) in the maximum clock frequency achieved with TS hardware design is totally lost by the processing overhead from 8.06% to 46.96% when choosing an unreliable algorithm to the blocking match algorithm (BMA). We also show that the overhead can be avoided by adopting a reliable BMA. This thesis also shows approximate DTT (Discrete Tchebichef Transform) hardware proposals by exploring a transform matrix approximation, truncation and pruning. The results show that the approximate DTT hardware proposal increases the maximum frequency up to 64%, minimizes the circuit area in up to 43.6%, and saves up to 65.4% in power dissipation. The DTT proposal mapped for FPGA shows an increase of up to 58.9% on the maximum frequency and savings of about 28.7% and 32.2% on slices and dynamic power, respectively compared with stat.	en
dc.format.mimetype	application/pdf	pt_BR
dc.language.iso	eng	pt_BR
dc.rights	Open Access	en
dc.subject	Computação aproximativa	pt_BR
dc.subject	Timing speculative	en
dc.subject	Processamento de vídeos	pt_BR
dc.subject	High performance	en
dc.subject	Energy efficient	en
dc.subject	FPGA	pt_BR
dc.subject	Voltage Over scaling	en
dc.subject	Microeletrônica	pt_BR
dc.subject	Temperature rising	en
dc.subject	Aceleradores de hardware	pt_BR
dc.subject	Device aging	en
dc.title	Approximate and timing-speculative hardware design for high-performance and energy-efficient video processing	pt_BR
dc.type	Tese	pt_BR
dc.contributor.advisor-co	Costa, Eduardo Antonio Cesar da	pt_BR
dc.identifier.nrb	001125900	pt_BR
dc.degree.grantor	Universidade Federal do Rio Grande do Sul	pt_BR
dc.degree.department	Instituto de Informática	pt_BR
dc.degree.program	Programa de Pós-Graduação em Microeletrônica	pt_BR
dc.degree.local	Porto Alegre, BR-RS	pt_BR
dc.degree.date	2021	pt_BR
dc.degree.level	doutorado	pt_BR

Nome:: 001125900.pdf
Tamanho:: 34.78Mb
Formato:: PDF
Descrição:: Texto completo (inglês)

Visualizar/abrir

Este item está licenciado na Creative Commons License

Engenharias (7564)

Microeletrônica (212)

Mostrar registro simples