Tutorial: Optimizing Hardware Function Evaluation
Speakers: Haohuan Fu, Oskar Mencer, Wayne Luk
Location: room ETZ F76.1, ETZ building 

Abstract
Many applications in multimedia, communications, and finance involve mathematical functions such as exponential and trigonometric functions. A good understanding of hardware function evaluation enables designers to select and develop architectures for specific requirements; such architectures typically use polynomials, tables, shift-and-add methods, and various other techniques. Because so many methods are available, users face the task of deciding which method to use when.
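To make "shift-and-add" concrete, here is a minimal sketch of CORDIC in rotation mode, one well-known shift-and-add method chosen here for illustration (the tutorial does not say which shift-and-add method it covers). It computes sin and cos using only shifts, additions, and a small table of arctangents. The 16-bit fixed-point format, the iteration count, and the names `cordic_sin_cos`, `ATAN_TABLE`, and `K_FIXED` are assumptions for this example.

```python
import math

FRAC_BITS = 16  # assumed fixed-point format: 16 fractional bits
N_ITER = 16     # assumed iteration count (roughly one bit of accuracy each)
ONE = 1 << FRAC_BITS

# Table of atan(2^-i) in fixed point; in hardware this would be a small ROM.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]

# Each CORDIC rotation scales the vector by sqrt(1 + 2^-2i); starting from
# x = 1/gain pre-compensates so the final vector has unit length.
_gain = 1.0
for _i in range(N_ITER):
    _gain *= math.sqrt(1.0 + 2.0 ** (-2 * _i))
K_FIXED = round(ONE / _gain)


def cordic_sin_cos(theta_fixed):
    """Rotation-mode CORDIC using only shifts, adds, and table lookups.

    theta_fixed is an angle in radians with FRAC_BITS fractional bits and
    |theta| <= pi/2. Returns (sin, cos), also with FRAC_BITS fractional bits.
    """
    x, y, z = K_FIXED, 0, theta_fixed
    for i in range(N_ITER):
        # Rotate by +/- atan(2^-i), choosing the direction that drives z to 0.
        # Python's >> on negative ints is an arithmetic shift, as in hardware.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y, x
```

For example, `s, c = cordic_sin_cos(round(0.5 * ONE))` gives `s / ONE` close to `math.sin(0.5)`. The trade-off against a polynomial evaluator is typical: no multipliers, but one iteration (or one pipeline stage) per bit of accuracy.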

In this tutorial, we first present a methodology and an automated system for deciding which method to use, given range, precision, space, and time considerations. We show how to select the best function evaluation hardware for a given function, accuracy requirement, technology mapping, and set of optimization metrics such as area, throughput, and latency. Evaluating f(x) typically consists of two steps: range reduction, which maps the argument onto a small, convenient interval such as [0, pi/2] for sin(x), and the actual evaluation on that interval.
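The two steps can be sketched in floating point for sin(x). This is a minimal illustration, assuming the periodicity and symmetry identities sin(x) = -sin(x - pi) and sin(x) = sin(pi - x) for the reduction, and a degree-11 Taylor polynomial as a stand-in for whatever hardware evaluator is chosen on [0, pi/2]. The function names are hypothetical, and a hardware design would perform both steps in fixed point.

```python
import math


def reduce_sin_argument(x):
    """Map x to (r, sign) with r in [0, pi/2] such that sin(x) = sign * sin(r)."""
    # Step 1: reduce modulo 2*pi into [0, 2*pi).
    t = math.fmod(x, 2.0 * math.pi)
    if t < 0:
        t += 2.0 * math.pi
    # Step 2: sin(x) = -sin(x - pi) brings t into [0, pi).
    sign = 1.0
    if t >= math.pi:
        t -= math.pi
        sign = -1.0
    # Step 3: sin(x) = sin(pi - x) brings t into [0, pi/2].
    if t > math.pi / 2:
        t = math.pi - t
    return t, sign


def eval_sin_core(r):
    """Evaluate sin on [0, pi/2] with a Taylor polynomial up to r^11 (Horner form)."""
    r2 = r * r
    return r * (1 - r2 / 6 * (1 - r2 / 20 * (1 - r2 / 42 * (1 - r2 / 72 * (1 - r2 / 110)))))


def sin_with_range_reduction(x):
    r, sign = reduce_sin_argument(x)
    return sign * eval_sin_core(r)
```

Because the core only ever sees arguments in [0, pi/2], its polynomial degree and bit widths can be sized for that small interval rather than for the full input range of x.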

We outline how range reduction affects the area and speed of hardware function evaluation for a given range and precision of x and f(x). We also propose an automated bit-width optimization technique that minimizes the size of operators in hardware datapaths. Using MATLAB and ASC, A Stream Compiler for Field-Programmable Gate Arrays (FPGAs), we illustrate design space exploration for various fixed-point functions, such as sin(x) and log(x), accurate to one unit in the last place (ulp).
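To show what "accurate to one ulp" means as a bit-width constraint, the sketch below performs a brute-force width search. It is not the automated technique proposed in the tutorial. It evaluates a degree-7 Taylor polynomial for sin(x) on [0, 1) in fixed point, truncating every intermediate to w fractional bits, and searches for the smallest w whose result, rounded to a 12-fractional-bit output, stays within 1 ulp of `math.sin`. The polynomial, the widths, and all function names are illustrative assumptions.

```python
import math

# Taylor coefficients of sin(x)/x as a polynomial in u = x^2 (degree-7 sin).
COEFFS = [1.0, -1.0 / 6, 1.0 / 120, -1.0 / 5040]


def fx_mul(a, b, w):
    """Multiply two w-fractional-bit integers and truncate back to w bits."""
    return (a * b) >> w


def sin_fixed(x, w, out_bits):
    """Fixed-point sin(x) for x in [0, 1): intermediates use w fractional
    bits, and the result is rounded to out_bits fractional bits (w > out_bits)."""
    xf = round(x * (1 << w))
    u = fx_mul(xf, xf, w)
    acc = round(COEFFS[-1] * (1 << w))
    for c in reversed(COEFFS[:-1]):
        acc = fx_mul(acc, u, w) + round(c * (1 << w))  # Horner step
    y = fx_mul(acc, xf, w)
    shift = w - out_bits
    return (y + (1 << (shift - 1))) >> shift  # round to nearest


def max_error_ulps(w, out_bits, samples=4096):
    """Worst observed error, in output ulps, over the grid k/samples on [0, 1)."""
    worst = 0.0
    for k in range(samples):
        x = k / samples
        err = abs(sin_fixed(x, w, out_bits) / (1 << out_bits) - math.sin(x))
        worst = max(worst, err * (1 << out_bits))
    return worst


def min_work_bits(out_bits, max_ulps=1.0, limit=40):
    """Smallest intermediate width w that meets the error budget, or None."""
    for w in range(out_bits + 1, limit):
        if max_error_ulps(w, out_bits) <= max_ulps:
            return w
    return None
```

Calling `min_work_bits(12)` returns the narrowest intermediate width for this datapath. The default grid k/4096 covers every input in [0, 1) with 12 fractional bits, so for 12-bit inputs the check is exhaustive. For wider inputs, exhaustive testing becomes expensive, which is one motivation for the static and dynamic analyses discussed next.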

As part of the function evaluation optimization challenge, we extract bit-width optimization as an easily identifiable subproblem with a wide variety of possible solutions. Bit-width analysis can be done statically or dynamically. We show one method for static bit-width analysis based on affine arithmetic, and one method for dynamic bit-width analysis based on automatic differentiation of sets of expressions. 
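The core idea of static analysis with affine arithmetic can be sketched in a few lines. Each value is kept as a form x0 + sum_i x_i * eps_i with every eps_i in [-1, 1], so correlations between operands are tracked and cancel where plain interval arithmetic would not. This is a minimal sketch, not the tutorial's tool. The `Affine` class and `integer_bits` helper are hypothetical names, and multiplication uses the usual conservative bound rad(a) * rad(b) for the nonlinear term.

```python
import itertools
import math

_fresh = itertools.count()  # source of new noise-symbol indices


class Affine:
    """Affine form center + sum_i terms[i] * eps_i, with each eps_i in [-1, 1]."""

    def __init__(self, center, terms=None):
        self.center = float(center)
        self.terms = dict(terms or {})

    @classmethod
    def from_range(cls, lo, hi):
        return cls((lo + hi) / 2, {next(_fresh): (hi - lo) / 2})

    def radius(self):
        return sum(abs(v) for v in self.terms.values())

    def range(self):
        r = self.radius()
        return self.center - r, self.center + r

    def __add__(self, other):
        if not isinstance(other, Affine):
            return Affine(self.center + other, self.terms)
        terms = dict(self.terms)
        for k, v in other.terms.items():
            terms[k] = terms.get(k, 0.0) + v  # shared symbols can cancel
        return Affine(self.center + other.center, terms)

    __radd__ = __add__

    def __neg__(self):
        return Affine(-self.center, {k: -v for k, v in self.terms.items()})

    def __sub__(self, other):
        return self + (-other)

    def __rsub__(self, other):
        return (-self) + other

    def __mul__(self, other):
        if not isinstance(other, Affine):
            return Affine(self.center * other, {k: v * other for k, v in self.terms.items()})
        terms = {}
        for k in set(self.terms) | set(other.terms):
            terms[k] = self.center * other.terms.get(k, 0.0) + other.center * self.terms.get(k, 0.0)
        # Nonlinear remainder, bounded conservatively, gets a fresh noise symbol.
        terms[next(_fresh)] = self.radius() * other.radius()
        return Affine(self.center * other.center, terms)

    __rmul__ = __mul__


def integer_bits(lo, hi):
    """Conservative integer bits, including a sign bit, for values in [lo, hi]."""
    m = max(abs(lo), abs(hi))
    magnitude = math.floor(math.log2(m)) + 1 if m >= 1 else 0
    return magnitude + 1
```

For example, with `x = Affine.from_range(0.0, 1.0)`, the expression `x * (1 - x)` yields the range (0.0, 0.5), so `integer_bits` returns 1; plain interval arithmetic would give [0, 1] for the same expression, and `x - x` collapses to exactly (0.0, 0.0). Tighter ranges translate directly into narrower integer parts for each operator.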

The methods presented in this tutorial can be used to optimize hardware implementations at the architecture level, the arithmetic level, and the bit level.