In this thesis, we describe an integer linear programming (ILP) based system called CHIPS for identifying custom instructions given the available data bandwidth and transfer latencies between the base processor and the custom logic. Our approach, which involves a baseline machine supporting architecturally visible custom state registers, enables designers to optionally constrain the number of input and output operands for custom instructions. We describe a comprehensive design flow to identify the most promising area, performance, and code size trade-offs. We study the effect of the constraints on the number of input/output operands and on the number of register file ports. Additionally, we explore compiler transformations such as if-conversion and loop unrolling. Our experiments show that, in most cases, the highest-performing solutions are identified when the input/output constraints are removed. However, input/output constraints help our algorithms identify frequently used code segments, reducing the overall area overhead. We provide detailed results for eleven benchmarks covering cryptography and multimedia. We obtain speed-ups between 1.7 and 6.6 times, code size reductions between 6 per cent and 72 per cent, and area costs that range between 12 adders and 256 adders for maximal speed-up. We demonstrate that our ILP-based solution scales well: benchmarks with very large basic blocks consisting of up to 1000 instructions can be solved optimally, most of the time within a few seconds. We show that state-of-the-art techniques fail to find the optimal solutions on the same problem instances within reasonable time limits. We provide examples of solutions identified by our algorithms that are not covered by the existing methods.