Marcello LAJOLO Luciano LAVAGNO Alberto SANGIOVANNI-VINCENTELLI
Cache memories are one of the main factors that affect software performance, and their use is becoming increasingly common even in embedded systems. Efficient analysis of the effects of parameter variations (cache size, degree of associativity, replacement policy, line size, etc.) is both an essential and a very time-consuming aspect of embedded system design, and its complexity increases when multi-tasking and real-time aspects must be considered. We propose a new simulation-based methodology, focused on an approximate model of the cache and of the multi-tasking reactive software, that allows one to trade off smoothly between accuracy and simulation speed. In particular, we propose to consider intra-task conflicts accurately, but to approximate inter-task conflicts by considering only a finite number of previous task executions. The rationale for this choice can be found in a common pattern in embedded systems, where a "normal" data flow results in a regular intra-task common flow, interrupted from time to time by some urgent event, which can pessimistically be considered as disrupting the cache behavior. The approach is conservative because re-execution of a task after a large amount of time will always be considered as not in cache, and the simulation speed-up is considerable.
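The bounded-history idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ApproxTaskCache`, the direct-mapped organization, and the parameter `history_depth` are all assumptions made for illustration. Each task's own accesses are simulated exactly (intra-task conflicts), while evictions caused by other tasks are tracked only for the last `history_depth` task executions; a task that ran longer ago than that is conservatively treated as starting with a cold cache.

```python
from collections import deque


class ApproxTaskCache:
    """Hypothetical direct-mapped cache model with a bounded task history."""

    def __init__(self, num_lines, line_size, history_depth):
        self.num_lines = num_lines
        self.line_size = line_size
        # Recent executions: (task_id, set of line indices the task occupied).
        self.history = deque(maxlen=history_depth)
        # task_id -> {line index: tag} left behind by its last execution.
        self.saved = {}

    def run_task(self, task_id, addresses):
        """Simulate one execution of a task; return (hits, misses)."""
        recent = list(self.history)
        positions = [i for i, (tid, _) in enumerate(recent) if tid == task_id]
        if positions:
            # Task is within the history window: start from its saved lines,
            # minus every line index occupied by tasks that ran after it.
            state = dict(self.saved[task_id])
            for _, touched in recent[positions[-1] + 1:]:
                for idx in touched:
                    state.pop(idx, None)
        else:
            # Conservative assumption: nothing of this task is still cached.
            state = {}

        hits = misses = 0
        for addr in addresses:
            block = addr // self.line_size
            idx = block % self.num_lines
            tag = block // self.num_lines
            if state.get(idx) == tag:
                hits += 1
            else:
                misses += 1
                state[idx] = tag  # intra-task conflict handled exactly

        self.saved[task_id] = state
        self.history.append((task_id, set(state)))
        return hits, misses
```

In this sketch, a larger `history_depth` keeps more inter-task information and so reports fewer pessimistic misses, at the cost of more bookkeeping per task execution, which mirrors the accuracy versus speed trade-off described in the abstract.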
Edoardo CHARBON Enrico MALAVASI Paolo MILIOZZI Alberto SANGIOVANNI-VINCENTELLI
In this paper we propose a comprehensive approach to physical design based on the constraint paradigm. Bounds on the most critical circuit parasitics are automatically generated to help designers and/or physical design tools meet a set of high-level specifications. The constraint generation engine is based on constrained optimization, where various parasitic effects on interconnect and devices are accounted for and dealt with in different manners according to their statistical behavior and their effect on performance.
Eric TOMACRUZ Jagesh V. SANGHAVI Alberto SANGIOVANNI-VINCENTELLI
The performance of a drift-diffusion device simulator using massively parallel processors is improved by modifying the preconditioner for the iterative solver and by improving the initial guess for the Newton loop. A grid-to-processor mapping scheme is presented to implement the partitioned natural ordering preconditioner on the CM-5. A new preconditioner called the block partitioned natural ordering, which may include fill-ins, improves performance in terms of CPU time and convergence behavior on the CM-5. A multigrid discretization to implement a block Newton initial guess routine is observed to decrease the CPU time by a factor of two. Extensions of the initial guess routine show further reduction in the final fine grid linear iterations.