Conventional linear or tiled address maps can degrade performance and memory utilization when traffic patterns do not match the underlying address map. Because the address map is usually fixed at design time, it is difficult to adapt it to a given application. Modern embedded systems usually include memory management units (MMUs), so, depending on the virtual address patterns, the system can also suffer performance overheads from page table walks. To alleviate this overhead, we propose clustering and rearranging tiles to construct an MMU-aware configurable address map. A generic tile number remapping algorithm is presented to construct the clustered tiled map, and the address map is configured using an adaptive dimensioning algorithm. For image processing applications, we carry out a design, an analysis, an implementation, and simulations. The results indicate that the proposed method can improve performance and memory utilization at moderate hardware cost.
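The mismatch between traffic pattern and address map described above can be illustrated with a minimal sketch. It is not the clustering or tile number remapping algorithm of the paper; it only compares a plain linear map with a plain tiled map. The 4 KiB page size, the one-byte pixels, the 64x64 tile, and all function names are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


def linear_addr(x, y, width):
    """Row-major (linear) address of pixel (x, y), assuming 1 byte per pixel."""
    return y * width + x


def tiled_addr(x, y, width, tile_w, tile_h):
    """Address of pixel (x, y) when the image is stored tile by tile."""
    tiles_per_row = width // tile_w
    tile_index = (y // tile_h) * tiles_per_row + (x // tile_w)
    offset_in_tile = (y % tile_h) * tile_w + (x % tile_w)
    return tile_index * tile_w * tile_h + offset_in_tile


def pages_touched(addresses):
    """Number of distinct pages covered by a sequence of addresses."""
    return len({a // PAGE_SIZE for a in addresses})


WIDTH, HEIGHT = 4096, 64   # hypothetical image size
TILE_W, TILE_H = 64, 64    # one tile fills exactly one assumed page

column = [(0, y) for y in range(HEIGHT)]  # vertical access pattern
row = [(x, 0) for x in range(WIDTH)]      # horizontal access pattern

column_linear = pages_touched(linear_addr(x, y, WIDTH) for x, y in column)
column_tiled = pages_touched(tiled_addr(x, y, WIDTH, TILE_W, TILE_H) for x, y in column)
row_linear = pages_touched(linear_addr(x, y, WIDTH) for x, y in row)
row_tiled = pages_touched(tiled_addr(x, y, WIDTH, TILE_W, TILE_H) for x, y in row)

print(column_linear, column_tiled, row_linear, row_tiled)
```

Under these assumptions, the column access touches many pages with the linear map and a single page with the tiled map, while the row access behaves the opposite way. Each extra page is a potential page table walk, which is why a map fixed at design time can suit one traffic pattern and penalize another.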
Conventional translation lookaside buffer (TLB) coalescing schemes do not fully exploit the contiguity that a memory allocator provides, and therefore incur performance overheads from page table walks. To address this issue, we propose an efficient scheme, called block contiguity translation (BCT), that stores block size information in the page table, taking the Buddy algorithm into account. By fully exploiting block-level contiguity, the scheme reduces page table walks when physical memory is allocated contiguously. Additionally, we present unified per-level page sizes to simplify the design and make better use of the contiguity information. Taking state-of-the-art schemes as references, we conduct a comparative analysis and performance simulations. Experiments indicate that the proposed scheme can improve memory system performance with moderate hardware overhead.
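The idea of keeping block size information in the page table can be sketched with a toy model. This is not the BCT design itself: the `(pfn, order)` entry layout, the unbounded TLB, the assumption that a block is aligned to its size in the virtual address space as well, and every name below are hypothetical and only show why one walk per block can replace one walk per page.

```python
def count_walks(vpns, page_table, use_block_info):
    """Count page table walks for a sequence of virtual page numbers.

    page_table maps vpn -> (pfn, order). The order is an assumed buddy block
    order: the 2**order pages of the aligned block containing vpn are taken
    to be physically contiguous.
    """
    tlb = []  # entries are (base_vpn, base_pfn, num_pages); unbounded for simplicity
    walks = 0
    for vpn in vpns:
        if any(base <= vpn < base + n for base, _, n in tlb):
            continue  # TLB hit, no walk needed
        walks += 1  # TLB miss, walk the page table
        pfn, order = page_table[vpn]
        if use_block_info:
            n = 1 << order
            base_vpn = vpn & ~(n - 1)          # align down to the block start
            base_pfn = pfn - (vpn - base_vpn)  # physical frame of the block start
            tlb.append((base_vpn, base_pfn, n))
        else:
            tlb.append((vpn, pfn, 1))          # one entry per page
    return walks


# Hypothetical mapping: 16 virtual pages backed by one order-4 buddy block.
page_table = {vpn: (256 + vpn, 4) for vpn in range(16)}
accesses = list(range(16))

walks_per_page = count_walks(accesses, page_table, use_block_info=False)
walks_per_block = count_walks(accesses, page_table, use_block_info=True)
print(walks_per_page, walks_per_block)
```

In this toy setting a single entry covers the whole block, so the walk count drops from one per page to one per block. A real design would also have to handle finite TLB capacity, replacement, and blocks that are only partially mapped, none of which is modeled here.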