
With the rising impact of the memory wall, selecting the adequate data-structure implementation for a given kernel has become a performance-critical issue. The complexity of efficiently solving this Data-Layout-Decision (DLD) problem is dramatically increased by the combination of complex, heterogeneous, and application-specific hardware memories. Slightly modifying an optimized application, or porting it to a new hardware architecture, requires significant time and engineering effort. It also requires deep knowledge of the host hardware platform.

In this thesis, we take a first step toward automatic software adaptation to hardware. We present an iterative, data-mining-related software-optimization approach based on detecting and exploring the most influential parameters linked to the hardware, the operating system, and the software. We also propose a custom data-cache-miss modeling algorithm designed to serve as a fully parameterized performance evaluation. To explore the parameters related to the data-layout implementation, we propose HARDSI, a custom patented method for solving the DLD problem. The proposed approach is designed to be embedded within a general-purpose compiler. We also propose to apply our method through a custom domain-specific language and computation framework. The HARDSI method selects, from a custom knowledge base, an optimized data-layout implementation according to the memory-access pattern followed to access the considered data structure.

Meanwhile, we separately consider solving the DLD problem on memories that are explicitly addressed by the programmer (such as embedded scratchpad memories or GPUs). The generated solutions are also specifically adapted to the properties of the host hardware memory. The problem that we address is to find an optimized memory placement that maximizes the amount of frequently accessed data stored within this fast yet limited-capacity memory. In this context, we propose DDLGS, a custom patented method designed to generate a dynamic data layout according to the memory-access pattern followed. The generated implementations encompass the specific load and store routines as well as the granularity assigned to each data transfer.

To evaluate our implementations in different hardware environments, we considered two different processor and memory architectures: (i) an x86 Intel Xeon processor with three levels of data cache using the least-recently-used replacement policy, and (ii) a Massively Parallel Processor Array, the Kalray Coolidge-80-30, with a 16 KB on-chip scratchpad memory. The generated implementations are also able to adapt, at run time, to the input of the source code under consideration. Experiments on linear algebra, artificial intelligence, and image processing benchmarks show that our method accurately determines an optimized data-structure implementation.
