Master's Dissertation Defense - PPGC

When: 22/03/2012, from 13:30 to 19:30
Where: Room 220 (council room), Building 43412 - Instituto de Informática

Title: Dynamic Detection of the Communication Pattern in Shared Memory Environments for Thread Mapping

Student: Eduardo Henrique Molina da Cruz

Advisor: Prof. Dr. Philippe Olivier Alexandre Navaux

Examination Committee:

Prof. Dr. Antonio Carlos Schneider Beck Filho (UFRGS)

Prof. Dr. Flávio Rech Wagner (UFRGS)

Prof. Dr. Guido Costa Souza de Araújo (UNICAMP)

Committee Chair: Prof. Dr. Philippe Olivier Alexandre Navaux

Abstract: The threads of parallel applications cooperate to fulfill their tasks and therefore communicate among themselves.

The communication latency between the cores in a multiprocessor architecture differs depending on the memory hierarchy and the interconnections. As the number of cores per chip and the number of threads per core grow, this difference in communication latency is increasing. Therefore, it is important to map the threads of parallel applications taking into account the communication between them.

In parallel applications based on the shared memory paradigm, communication is implicit and occurs through accesses to shared variables, which makes it difficult to detect the communication pattern between the threads. Traditional approaches use simulation to monitor the memory accesses performed by the application, requiring modifications to the source code and drastically increasing the overhead.

In this master's thesis, we introduce two novel lightweight mechanisms to find the communication pattern of threads.

The first mechanism makes use of the information about shared cache lines provided by cache coherence protocols.

The second mechanism makes use of the Translation Lookaside Buffer (TLB) to detect which memory pages each core is accessing.
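As a rough illustration of the idea behind the TLB-based mechanism, the sketch below estimates communication between threads by counting the memory pages their cores have in common. It is not the thesis's implementation: the function name `communication_matrix`, the input format (one set of page numbers per thread) and the sample values are all assumptions made for this example, and real hardware would supply the TLB contents rather than a Python list.

```python
from itertools import combinations


def communication_matrix(tlb_pages):
    """Estimate pairwise communication from shared pages.

    tlb_pages is a hypothetical snapshot: entry i is the set of page
    numbers present in the TLB of the core running thread i.
    """
    n = len(tlb_pages)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Pages present in both TLBs hint that the two threads share data.
        shared = len(tlb_pages[i] & tlb_pages[j])
        matrix[i][j] = shared
        matrix[j][i] = shared
    return matrix


# Made-up page numbers for four threads (illustrative only).
tlb_pages = [
    {0x10, 0x11, 0x12},
    {0x11, 0x12, 0x20},
    {0x30, 0x31},
    {0x31, 0x20},
]
comm = communication_matrix(tlb_pages)
for row in comm:
    print(row)
```

In this sample, threads 0 and 1 share two pages, so they would be the strongest candidates to place on cores that are close in the memory hierarchy.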

Both our mechanisms rely entirely on hardware features, which makes the thread mapping transparent to the programmer and allows it to be performed dynamically by the operating system.

Moreover, no time-consuming task, such as simulation, is required.

We evaluated our mechanisms with the NAS Parallel Benchmarks (NPB) and obtained accurate representations of the communication patterns.

We generated thread mappings with the detected communication patterns using a mapping algorithm.

The mapping problem is NP-hard. Therefore, in order to achieve polynomial complexity, our algorithm is a heuristic method based on the Edmonds graph matching algorithm.
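To make the matching idea concrete, the sketch below pairs threads so that the total communication inside the pairs is maximal; each pair could then be placed on cores that share a cache. It is only an illustration under stated assumptions: it uses exhaustive search in place of the Edmonds algorithm (so it is exponential and suitable only for tiny inputs), and the function name `best_pairing` and the sample matrix are hypothetical.

```python
def best_pairing(comm, threads=None):
    """Return (total_weight, pairs) maximizing communication within pairs.

    comm is a symmetric matrix where comm[i][j] is the amount of
    communication between threads i and j. Exhaustive search stands in
    for a polynomial-time matching algorithm here.
    """
    if threads is None:
        threads = tuple(range(len(comm)))
    if len(threads) % 2:
        raise ValueError("an even number of threads is required")
    if not threads:
        return 0, []
    first, rest = threads[0], threads[1:]
    best_weight, best_pairs = -1, []
    for k, partner in enumerate(rest):
        # Pair `first` with `partner`, then solve for the remaining threads.
        remaining = rest[:k] + rest[k + 1:]
        weight, pairs = best_pairing(comm, remaining)
        weight += comm[first][partner]
        if weight > best_weight:
            best_weight, best_pairs = weight, [(first, partner)] + pairs
    return best_weight, best_pairs


# Made-up communication matrix for four threads (illustrative only).
comm_example = [
    [0, 9, 1, 2],
    [9, 0, 3, 1],
    [1, 3, 0, 8],
    [2, 1, 8, 0],
]
total, pairs = best_pairing(comm_example)
print(total, pairs)
```

For this sample matrix the search pairs thread 0 with thread 1 and thread 2 with thread 3, since those pairs carry the heaviest communication.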

Running the applications with these mappings resulted in performance improvements of up to 15.3%.

The numbers of cache misses, cache line invalidations and snoop transactions were reduced by up to 31.9%, 41% and 65.4%, respectively.

Universidade Federal do Rio Grande do Sul

Av. Paulo Gama, 110 - Bairro Farroupilha - Porto Alegre - Rio Grande do Sul
CEP: 90040-060 - Phone: +55 51 33086000