Software transactional memory for GPU architectures

Transactional Synchronization Extensions (Wikipedia). Accelerating GPU hardware transactional memory with snapshot. Rafael Ubal, David Kaeli, Department of Electrical and Computer Engineering. Efficient transactional-memory-based implementation of morph.

Both hardware and software transactional memories have been proposed for GPU architectures. Towards a software transactional memory for heterogeneous CPU. Secondly, the conflict detection mechanism is based on unified read/write signatures, i. CPU and GPU architectures, memory subsystem design, hardware/software codesign. Data layout transformation for enhancing locality on NUCA chip multiprocessors. Software transactional memory provides transactional memory semantics in a software runtime library or the programming language, and requires minimal hardware support, typically an atomic compare-and-swap operation or equivalent. In this paper, we analyze the performance and energy ef. A CUDA program starts on a CPU and then launches parallel compute kernels onto a GPU.
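The remark that software transactional memory needs little more than an atomic compare-and-swap can be illustrated with a minimal sketch. It is a toy, single-word transaction written in Python, not any of the systems named here: `VersionedCell`, `atomically`, and `run_demo` are hypothetical names, and a `threading.Lock` stands in for the hardware compare-and-swap instruction, which Python does not expose.

```python
import threading


class VersionedCell:
    """A shared word plus a version counter (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        # Assumption: this lock only emulates the atomicity of a hardware CAS.
        self._guard = threading.Lock()

    def read(self):
        """Return a consistent (value, version) snapshot."""
        with self._guard:
            return self.value, self.version

    def compare_and_swap(self, expected_version, new_value):
        """Install new_value only if nobody committed since expected_version."""
        with self._guard:
            if self.version != expected_version:
                return False  # conflict: another transaction committed first
            self.value = new_value
            self.version += 1
            return True


def atomically(cell, update):
    """Run update(old) -> new as a transaction; retry until the commit succeeds."""
    retries = 0
    while True:
        old, seen = cell.read()                       # optimistic read
        if cell.compare_and_swap(seen, update(old)):  # validate and commit
            return retries
        retries += 1                                  # abort and re-execute


def run_demo(num_threads=4, increments=1000):
    """Increment one shared counter from several threads using transactions."""
    counter = VersionedCell(0)

    def worker():
        for _ in range(increments):
            atomically(counter, lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read()
```

Calling `run_demo()` leaves the counter at `num_threads * increments` because every successful commit advances both the value and the version by one, and a failed commit simply re-runs the update. A full STM would track read and write sets over many locations; this sketch only shows the optimistic read, validate, commit loop.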

To improve GPU programmability and thus extend GPU usage to a wider range of applications, the authors propose to enable transactional memory (TM) on GPUs via Kilo TM, a novel hardware TM system that scales to thousands of concurrent transactions. A question that arises in our smart highways use case is this. On the GPU, main memory is accessed via a cache hierarchy where, in most cases, the L1 data cache is not coherent. Modern GPUs have shown promising results in accelerating computation-intensive and numerical workloads with limited dynamic data sharing. In addition, it ensures forward progress through an automatic serialization mechanism. Compiler, architecture and tools conference program abstracts.

I have been working on software transactional memory for an in-memory database. System-wide data consistency issues can be handled by a GPU-friendly design of software transactional memory. To make applications with dynamic data sharing among threads benefit from GPU acceleration, we propose a novel software transactional memory system for GPU architectures (GPU-STM). If this mechanism is required very often it may harm performance. Hardware support for local memory transactions on GPU. Exploration of lock-based software transactional memory, Justin Gottschlich. Modern APUs implement CPU-GPU platform atomics for simple data types. The ability of the GPU to handle considerably more threads than the CPU has recently led to increased interest in utilising transactional memory for GPU. Today most people who make effective use of GPUs undergo a steep learning curve and are forced to program close to the machine using special GPU programming languages. Software transactional memory for GPU architectures, Yunlong Xu. Towards a software transactional memory for graphics processors. Evaluation of AMD's Advanced Synchronization Facility within a complete transactional memory stack. Performance evaluation of Intel Transactional Synchronization Extensions for high-performance computing. Software transactional memory.

On the hardware side, Kilo TM was proposed in 2011. GPU-LocalTM allocates transactional metadata in the existing memory resources, minimizing the storage requirements for TM support. An STM system that supports per-thread transactions faces new challenges. Abbreviations: TM, transactional memory; STM, software transactional memory; HTM, hardware transactional memory; HyTM, hybrid transactional memory; TSX, Intel's Transactional Synchronization Extensions; HLE, Hardware Lock Elision; RTM, Restricted Transactional Memory; GPU, graphics processing unit; GPGPU, general-purpose computation on graphics processing units; CPU, central processing unit. Hardware transactional memory for GPU architectures. Thesis, Department of Electrical and Computer Engineering, University of Colorado.

To reduce this effort, prior work has proposed supporting transactional memory on GPU architectures. This dissertation aims to reduce the burden on GPU software developers with two major enhancements to GPU architectures. Improvements in hardware transactional memory for GPU. Toward a software transactional memory for heterogeneous. The heterogeneous accelerated processing units (APUs) integrate a multicore CPU and a GPU within the same chip. Computing without processors, August 2011, Communications. And now, having read about Intel's hardware TM, I have many curious questions. The graphics memory is the GPU's version of host memory. It is only accessible by the GPU and not accessible via the CPU. TM simplifies software development for parallel architectures by providing the programmer with the illusion that code blocks, called transactions, execute. To appear in the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2014. Energy efficiency of software transactional memory in a. One hardware proposal, Kilo TM, can scale to thousands of concurrent transactions.

Software transactional memory for GPU architectures, IEEE. An efficient software transactional memory using commit-time invalidation. Each kernel launch dispatches a hierarchy of threads: a grid of blocks. Transactional Synchronization Extensions (TSX), also called Transactional Synchronization Extensions New Instructions (TSX-NI), is an extension to the x86 instruction set architecture (ISA) that adds hardware transactional memory support, speeding up execution of multithreaded software through lock elision. Or would these kinds of building blocks be just what we want? Hardware support for local memory transactions on GPU architectures, Alejandro Villegas, Angeles Navarro.
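The thread hierarchy dispatched by a kernel launch can be pictured with a small host-side simulation. This is only a sketch in Python, not CUDA: `launch` and `vector_add` are hypothetical names, the loops run serially where a GPU would run threads concurrently, and the index formula mirrors the usual one-dimensional CUDA idiom `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def launch(grid_dim, block_dim, kernel):
    """Call kernel(block_idx, thread_idx, global_idx) once per simulated thread.

    Assumption: a 1-D grid of 1-D blocks, executed serially for illustration.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            global_idx = block_idx * block_dim + thread_idx
            kernel(block_idx, thread_idx, global_idx)


def vector_add(a, b, block_dim=4):
    """Add two equal-length lists with one simulated thread per element."""
    out = [0] * len(a)
    # Round up so that every element is covered by some block.
    grid_dim = (len(a) + block_dim - 1) // block_dim

    def kernel(block_idx, thread_idx, i):
        if i < len(a):  # threads past the end of the data do nothing
            out[i] = a[i] + b[i]

    launch(grid_dim, block_dim, kernel)
    return out
```

With five elements and a block size of four, `vector_add` launches two blocks (eight simulated threads), and the bounds check idles the last three.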

However, the performance and energy overhead of Kilo TM may deter GPU vendors from incorporating it into future designs. Qingda Lu, Christophe Alias, Uday Bondhugula, Sriram Krishnamoorthy, J. Many TM systems have been proposed in the last two decades for multicore architectures [7], implemented either in hardware or software or a combination. Modern GPU architectures have a memory hierarchy that needs to be explicitly programmed to obtain good performance. Matt software transactional memory, Herlihy's hardware accelerator concept. For a set of TM-enhanced GPU applications, Kilo TM captures 59% of the performance of fine-grained locking, and is on average 128x faster than executing all transactions serially, for an estimated hardware area overhead of 0. Aamodt, University of British Columbia, Canada. Motivation. Transactional memory for heterogeneous systems, arXiv.

Hardware support for scratchpad memory transactions on GPU. Software transactional memory for GPU architectures. Yunlong Xu, Rui Wang, Nilanjan Goswami, Tao Li and Depei Qian. GPU computing architecture for irregular parallelism, UBC. While transactional memory for processors with hundreds of cores is likely to require hardware support, software implementations will be required for backward compatibility with current and near. On the downside, software implementations usually come with a performance penalty when compared to hardware. Nilanjan Goswami, GPU architect, Advanced Computing Lab. His research interests include parallel programming, software transactional memory, and distributed architectures. Scheduling techniques for GPU architectures with processing-in-memory capabilities, Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kay. GPU-STM, a software TM for GPUs: enables simplified data synchronization on GPUs; scales to s of TXs; ensures livelock-freedom; runs on commercially available GPUs and runtime; outperforms GPU coarse-grain locks by up to 20x. We propose GPU-LocalTM, a hardware transactional memory (TM), as an alternative to data locking mechanisms in local memory.

The unconverted parts of the Java program could use up the CPU multicore resources with its multithreaded workload. ACLE Q3 2019 documentation. Transactional memory (TM) is an optimistic approach to achieve this goal. The major challenges include ensuring good scalability with respect to the massive multithreading of GPUs, and preventing livelocks caused by the SIMT execution paradigm of GPUs. Hardware transactional memory for GPU architectures, Wilson W. To evaluate TLLL, we use it to implement six widely used programs, and compare it with the state-of-the-art ad hoc GPU synchronization, GPU software transactional memory (STM), and CPU hardware.
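Why lockstep (SIMT) execution invites livelock, and why a serialization fallback restores forward progress, can be seen in a toy model. The sketch below is an assumption-laden simulation in Python, not the mechanism of GPU-STM or Kilo TM: `run_lockstep` and its parameters are hypothetical, each "lane" stands for one thread of a warp, and all lanes perform step k of their lock acquisition before any lane performs step k+1.

```python
def run_lockstep(lock_orders, max_rounds=10, serialize_after=None):
    """Simulate the lanes of one warp acquiring locks in lockstep.

    lock_orders[i] is the sequence of lock ids lane i must hold to commit.
    Returns the round in which the last lane committed, or None if some
    lanes are still retrying after max_rounds (a livelock in this model).
    serialize_after: hypothetical knob; after that many rounds, lanes run
    one at a time instead of in lockstep.
    """
    pending = set(range(len(lock_orders)))
    for round_no in range(1, max_rounds + 1):
        if serialize_after is not None and round_no > serialize_after:
            # Fallback: a single lane runs alone, so it cannot conflict.
            pending.discard(min(pending))
        else:
            owner = {}      # lock id -> lane holding it during this round
            failed = set()  # lanes that found a needed lock already held
            steps = max(len(lock_orders[lane]) for lane in pending)
            for step in range(steps):
                for lane in sorted(pending):
                    if lane in failed or step >= len(lock_orders[lane]):
                        continue
                    lock = lock_orders[lane][step]
                    if owner.setdefault(lock, lane) != lane:
                        failed.add(lane)
            # Lanes holding all their locks commit; the others abort,
            # release everything, and retry in the next round.
            pending = failed
        if not pending:
            return round_no
    return None
```

Two lanes that take locks `A` and `B` in opposite orders each grab their first lock, each fail on the second, and abort together in every round, so `run_lockstep([["A", "B"], ["B", "A"]])` never finishes. Passing `serialize_after=3` lets the lanes commit one after another once three lockstep rounds have failed, at the cost of lost parallelism, which matches the caution that frequent serialization may harm performance.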

Software transactional memory for GPU architectures, IEEE Xplore. Advanced computer architecture and systems detailed. Ennals, Efficient software transactional memory, technical report, Intel Research Cambridge, UK, 2005. However, ensuring atomicity for complex data types is a task delegated to programmers. Hardware transactional memory for GPU architectures, UBC ECE. Sadayappan, Yongjian Chen, Haibo Lin and Tin-Fook Ngai.
