In this paper, we describe initial research investigations of an event-based Monte Carlo transport algorithm implemented using the Nvidia Thrust library on a GPU for a research Monte Carlo test code. The event-based algorithm targets many-core architectures by increasing SIMD (single instruction, multiple data) parallelism, while Thrust potentially provides portable performance by allowing one source code base to compile code targeted for both CPUs and GPUs. We found that an explicit CUDA implementation of the event-based algorithm performed significantly more efficiently than the Thrust implementation on GPU platforms, most likely as a result of additional flexibility in access to different memory spaces on the GPU. Additionally, we showed that at large enough problem sizes the event-based implementations on GPU platforms perform more efficiently than the serial history-based implementation running on the host CPU. While investigating this problem, we also discovered that the performance of the event-based algorithm is affected by which tallies are being used. A zonal scalar flux tally requires atomic operations that significantly impacted the performance of the code, in some cases producing slowdowns instead of speedups; we decided to remove this tally in order to focus on the effectiveness of the event-based algorithm, and future work will be required to research more effective ways of handling such tallies. Thrust provides both CPU and GPU versions of the code in one code base, but it does so at a cost: this investigation demonstrated a potential trade-off between portability and performance. Additionally, we would like to consider new ways of optimizing both the Thrust and CUDA versions, in order to see how much performance remains to be achieved.
Power consumption considerations are driving future high performance computing platforms toward many-core computing architectures. Los Alamos National Laboratory's Trinity machine, available in 2016, will use both Intel Xeon Haswell processors and Intel Xeon Phi Knights Landing many integrated core (MIC) architecture coprocessors. Lawrence Livermore National Laboratory's Sierra machine, available in 2018, will use an IBM PowerPC architecture along with Nvidia graphics processing unit (GPU) accelerators. These different advanced architectures make the computing landscape in upcoming years complex, and traditional approaches to Monte Carlo transport do not work efficiently on these new platforms. MIC architectures require vectorization to operate efficiently, and vectorization is difficult to achieve in Monte Carlo transport. GPU architectures require additional code to explicitly use the hardware, demanding significant code changes or hardware-specific branches in the source code. A significant challenge for Monte Carlo transport projects is to simultaneously support, within a single source code base, efficient simulations for both the current generation of architectures and these different advanced computing architectures. To address these challenges, two important changes are typically required: a new algorithmic approach for solving Monte Carlo transport, and explicit use of hardware-specific software.

Modern graphics processing units (GPUs) have complex architectures that admit exceptional performance and energy efficiency for high-throughput applications. Though GPUs consume large amounts of power, their use for high-throughput applications facilitates state-of-the-art energy efficiency and performance; consequently, continued development relies on understanding their power consumption. Our work is a survey of GPU power modeling and profiling methods, with increased detail on noteworthy efforts. Hardware counters, which are low-level tallies of hardware events, share strong correlation with power use and performance. Statistical correlation between power and performance counters has yielded worthwhile GPU power models, yet the complexity inherent to GPU architectures presents new hurdles for power modeling. Developments and challenges of counter-based GPU power modeling are discussed. Often building on the counter-based models, research efforts in GPU power simulation, which make power predictions from input code and hardware knowledge, provide opportunities for optimization in programming or architectural design. Noteworthy strides in power simulation for GPUs are included, along with their performance or functional simulator counterparts when appropriate. Moreover, as direct measurement of GPU power is necessary for model evaluation and parameter initiation, internal and external power sensors are discussed. Lastly, possible directions for future research are presented.