Index of Contributors
A
-
Afanasiev Michael MS Presentation, Paper
Thursday, June 9, 2016
Garden 1A, 14:45-15:00
MS Presentation
Salvus: A Flexible Open-Source Package for Full-Waveform Modelling and Inversion, Michael Afanasiev (ETH Zurich, Switzerland)
Co-Authors: Christian Boehm (ETH Zurich, Switzerland); Martin van Driel (ETH Zurich, Switzerland); Lion Krischer (Ludwig Maximilian University of Munich, Germany); Dave A. May (ETH Zurich, Switzerland); Max Rietmann (ETH Zurich, Switzerland); Korbinian Sager (ETH Zurich, Switzerland); Andreas Fichtner (ETH Zurich, Switzerland)
Within all domain-specific software projects, finding the correct balance between flexibility and performance is often difficult. In the seismic imaging community, the trend has been to move towards codes which are heavily optimised, but which often sacrifice usability and flexibility. Here we introduce Salvus: an open-source HPC high-order finite element (FE) package focused on full-waveform modelling and inversion, which is designed to be both flexible and performant. Salvus was constructed by following modern software design practices and testing protocols, and by establishing its foundations upon existing open-source high-level scientific libraries. The FE framework is generalized over spatial dimensions, time-integrators, polynomial order and wave-propagation physics, and provides support for both hexahedral and tetrahedral meshes. Additionally, support is included for various numerical optimisation methods. We discuss our usage of existing open-source scientific libraries, our chosen level of abstraction, and quantitatively investigate the performance penalties associated with these abstractions.
Wednesday, June 8, 2016
Auditorium C, 13:00-13:30
Paper
Automatic Global Multiscale Seismic Inversion: Insights into Model, Data, and Workflow Management, Michael Afanasiev (ETH Zurich, Switzerland)
Co-Authors: Christian Boehm (ETH Zurich, Switzerland); Alexey Gokhberg (ETH Zurich, Switzerland); Andreas Fichtner (ETH Zurich, Switzerland)
Modern global seismic waveform tomography is formulated as a PDE-constrained nonlinear optimization problem, where the optimization variables are Earth's visco-elastic parameters. This particular problem has several defining characteristics. First, the solution to the forward problem, which involves the numerical solution of the elastic wave equation over continental to global scales, is computationally expensive. Second, the determinedness of the inverse problem varies dramatically as a function of data coverage. This is chiefly due to the uneven distribution of earthquake sources and seismometers, which in turn results in an uneven sampling of the parameter space. Third, the seismic wavefield depends nonlinearly on the Earth's structure. Sections of a seismogram which are close in time may be sensitive to structure greatly separated in space.
In addition to these theoretical difficulties, the seismic imaging community faces additional issues which are common across HPC applications. These include the storage of massive checkpoint files, the recovery from generic system failures, and the management of complex workflows, among others. While the community has access to solvers which can harness modern heterogeneous computing architectures, the computational bottleneck has fallen to these memory- and manpower-bounded issues.
We present a two-tiered solution to the above problems. To deal with the problems relating to computational expense, data coverage, and the increasing nonlinearity of waveform tomography with scale, we present the Collaborative Seismic Earth Model (CSEM). This model, and its associated framework, takes an open-source approach to global-scale seismic inversion. Instead of attempting to monolithically invert all available seismic data, the CSEM approach focuses on the inversion of specific geographic subregions, and then consistently integrates these subregions via a common computational framework. To deal with the workflow and storage issues, we present a suite of workflow management software, along with a custom designed optimization and data compression library. It is the goal of this paper to synthesize these above concepts, originally developed in isolation, into components of an automatic global-scale seismic inversion. -
Aidun Cyrus MS Presentation
Friday, June 10, 2016
Garden 3C, 10:30-11:00
MS Presentation
Progress and Challenges in Multiscale Simulation of Cellular Blood Flow, Cyrus Aidun (Georgia Institute of Technology, United States of America)
Co-Authors:
Blood flow and the associated diseases in the cardiovascular system are fundamentally multiscale, with events at the nano- and micro-scale triggering problems in coronary and larger arteries as well as major arteries and heart valves. For example, atherothrombosis is a critical common event in heart attacks and strokes. The accumulation of platelets (1 to 2 micrometers) into a thrombus (a few millimeters) has been shown to be dependent on hemodynamics, especially the local shear rate in arteries (2 to 10 millimeters). To induce arterial thrombosis, shear rate affects two biophysical factors: a) convection of circulating platelets to reach the thrombus surface (flux), and b) affinity of the surface to platelet binding. Any computational simulation must consider blood flow at the cellular level. We shall discuss progress and challenges in multiscale simulation of blood flow at the cellular level, with HPC as a fundamental platform for gaining insight into many cardiovascular processes. -
Alfonso-Prieto Mercedes MS Presentation, MS Summary
Friday, June 10, 2016
Garden 3A, 10:45-11:00
MS Presentation
Simulations of Ion Channel Modulation by Lipids, Mercedes Alfonso-Prieto (University of Barcelona, Spain)
Co-Authors: Michael L. Klein (Temple University, United States of America)
Inward-rectifier K+ (Kir) channels are essential to maintain the resting membrane potential of neurons and to buffer extracellular potassium by glial cells. Indeed, Kir malfunction has been suggested to play a role in some neuropathologies, e.g. white matter disease, epilepsy and Parkinson's disease. Kir activation requires phosphatidylinositol-(4,5)-bisphosphate (PIP2), a highly negatively charged lipid located in the inner membrane leaflet. In addition, the presence of other non-specific, anionic phospholipids, such as phosphatidylserine (PS), is needed for the channel to be responsive at physiological concentrations of PIP2. The dual lipid modulation of Kir channels is not yet fully understood at a molecular level. To give further insight into how the two lipids act cooperatively to open the channel, we used all-atom molecular dynamics (MD) simulations. We found that initial binding of PS helps to pre-assemble the binding site of PIP2, which in turn completes Kir activation.
MS Summary
MS29 Molecular Neuromedicine: Recent Advances by Computer Simulation and Systems Biology, Mercedes Alfonso-Prieto (University of Barcelona, Spain)
Co-Authors: Giulia Rossetti (JSC and RWTH-UKA, Germany), Mercedes Alfonso-Prieto (University of Barcelona, Spain)
Innovative neuromedicine approaches require a detailed understanding of the molecular and systems-level causes of neurological diseases, their progression and the response to treatments. Ultimately, neuronal function and diseases are caused by exquisite molecular recognition processes during which specific biomolecules bind to each other, allowing neuronal signaling, metabolism, synaptic transmission, etc. The detailed understanding of these processes, as well as the rational design of molecules for technology advances in neuropharmacology, require the characterization of neuronal biomolecules' structure, function, dynamics and energetics. The biomolecules and processes under study are inherently highly complex in terms of their size (typically on the order of 10^5-10^6 atoms) and time-scale (up to seconds), much longer than what can be simulated by standard molecular dynamics approaches (which, nowadays, can typically reach up to microseconds). This requires the development of methodologies in multiscale molecular simulation. Recent advancements include coarse-grained (CG) approaches that allow the study of large systems on long timescales, as well as very accurate hybrid methods combining quantum mechanical (QM) modelling with molecular mechanics (MM) that provide descriptions of key neuronal photoreceptors, such as rhodopsin. In addition, Brownian dynamics is used to study biomolecular recognition and macromolecular assembly processes towards in vivo conditions. Such computational tools are invaluable for description, prediction and understanding of biological mechanisms in a quantitative and integrative way. This workshop will be an ideal forum to discuss both advancements and future directions in multiscale methodologies and applications to key signaling pathways in neurotransmission, such as those based on neuronal G-protein coupled receptors (GPCRs). These novel methodologies might prove to be instrumental in understanding the underlying causes of brain diseases and in designing new drugs aimed at their treatment. -
Alimirzazadeh Siamak MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:45-15:00
MS Presentation
GPU-Accelerated Hydrodynamic Simulation of Hydraulic Turbines Using the Finite Volume Particle Method, Siamak Alimirzazadeh (EPFL, Switzerland)
Co-Authors: Ebrahim Jahanbakhsh (Università della Svizzera italiana, Switzerland); Audrey Maertens (EPFL / LMH, Switzerland); Christian Vessaz (EPFL / LMH, Switzerland); Sebastian Leguizamon (EPFL / LMH, Switzerland); François Avellan (EPFL / LMH, Switzerland)
Performance prediction based on numerical simulations can be very helpful in the design process of hydraulic turbines. The Finite Volume Particle Method (FVPM) is a consistent and conservative particle-based method which inherits interesting features of both Smoothed Particle Hydrodynamics and the grid-based Finite Volume Method. This method is particularly well-suited for such simulations thanks to its versatility. SPHEROS is a parallel FVPM solver which has been developed at the EPFL - Laboratory for Hydraulic Machines for simulating Pelton turbines and silt erosion. In order to allow the simulation of industrial-size setups, a GPU version of SPHEROS (GPU-SPHEROS) is being developed in CUDA and uses the Thrust library to handle complicated data structures such as octrees. In addition, highly optimised kernels have been implemented for both compute-bound and memory-bound algorithms. Comparing the performance of the corresponding parts of GPU-SPHEROS and SPHEROS, we achieve a speed-up factor of at least eight. -
Ameres Jakob MS Presentation, MS Summary
Wednesday, June 8, 2016
Garden 3C, 16:30-16:45
MS Presentation
Particle in Fourier Discretization of Kinetic Equations, Jakob Ameres (Technische Universität München, Germany)
Co-Authors: Katharina Kormann (Max Planck Institute for Plasma Physics, Germany); Eric Sonnendrücker (Max Planck Society, Germany)
Particle methods are very popular for the discretization of kinetic equations, since they are embarrassingly parallel. In plasma physics the high dimensionality (6D) of the problems raises the costs of grid-based codes, favouring mesh-free transport with particles. A standard Particle in Cell (PIC) scheme couples the particle density to a grid-based field solver using finite elements. In this particle-mesh coupling the stochastic error appears as noise, while the deterministic error leads to e.g. aliasing, inducing unphysical instabilities. Projecting the particles onto a spectral grid yields an energy- and momentum-conserving, almost surely aliasing-free scheme: Particle in Fourier (PIF). For a few electrostatic modes, PIF has very little computational overhead, rendering it suitable for a fast implementation. We present 6D Vlasov-Poisson simulations of Landau damping and a Bump-on-Tail instability and compare the results as well as the computational performance to a grid-based semi-Lagrangian solver.
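As a rough, self-contained illustration of this grid-free idea (not the authors' code; the 1D setting, retained mode count and particle loading are invented for the example), the sketch below deposits particle charge directly onto a handful of Fourier modes of a periodic 1D Vlasov-Poisson system and evaluates the electric field back at the particle positions without an intermediate spatial grid:

```python
import numpy as np

def pif_electric_field(x, w, L=4 * np.pi, n_modes=8):
    """Particle-in-Fourier sketch: charge deposition and field evaluation
    without an intermediate spatial grid (1D, periodic, normalized units)."""
    m = np.arange(1, n_modes + 1)             # retained Fourier modes (m = 0 is neutralized)
    k = 2.0 * np.pi * m / L                   # wave numbers
    phase = np.exp(-1j * np.outer(k, x))      # exp(-i k x_j) for every mode/particle pair
    rho_k = phase @ w / L                     # direct deposition onto the Fourier modes
    E_k = -1j * rho_k / k                     # solve -phi'' = rho, E = -phi' (normalized)
    # field at the particle positions; conjugate modes contribute the factor 2*Re
    return 2.0 * np.real(np.conj(phase).T @ E_k)

# toy usage: markers carrying a 10% density perturbation in the first retained mode
rng = np.random.default_rng(0)
L = 4 * np.pi
x = rng.uniform(0.0, L, size=10_000)
w = (L / x.size) * (1.0 + 0.1 * np.cos(0.5 * x))
print(pif_electric_field(x, w, L=L)[:5])
```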
MS Summary
MS11 HPC Implementations and Numerics for Kinetic Plasma Models, Jakob Ameres (Technische Universität München, Germany)
Co-Authors: Jakob Ameres (Technische Universität München, Germany)
The fundamental model in plasma physics is a kinetic description by a phase-space distribution function solving the Vlasov-Maxwell equation. Due to the complexity of the models, computer simulations are of key importance in understanding the behaviour of plasmas, e.g. in fusion reactors, and a wide range of codes exist in the plasma physics community and are run on large-scale computing facilities. However, kinetic simulations are very challenging due to the relatively high dimensionality, the presence of multiple scales, and turbulence. For this reason, state-of-the-art plasma solvers mostly discretize simplified models like the gyrokinetic equations. Recent advances in computing power render it possible to approach the full six-dimensional system. The focus of the minisymposium is to bring together researchers developing modern numerical methods and optimized implementations of scalable algorithms for a future generation of plasma codes capable of simulating new physical aspects. Two types of methods are used in state-of-the-art Vlasov solvers: particle-based and grid-based methods. Especially for high-dimensional models, particle-based methods are often preferred due to a better scaling with dimensionality. Even though particle-in-cell codes are embarrassingly parallel, care has to be taken in the layout of the memory structure in order to enable fast memory access on high-performance computers. On the other hand, grid-based methods are known to give accurate results for reduced Vlasov equations in two- and four-dimensional phase space. Domain partitioning strategies and scalable interpolation algorithms for semi-Lagrangian methods need to be developed. Mesh refinement can also be used to reduce the number of grid points. Macro-scale properties of the plasma can often be described by a fluid model. Spectral discretization methods have the attractive feature that they reduce the kinetic model to a number of moments - thus incorporating a fluid description of plasmas. A central aspect of this minisymposium will be the simulation of highly magnetized plasmas. This is the situation for devices based on magnetic confinement fusion, such as the ITER project. In this configuration, the particles exhibit a fast circular motion around the magnetic field lines, the so-called gyromotion. This motion gives rise to multiple scales since turbulence arises on a much slower time scale. Asymptotically preserving schemes can tackle the time scale of the gyromotion beyond the gyrokinetic model. -
Anciaux Guillaume MS Presentation, MS Summary
Friday, June 10, 2016
Garden 2A, 09:30-09:55
MS Presentation
Concurrent Coupling of Particles with a Continuum for Dynamical Motion of Solids, Guillaume Anciaux (EPFL, Switzerland)
Co-Authors: J. F. Molinari (EPFL, Switzerland); Till Junge (Karlsruhe Institute of Technology, Germany); Jaehyun Cho (EPFL, Switzerland)
There are many situations where the discrete nature of matter needs to be accounted for by numerical models. For instance, with crystalline materials, friction and ductile fracture modelling can benefit from the Molecular Dynamics formalism. However, capturing these processes requires domain sizes involving large numbers of particles, often out of reach of modern computers. Thus, concurrent multiscale approaches have emerged to reduce the computational cost by using a coarser continuum model. The difference between particles and continuum leads to several challenging problems. In this presentation, finite temperatures, numerical stability and dislocation passing will be addressed. Also, the software framework LibMultiScale will be presented with its associated parallel computation design choices.
Friday, June 10, 2016
Garden 2A, 10:20-10:40
MS Presentation
Multiscale Modeling of Frank-Read Source, Guillaume Anciaux (EPFL, Switzerland)
Co-Authors: Guillaume Anciaux (EPFL, Switzerland)
The strength of materials is mainly controlled by dislocations. Their dynamics include nucleations and multiplications at grain boundaries. Nonlinear atomistic forces should be considered to quantify nucleations, while, to correctly model dislocation pile-up, far-field forces are required. Atomistic (MD) and discrete dislocation (DD) simulations have been performed to study these dynamics. In MD, nucleations were naturally considered, but limitations of domain sizes persisted. In DD, dislocation interactions including pile-up were well described. However, ad-hoc approaches are required for nucleations. These limitations motivate coupling the two methods in CADD3D. In this talk, we present a multiscale model of a Frank-Read source using CADD3D. Several dislocations will be nucleated from the Frank-Read source in the MD zone, and develop into complete closed loops. As they approach the DD domain, they will be transformed into DD dislocations. An observable consequence will be work-hardening effects due to the pile-up back stresses.
MS Summary
MS17 Applications and Algorithms for HPC in Solid Mechanics I: Plasticity, Guillaume Anciaux (EPFL, Switzerland)
Co-Authors: Guillaume Anciaux (EPFL, Switzerland)
Plastic deformation is made possible by the motion of interacting crystalline defects. In the study of the mechanisms controlling the macroscopic and effective plastic laws, it is of particular importance to understand the collective behaviour of dislocations. Available models generally represent dislocations with nodes connected by segments, which can form a complex network structure. Within these forests, dislocations can nucleate, interact, join, and annihilate. This forms an important challenge because of the many defects present in the crystals, such as impurities, grain boundaries and free surfaces. In order to capture the correct physics of the described processes, the employed (self-)interaction laws and the mobility laws have to be correctly accounted for. Furthermore, the number of dislocation segments needs to be large if one wants to achieve calculations comparable with experimental scales. In parallel, full atomistic simulations can provide insights into detailed mechanistic aspects of dislocation nucleation, transformation, and reactions that occur at the nanoscale, below the capabilities of mesoscale dislocation models. Consequently, the numerical calculations are challenging and call for HPC strategies. This minisymposium aims at fostering discussion on the newest advancements of the numerical models, accompanying algorithms, and applications. Also, we encourage researchers working in the field of dislocation plasticity to present analytic models and experimental results that complement studies performed with parallel algorithms.
MS Summary
MS25 Applications and Algorithms for HPC in Solid Mechanics II: Multiscale Modelling, Guillaume Anciaux (EPFL, Switzerland)
Co-Authors: Guillaume Anciaux (EPFL, Switzerland), J. F. Molinari (EPFL, Switzerland)
In all observable phenomena, a full understanding of the operative physical mechanisms leads to extremely complicated models. One natural analysis path is to decompose the problem into simpler sub-problems, yet this transfers part of the complexity to the issue of coupling the sub-problems. In the study of materials and solids, considerable progress has been made in the numerical methods to couple scales. For instance, atomic, discrete-defect, meso-scale, and structural scales can now be coupled together under various assumptions. In this minisymposium, talks are solicited to present new work on coupling strategies, their mathematical description, and/or their implementation details on HPC machines. Such multiscale methods could deal with multi-grid, FE², concurrent methods, particle-continuum coupling, among others. Other multiscale themes are also welcome in this minisymposium. -
Andermatt Samuel Poster
Poster
MAT-06 Linear Scaling Ehrenfest Molecular Dynamics, Samuel Andermatt (ETH, Switzerland)
Co-Authors: Florian Schiffmann (Victoria University, Australia); Joost VandeVondele (ETH Zurich, Switzerland)
With the available computational power growing, ever larger systems can be investigated with increasingly advanced methods and new algorithms. For electronic structure calculations on systems containing a few thousand atoms, linear scaling algorithms are essential. For ground state DFT calculations, linear scaling has already been demonstrated for millions of atoms in the condensed phase [J. VandeVondele, U. Bortnik, J. Hutter, 2012]. Here, we extend this work to electronically excited states, for example, to make UV/VIS spectroscopy or investigations of the electron injection process in dye-sensitized solar cells possible. We base our approach on non-adiabatic molecular dynamics, in particular on Ehrenfest molecular dynamics (EMD). The formalism, based on the density matrix, allows for linear scaling based on the sparsity of the density matrix and naturally incorporates density embedding methods such as the Kim-Gordon approach. -
Andreussi Oliviero Poster
Poster
MAT-03 Complex Wet-Environments in Electronic-Structure Calculations, Oliviero Andreussi (Institute of Computational Science, Università della Svizzera italiana, Switzerland)
Co-Authors: Luigi Genovese (CEA/INAC, France); Oliviero Andreussi (Università della Svizzera italiana, Switzerland); Nicola Marzari (EPFL, Switzerland); Stefan Goedecker (University of Basel, Switzerland)
The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions at the ab-initio level in the presence of the proper electrochemical environment. In this work we present a continuum solvation library able to handle both neutral and ionic solutions, solving the Generalized Poisson and the Poisson-Boltzmann problem. Two different recipes have been implemented to build up the continuum dielectric cavity (one using atomic coordinates, the other mapping the solute electronic density). A preconditioned conjugate gradient method has been implemented for the Generalized Poisson equation, whilst a self-consistent procedure enables us to solve the Poisson-Boltzmann problem. Both solvers and continuum dielectric cavities have been integrated into the BigDFT electronic-structure package. We benchmarked the whole library on several atomistic systems including small neutral molecules, large proteins, solvated surfaces and reactions in solution to demonstrate its efficiency and performance.
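For illustration, a matrix-free, Jacobi-preconditioned conjugate gradient loop for a generalized Poisson problem is sketched below; it is a minimal 1D finite-difference stand-in, not the BigDFT implementation, and the dielectric profile, grid and tolerances are invented for the example.

```python
import numpy as np

def solve_generalized_poisson(eps_face, rho, h, tol=1e-10, max_iter=5000):
    """Jacobi-preconditioned CG for -d/dx(eps dphi/dx) = 4*pi*rho on n interior
    nodes with phi = 0 at both boundaries; eps_face holds eps on the n+1 faces."""
    n = rho.size

    def apply_A(phi):
        pad = np.concatenate(([0.0], phi, [0.0]))              # Dirichlet padding
        grad = (pad[1:] - pad[:-1]) / h                         # dphi/dx on the faces
        return -(eps_face[1:] * grad[1:] - eps_face[:-1] * grad[:-1]) / h

    b = 4.0 * np.pi * rho
    diag = (eps_face[:-1] + eps_face[1:]) / h**2                # Jacobi preconditioner
    phi = np.zeros(n)
    r = b - apply_A(phi)
    z = r / diag
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rz / (p @ Ap)
        phi += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = r / diag
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return phi

# toy usage: a Gaussian charge inside a smoothly varying dielectric
n, h = 200, 0.05
x = (np.arange(n) + 1) * h
eps_face = 1.0 + 4.0 / (1.0 + np.exp(-(np.linspace(0, n * h, n + 1) - 5.0)))
rho = np.exp(-((x - 5.0) ** 2))
print(solve_generalized_poisson(eps_face, rho, h).max())
```
-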
Angelikopoulos Panagiotis Paper
Wednesday, June 8, 2016
Auditorium C, 14:30-15:00
Paper
Approximate Bayesian Computation for Granular and Molecular Dynamics Simulations, Panagiotis Angelikopoulos (ETH Zurich, Switzerland)
Co-Authors: Panagiotis Angelikopoulos (ETH Zurich, Switzerland); Panagiotis Hadjidoukas (ETH Zurich, Switzerland); Costas Papadimitriou (University of Thessaly, Greece); Petros Koumoutsakos (ETH Zurich, Switzerland)
The effective integration of models with data through Bayesian uncertainty quantification hinges on the formulation of a suitable likelihood function. In many cases such a likelihood may not be readily available or it may be difficult to compute. Approximate Bayesian Computation (ABC) proposes the formulation of a likelihood function through the comparison of low-dimensional summary statistics of the model predictions with corresponding statistics on the data. In this work we report a computationally efficient approach to the Bayesian updating of Molecular Dynamics (MD) models through ABC using a variant of the Subset Simulation method. We demonstrate that ABC can also be used for Bayesian updating of models with an explicitly defined likelihood function, and compare the implementation and efficiency of ABC-SubSim with those of transitional Markov chain Monte Carlo (TMCMC). ABC-SubSim is then used in force-field identification of MD simulations. Furthermore, we examine the concept of relative entropy minimization for the calibration of force fields and exploit it within ABC. Using different approximate posterior formulations, we showcase that assuming Gaussian ensemble fluctuations of molecular-system quantities of interest can potentially lead to erroneous parameter identification.
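As a generic illustration of the ABC idea (plain rejection sampling rather than the ABC-SubSim variant used in the paper, and a toy Gaussian model rather than an MD force field), the sketch below accepts parameter draws whose simulated summary statistics land within a tolerance of the observed ones:

```python
import numpy as np

def abc_rejection(observed, simulate, summary, prior_sample, eps=0.1, n_draws=20_000):
    """Plain ABC rejection: keep parameter draws whose simulated summary
    statistics fall within a distance eps of the observed summaries."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample()
        s_sim = summary(simulate(theta))
        if np.linalg.norm(s_sim - s_obs) < eps:      # discrepancy replaces the likelihood
            accepted.append(theta)
    return np.array(accepted)

# toy usage: infer the mean and std of a Gaussian "ensemble observable"
rng = np.random.default_rng(1)
data = rng.normal(2.0, 0.5, size=500)
summary = lambda x: np.array([x.mean(), x.std()])
simulate = lambda th: rng.normal(th[0], th[1], size=500)
prior_sample = lambda: np.array([rng.uniform(0.0, 5.0), rng.uniform(0.1, 2.0)])
posterior = abc_rejection(data, simulate, summary, prior_sample)
print(len(posterior), posterior.mean(axis=0))
```
-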
Armiento Rickard MS Presentation
Thursday, June 9, 2016
Garden 1BC, 15:20-15:40
MS Presentation
Designing New Materials with the High-Throughput Toolkit, Rickard Armiento (Linköping University, Sweden)
Co-Authors:
I give a brief overview of our recently launched software framework for large-scale database-driven high-throughput computation, with a few examples of recent applications. The High-throughput toolkit (httk) is a comprehensive open-source software library that enables preparation, execution, and analysis of ab initio simulations in high-throughput with a high level of automatization. The framework also comprises functionality for sharing cryptographically signed result sets with other researchers. This framework powers our online materials-genome-type repository of ab initio predicted materials properties at http://openmaterialsdb.se. Examples of recent use of httk are the generation of training data for a machine learning model used to predict formation energies of 2M Elpasolite (ABC2D6) crystals (arXiv:1508.05315 [physics.chem-ph]) and the analysis of the phase stability of a class of materials based on substitutions into AlN. Current work in progress includes extending the toolkit for our ongoing research into the use of big data methods for materials design. -
Arsenlis Tom MS Presentation
Thursday, June 9, 2016
Garden 2BC, 14:00-14:30
MS Presentation
The Development of ParaDiS for HCP Crystals, Tom Arsenlis (Lawrence Livermore National Laboratory, United States of America)
Co-Authors: Sylvie Aubry (Lawrence Livermore National Laboratory, United States of America); Moono Rhee (Lawrence Livermore National Laboratory, United States of America); Brett Wayne (Lawrence Livermore National Laboratory, United States of America); Gregg Hommes (Lawrence Berkeley National Laboratory, United States of America)
The ParaDiS project at LLNL was created to build a scalable massively parallel code for the purpose of predicting the evolution of strength and strain hardening in crystalline materials under dynamic loading conditions by directly integrating the elements of dislocation physics. The code has been used by researchers at LLNL and around the world to simulate the behaviour of dislocation networks in a wide variety of applications, from high-temperature structural materials, to nuclear materials, to armor materials, to photovoltaic systems. ParaDiS has recently been extended to include a fast analytical algorithm for the computation of forces in anisotropic elastic media, and an augmented set of topological operations to treat the complex core physics of the various dislocations that routinely appear in HCP metals. The importance and implications of these developments on the engineering properties of HCP metals will be demonstrated in large-scale simulations of strain hardening. -
Arteaga Andrea MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:15-15:30
MS Presentation
Refactoring and Virtualizing a Mesoscale Model for GPUs, Andrea Arteaga (Federal Office of Meteorology and Climatology MeteoSwiss, Zürich, Switzerland)
Co-Authors: Andrea Arteaga (MeteoSwiss, Switzerland); Christophe Charpilloz (MeteoSwiss, Switzerland); Salvatore Di Girolamo (ETH Zurich, Switzerland); Torsten Hoefler (ETH Zurich, Switzerland)
Our aim is to adapt the COSMO limited-area model to enable kilometer-scale resolution in climate simulation mode. As the resolution of climate simulations increases, storing the large amount of generated data becomes infeasible. To enable high-resolution models, we seek a good compromise between the disk I/O costs and the need to access the output data for post-processing and analysis. We propose a data-virtualization layer that re-runs simulations on demand and transparently manages the data for the analytics applications. To achieve this goal, we developed a bit-reproducible version of the dynamical core of the COSMO model that runs on different architectures (e.g., CPUs and GPUs). An ongoing project is working on the reproducibility of the full COSMO code. We will discuss the strategies adopted to develop the data virtualization layer, the challenges associated with the reproducibility of simulations performed on different hardware architectures and the first promising results of our project. -
Aubert Dominique MS Presentation
Wednesday, June 8, 2016
Garden 3C, 13:00-13:30
MS Presentation
GPUs for Cosmological Simulations: Some Experiments with the ATON & EMMA Codes, Dominique Aubert (University of Strasbourg/CNRS, France)
Co-Authors:
For a few years, we have been developing new cosmological simulation tools in Strasbourg to study the reionization process. These tools contribute to the strong ongoing effort by the community to include the radiative transfer physics in numerical simulations. Given its central role in the transition, one might want to deal routinely with light in the same way as we deal with matter in such codes. However, the inclusion of light is known to be quite challenging in terms of numerical resources and can increase by orders of magnitude the complexity or the numerical requirements of the existing codes. I will present how we were led to use graphics processing units (GPUs) as hardware accelerators to cope with this numerical challenge and I will discuss the methodologies of ATON, a radiative post-processing code, and EMMA, a new complete AMR cosmological simulation code, both accelerated by means of graphics devices. -
Aubry Sylvie MS Presentation
Thursday, June 9, 2016
Garden 2BC, 14:00-14:30
MS Presentation
The Development of ParaDiS for HCP Crystals, Sylvie Aubry (Lawrence Livermore National Laboratory, United States of America)
Co-Authors: Sylvie Aubry (Lawrence Livermore National Laboratory, United States of America); Moono Rhee (Lawrence Livermore National Laboratory, United States of America); Brett Wayne (Lawrence Livermore National Laboratory, United States of America); Gregg Hommes (Lawrence Berkeley National Laboratory, United States of America)
The ParaDiS project at LLNL was created to build a scalable massively parallel code for the purpose of predicting the evolution of strength and strain hardening in crystalline materials under dynamic loading conditions by directly integrating the elements of dislocation physics. The code has been used by researchers at LLNL and around the world to simulate the behaviour of dislocation networks in a wide variety of applications, from high-temperature structural materials, to nuclear materials, to armor materials, to photovoltaic systems. ParaDiS has recently been extended to include a fast analytical algorithm for the computation of forces in anisotropic elastic media, and an augmented set of topological operations to treat the complex core physics of the various dislocations that routinely appear in HCP metals. The importance and implications of these developments on the engineering properties of HCP metals will be demonstrated in large-scale simulations of strain hardening. -
Auricchio Angelo MS Presentation
Wednesday, June 8, 2016
Garden 3A, 13:30-14:00
MS Presentation
Accurate Estimation of 3D Ventricular Activation in Heart Failure Patients from Electroanatomic Mapping, Angelo Auricchio (Fondazione Cardiocentro Ticino, Switzerland)
Co-Authors: Peter Kalavsky (Università della Svizzera italiana, Switzerland); Mark Potse (Università della Svizzera italiana, Switzerland); Angelo Auricchio (Fondazione Cardiocentro Ticino, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland)
Accurate characterization of the cardiac activation sequence can support diagnosis and personalized therapy in heart-failure patients. Current invasive mapping techniques provide limited coverage. Our aim is to estimate a complete volumetric activation map from a limited number of measurements. This is achieved by optimising the local conductivity and the early activation sites to minimise the mismatch between simulated and measured activation times. We modeled the activation times using an eikonal equation, reducing the computational cost by three orders of magnitude compared to more physiologically detailed methods. The model provided a sufficiently accurate approximation of the activation time and the ECG in our patients. The solver was implemented on GPUs. Since the fast-marching method is not suitable for this architecture, we used a simple Jacobi iteration of a local variational principle. On a single GPU, each forward simulation took less than 2 seconds, and the inverse problem was solved in a few minutes.
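For illustration only, and in a far simpler setting than the patient-specific model above, the sketch below runs the kind of Jacobi-style fixed-point update mentioned in the abstract for an isotropic eikonal equation on a 2D grid: every node repeatedly accepts the earliest arrival time offered by its neighbours, an update pattern that parallelizes naturally on GPUs. The grid size, source location and slowness field are invented.

```python
import numpy as np

BIG = 1.0e9   # stand-in for "not yet reached"

def eikonal_jacobi(slowness, src, h=1.0, n_iter=500):
    """Jacobi-style fixed-point solver for |grad T| = slowness on a 2D grid.
    Every sweep updates all nodes simultaneously from the previous sweep."""
    T = np.full(slowness.shape, BIG)
    T[src] = 0.0
    f = h * slowness
    for _ in range(n_iter):
        P = np.pad(T, 1, constant_values=BIG)
        a = np.minimum(P[:-2, 1:-1], P[2:, 1:-1])      # best vertical neighbour
        b = np.minimum(P[1:-1, :-2], P[1:-1, 2:])      # best horizontal neighbour
        lo, hi = np.minimum(a, b), np.maximum(a, b)
        rad = np.maximum(2.0 * f**2 - (a - b) ** 2, 0.0)
        two_sided = 0.5 * (a + b + np.sqrt(rad))       # upwind update using both directions
        cand = np.where(hi - lo >= f, lo + f, two_sided)
        T = np.minimum(T, cand)
        T[src] = 0.0
    return T

# toy usage: one early-activation site in a medium that is three times slower on the right
slowness = np.ones((128, 128))
slowness[:, 64:] = 3.0
activation = eikonal_jacobi(slowness, src=(10, 10))
print(activation[120, 120])
```
-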
Avellan François MS Presentation, Contributed Talk
Wednesday, June 8, 2016
Garden 2A, 14:45-15:00
MS Presentation
GPU-Accelerated Hydrodynamic Simulation of Hydraulic Turbines Using the Finite Volume Particle Method, François Avellan (EPFL-LMH, Switzerland)
Co-Authors: Ebrahim Jahanbakhsh (Università della Svizzera italiana, Switzerland); Audrey Maertens (EPFL / LMH, Switzerland); Christian Vessaz (EPFL / LMH, Switzerland); Sebastian Leguizamon (EPFL / LMH, Switzerland); François Avellan (EPFL / LMH, Switzerland)
Performance prediction based on numerical simulations can be very helpful in the design process of hydraulic turbines. The Finite Volume Particle Method (FVPM) is a consistent and conservative particle-based method which inherits interesting features of both Smoothed Particle Hydrodynamics and the grid-based Finite Volume Method. This method is particularly well-suited for such simulations thanks to its versatility. SPHEROS is a parallel FVPM solver which has been developed at the EPFL - Laboratory for Hydraulic Machines for simulating Pelton turbines and silt erosion. In order to allow the simulation of industrial-size setups, a GPU version of SPHEROS (GPU-SPHEROS) is being developed in CUDA and uses the Thrust library to handle complicated data structures such as octrees. In addition, highly optimised kernels have been implemented for both compute-bound and memory-bound algorithms. Comparing the performance of the corresponding parts of GPU-SPHEROS and SPHEROS, we achieve a speed-up factor of at least eight.
Thursday, June 9, 2016
Garden 3A, 11:50-12:10
MS Presentation
URANS Computations of an Unstable Cavitating Vortex Rope, François Avellan (EPFL-LMH, Switzerland)
Co-Authors: Andres Müller (EPFL, Switzerland); François Avellan (EPFL / LMH, Switzerland); Cécile Münch (HES-SO Valais-Wallis, Switzerland)
Due to the massive penetration of alternative renewable energies, hydraulic power plants are key energy conversion technologies to stabilize the electrical power network using hydraulic machines at off-design operating conditions. For a flow rate larger than the one at the best efficiency point, a cavitating vortex rope occurs, leading to strong pressure surges in the entire hydraulic system. To better understand the mechanisms responsible for the pressure surges, URANS simulations of a reduced-scale Francis turbine are performed. Several sigma values, corresponding to stable and unstable cavitating vortex ropes, are investigated. The results are compared with the experimental measurements. The main challenge of the computations is the long physical time, compared to the time step, required to capture the beginning of the instability.
Friday, June 10, 2016
Garden 1BC, 10:15-10:30
Contributed Talk
Thermomechanical Modeling of Impacting Particles on a Metallic Surface for the Erosion Prediction in Hydraulic Turbines, François Avellan (EPFL-LMH, Switzerland)
Co-Authors: Ebrahim Jahanbakhsh (Università della Svizzera italiana, Switzerland); Audrey Maertens (EPFL, Switzerland); Christian Vessaz (EPFL, Switzerland); François Avellan (EPFL, Switzerland)
Erosion damage in hydraulic turbines is a common problem caused by the high-velocity impact of small particles entrained in the fluid. Numerical simulations can be useful to investigate the effect of each governing parameter in this complex phenomenon. The Finite Volume Particle Method is used to simulate the three-dimensional impact of dozens of rigid spherical particles on a metallic surface. The very fine discretization and the overall number of time steps needed to achieve the steady state erosion rate render the problem very expensive, implying the need for high performance computing. In this talk, a comparison of constitutive models is presented, with the aim of assessing the complexity of the thermomechanical modelling required to accurately simulate the impact and subsequent erosion of metals. The importance of strain rate, triaxiality, friction model and thermal effects is discussed. -
Awile Omar Contributed Talk
Wednesday, June 8, 2016
Garden 3C, 14:45-15:00
Contributed Talk
A Thread-Parallel Implementation of High-Energy Physics Particle Tracking on Many-Core Hardware Platforms, Omar Awile (CERN, Switzerland)
Co-Authors: Pawel Szostek (CERN, Switzerland)
Tracking and identification of particles are amongst the most time-critical tasks in the computing farms of high-energy physics experiments. Many of the underlying algorithms were designed before the advent of multi-core processors and are therefore strictly serial. Since the introduction of multi-core platforms, these workloads have been parallelized by executing multiple copies of the application. This approach does not optimally utilize modern hardware. We present two different thread-parallel implementations of a straight-line particle track finding algorithm. We compare our implementations, based on TBB and OpenMP, to the current production version in LHCb's Gaudi software framework as well as a state-of-the-art implementation for general-purpose GPUs. Our initial analysis shows a speedup of more than 50% over multi-process Gaudi runs and performance competitive with the GPGPU version. This study allows us to better understand the impact of many-core hardware platforms supplementing traditional CPUs in the context of the upcoming LHCb upgrade.
B
-
Baaden Marc MS Presentation
Friday, June 10, 2016
Garden 3A, 09:30-10:00
MS Presentation
Computer Simulations Provide Guidance for Molecular (Neuro)medicine, Marc Baaden (CNRS, France)
Co-Authors:
Computer simulations provide crucial insights and rationales for the design of molecular approaches in (neuro)medicine. I will describe three case studies to illustrate how molecular model building and molecular dynamics simulations of complex molecular assemblies such as membrane proteins help in that process [1-3]. Through selected examples, including key signaling pathways in neurotransmission, the links between a molecular-level understanding of biological mechanisms and original approaches to treat disease conditions will be illuminated. Such treatments may be symptomatic, e.g. by better understanding the function and pharmacology of macromolecular key players, or curative, e.g. through molecular inhibition of disease-inducing molecular processes. [1] http://dx.doi.org/10.1016/j.biochi.2015.03.018 [2] http://dx.doi.org/10.1039/C3FD00134B [3] B. Laurent, S. Murail, A. Shahsavar, L. Sauguet, M. Delarue, M. Baaden : "Sites of anesthetic inhibitory action on a cationic ligand-gated ion channel", Structure, 2016 (in press) -
Bader Michael MS Presentation, MS Summary
Wednesday, June 8, 2016
Garden 3B, 14:30-14:45
MS Presentation
Using Generated Matrix Kernels for a High-Order ADER-DG Engine, Michael Bader (Leibniz Supercomputing Centre, Technische Universitaet Muenchen, Germany)
Co-Authors: Vasco Varduhn (Technische Universität München, Germany); Michael Bader (Leibniz Supercomputing Centre, Germany)
The ExaHyPE project employs the high-order discontinuous Galerkin finite element method in order to solve hyperbolic PDEs on adaptive Cartesian grids at exascale level. Envisaged applications include grand-challenge simulations in astrophysics and geosciences. Our compute kernels rely on tensor operations - a type of operation that scientific computing libraries support only to a limited degree. We demonstrate concepts of how the tensor operations can be reduced to dense matrix-matrix multiplications, which is undoubtedly one of the best-optimised operations in linear algebra. We apply reordering and reshaping techniques, which enables our code generator to exploit existing highly optimised libraries as a back end and produce highly optimised compute kernels. As a result, our tool chain provides a complete solution for tensor-product-based FEM operations.
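As a generic illustration of this reduction (not the ExaHyPE code generator itself; the polynomial order and variable count are arbitrary), the sketch below applies a small 1D operator along one index of an element's degrees-of-freedom tensor by reshaping, so that the contraction becomes a single dense matrix-matrix multiplication:

```python
import numpy as np

def apply_operator_last_axis(u, D):
    """Contract a small 1D operator D over the last index of an element tensor
    by flattening all other indices: one dense matrix-matrix multiplication."""
    lead, n = u.shape[:-1], u.shape[-1]
    flat = u.reshape(-1, n)                  # (all other indices, n)
    return (flat @ D.T).reshape(*lead, n)

def apply_operator_axis(u, D, axis):
    """General case: move the target index last, reshape, GEMM, undo the reshuffle."""
    moved = np.moveaxis(u, axis, -1)
    return np.moveaxis(apply_operator_last_axis(moved, D), -1, axis)

# toy check against an explicit tensor contraction
order, nvar = 4, 5
rng = np.random.default_rng(2)
u = rng.standard_normal((nvar, order + 1, order + 1, order + 1))  # element DOFs
D = rng.standard_normal((order + 1, order + 1))                   # e.g. a differentiation matrix
ref = np.einsum('vijl,kl->vijk', u, D)
print(np.allclose(apply_operator_axis(u, D, axis=3), ref))
```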
MS Summary
MS26 Bridging Scales in Geosciences, Michael Bader (Leibniz Supercomputing Centre, Technische Universitaet Muenchen, Germany)
Co-Authors: Dave A. May (ETH Zurich, Switzerland), Michael Bader (Leibniz Supercomputing Centre, Technische Universitaet Muenchen, Germany)
Complex but relevant processes within the Solid Earth domain cover a wide range of space and time scales, up to 17 and 26 orders of magnitude, respectively. Earthquake propagation, for instance, depends on dynamic processes at the rupture tip over 10^-9 seconds, while the plate tectonic faults on which they occur evolve over time scales of up to hundreds of millions of years. Similarly, problems in imaging and modelling of mantle processes on the Earth's scale of tens of thousands of kilometers can be affected by physico-chemical compositions that vary on a meter scale and are determined at the molecular level. Beyond these examples, many other physical processes in geophysics cross the largest imaginable scales. At each of the characteristic scales different physical processes are relevant, which thus requires us to couple the relevant physics at the different scales. Simulating the physics at each of these scales is a tremendous task, which hence often requires High Performance Computing. Computational challenges include, but are not limited to, a large number of degrees of freedom and going beyond the two-scale paradigm on which most computational tools are founded. To discuss and start to tackle these challenges, we aim to bring together computer scientists and geoscientists to address them from different perspectives. Applications within the geosciences include, but are not limited to, geodynamics, seismology, fluid dynamics, tectonics, geomagnetism, and exploration geophysics. -
Balint Daniel MS Presentation
Thursday, June 9, 2016
Garden 2BC, 15:20-15:40
MS Presentation
Multiscale Modelling of Dwell Fatigue in Polycrystalline Titanium Alloys, Daniel Balint (Imperial College London, United Kingdom)
Co-Authors: Daniel Balint (Imperial College London, United Kingdom); Fionn Dunne (Imperial College London, United Kingdom)
Titanium alloys are used for manufacturing highly stressed components of gas turbine engines, such as discs and blades, due to their low density, excellent corrosion resistance and high fatigue strength. However, it has been reported that these alloys exhibit a significant fatigue life reduction, called dwell debit, under cyclic loading that includes a hold at the peak stress. In this study, a rate-dependent crystal plasticity framework was first used to reproduce the experimentally observed macroscopic response of Ti624x (x = 2 and 6) alloys under low-cycle fatigue and low-cycle dwell fatigue loading, which enabled relevant material constants for the two alloys to be determined. These were then utilized in a discrete dislocation plasticity model using the same thermally activated rate-controlling mechanism to examine the dwell behaviour of the alloys. -
Balmaseda Magdalena MS Presentation
Wednesday, June 8, 2016
Garden 3B, 16:30-17:00
MS Presentation
Scalability and Performance of the NEMOVAR Variational Ocean Data Assimilation Software, Magdalena Balmaseda (ECMWF, United Kingdom)
Co-Authors: Anthony Weaver (CERFACS, France); Magdalena Balmaseda (ECMWF, United Kingdom); Kristian Mogensen (ECMWF, United Kingdom)
Scalability and performance of the variational data assimilation software NEMOVAR for the NEMO ocean model is presented. NEMOVAR is a key component of the ECMWF operational Ocean analysis System 4 (Ocean S4) and future System 5 (Ocean S5). It is designed as a four-dimensional variational assimilation (4D-Var) algorithm, which can also support three-dimensional (3D-Var) assimilation, using the First-Guess at Appropriate Time (FGAT) approach. Central to the code performance is the implementation of the correlation operator used for modelling the background error covariance matrix. In NEMOVAR this is achieved using a diffusion operator. A new implicit formulation of the diffusion operator has been introduced recently which solves the underlying linear system using the Chebyshev iteration. The technique is more flexible and better suited for massively parallel machines than the method currently used operationally at ECMWF, but further improvements will be necessary for future high-resolution applications.
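As a generic illustration of the Chebyshev iteration mentioned above (textbook three-term form applied to a toy SPD operator, not the NEMOVAR diffusion operator), the sketch below solves a linear system using only matrix-vector products and a priori eigenvalue bounds, with no inner products per iteration, which is part of what makes the method attractive on massively parallel machines:

```python
import numpy as np

def chebyshev_solve(apply_A, b, lam_min, lam_max, n_iter=60):
    """Classical Chebyshev iteration for SPD systems: needs only mat-vecs plus
    bounds on the extreme eigenvalues, no global reductions inside the loop."""
    theta = 0.5 * (lam_max + lam_min)
    delta = 0.5 * (lam_max - lam_min)
    sigma = theta / delta
    rho = 1.0 / sigma
    x = np.zeros_like(b)
    r = b.copy()
    d = r / theta
    for _ in range(n_iter):
        x = x + d
        r = r - apply_A(d)
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x

# toy usage: a periodic 1D diffusion-like SPD operator with known spectral bounds
n = 400
main, off = 2.1, -1.0
apply_A = lambda v: main * v + off * (np.roll(v, 1) + np.roll(v, -1))
b = np.sin(np.linspace(0, 2 * np.pi, n, endpoint=False))
lam_min, lam_max = main + 2 * off, main - 2 * off   # spectrum of the circulant operator
x = chebyshev_solve(apply_A, b, lam_min, lam_max)
print(np.linalg.norm(b - apply_A(x)) / np.linalg.norm(b))
```
-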
Ban Nikolina Poster
Poster
CLI-02 Climate Change Simulations at Kilometer-Scale Resolution, Nikolina Ban (Atmospheric and Climate Science, ETH Zurich, Switzerland)
Co-Authors: Juerg Schmidli (Goethe University of Frankfurt, Germany); Christoph Schär (ETH Zurich, Switzerland)
The recent increase in computational power enables climate simulations at kilometer-scale resolution. This modeling approach is able to explicitly resolve deep convection (i.e., thunderstorms and rain showers), and thus allows some of the key uncertainties in current climate models to be reduced. Here we present analyses of decade-long climate change simulations at a horizontal resolution of 2.2 km across the greater Alpine region. The simulations have been conducted on a Cray XE6 system using a setup with 2000 cores on a computation mesh of 500x500x60 grid points. The results show great improvement in the simulation of summer precipitation, and demonstrate the importance of kilometer-scale resolution in climate change projections.
Thursday, June 9, 2016
Garden 3B, 14:45-15:00
MS Presentation
A Decade-Long European-Scale Convection-Resolving Climate Simulation on GPUs: Computation, Validation and Analyses, Nikolina Ban (Atmospheric and Climate Science, ETH Zurich, Switzerland)
Co-Authors: Stefan Rüdisühli (ETH Zurich, Switzerland); Nikolina Ban (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Daniel Lüthi (ETH Zurich, Switzerland); Michael Sprenger (ETH Zurich, Switzerland); Heini Wernli (ETH Zurich, Switzerland); Christoph Schär (ETH Zurich, Switzerland)
Climate simulations using a horizontal resolution of O(1 km) make it possible to explicitly resolve deep convection. Precipitation processes are then represented much closer to first principles and allow for an improved representation of the water cycle. Due to the large computational costs, climate simulations at such scales were restricted to rather small domains in the past. Here we present results from a decade-long convection-resolving climate simulation covering Europe using a computational mesh of 1,536x1,536x60 grid points. We use a COSMO-model prototype enabled for GPUs. The results illustrate how the approach allows the interactions between atmospheric circulations at scales ranging from 1,000 to 10 km to be represented. We discuss the performance of the convection-resolving climate modelling approach and thereby specifically focus on the improved representation of summer convection on the continental scale. Furthermore, we demonstrate the potential of online analyses of these simulations for assembling detailed climatologies of extratropical cyclones, fronts and propagating convective systems. -
Bani-Hashemian Mohammad Hossein Poster
Poster
MAT-01 A Generalized Poisson Solver for First-Principles Device Simulations, Mohammad Hossein Bani-Hashemian (ETH Zurich, Switzerland)
Co-Authors: Sascha Brück (ETH Zurich, Switzerland); Mathieu Luisier (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland)
We present a Poisson solver with main applications in ab-initio simulations of nanoelectronic devices. The solver employs a plane-wave (Fourier) based pseudospectral approach and is capable of solving the generalized Poisson equation with a position-dependent dielectric constant, subject to periodic or homogeneous Neumann conditions on the boundaries of the simulation cell and Dirichlet-type conditions imposed on arbitrary subdomains. Any sufficiently smooth function modelling the dielectric constant, including density-dependent dielectric continuum models, can be utilized. Furthermore, for all the boundary conditions, consistent derivatives are available, allowing for energy-conserving molecular dynamics simulations.
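As a minimal illustration of the plane-wave (pseudospectral) viewpoint, the sketch below solves only the standard periodic, constant-dielectric limit of the problem via FFTs; the position-dependent dielectric and the mixed boundary conditions described above require considerably more machinery, and the box size and charge density here are invented.

```python
import numpy as np

def poisson_fft(rho, box):
    """Periodic, constant-dielectric Poisson solve in reciprocal space:
    phi_k = 4*pi*rho_k / |k|^2, with the k = 0 (net-charge) mode removed."""
    shape = rho.shape
    kx, ky, kz = (2.0 * np.pi * np.fft.fftfreq(shape[i], d=box[i] / shape[i]) for i in range(3))
    KX, KY, KZ = np.meshgrid(kx, ky, kz, indexing='ij')
    k2 = KX**2 + KY**2 + KZ**2
    rho_k = np.fft.fftn(rho)
    phi_k = np.zeros_like(rho_k)
    mask = k2 > 0.0
    phi_k[mask] = 4.0 * np.pi * rho_k[mask] / k2[mask]
    return np.real(np.fft.ifftn(phi_k))

# toy usage: a neutral pair of Gaussian charges in a cubic box
n, L = 64, 20.0
x = np.linspace(0, L, n, endpoint=False)
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
g = lambda c: np.exp(-((X - c[0])**2 + (Y - c[1])**2 + (Z - c[2])**2))
rho = g((7.0, 10.0, 10.0)) - g((13.0, 10.0, 10.0))
phi = poisson_fft(rho, (L, L, L))
print(phi.min(), phi.max())
```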
Friday, June 10, 2016
Garden 1BC, 09:15-09:30
Contributed Talk
Ab-Initio Quantum Transport Simulation of Nano-Devices, Mohammad Hossein Bani-Hashemian (ETH Zurich, Switzerland)
Co-Authors: Mauro Calderara (ETH Zurich, Switzerland); Mohammad Hossein Bani-Hashemian (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland); Mathieu Luisier (ETH Zurich, Switzerland)
To simulate advanced electronic devices such as nanoscale transistors or memory cells whose functionality may depend on the position of single atoms only, a quantum transport solver is needed, which should not only be capable of atomic-scale resolution, but also be able to deal with systems consisting of thousands to hundreds of thousands of atoms. The device simulator OMEN and the electronic structure code CP2K have been united to perform ab initio quantum transport calculations at the level of density functional theory. To take full advantage of modern hybrid supercomputer architectures, new algorithms have been developed and implemented. They allow for the simultaneous computation of open boundary conditions in parallel on the available CPUs and the solution of the Schrödinger equation in a scalable way on the GPUs. The main concepts behind the algorithms will be presented and results for realistic nanostructures will be shown. -
Barnas Martina Paper
Thursday, June 9, 2016
Auditorium C, 12:00-12:30
Paper
Context Matters: Distributed Graph Algorithms and Runtime Systems, Martina Barnas (Indiana University, United States of America)
Co-Authors: Jesun Sahariar Firoz (Indiana University, United States of America); Thejaka Amila Kanewala (Indiana University, United States of America); Marcin Zalewski (Indiana University, United States of America); Martina Barnas (Indiana University, United States of America)
The increasing complexity of the software/hardware stack of modern supercomputers makes understanding the performance of modern massive-scale codes difficult. Distributed graph algorithms (DGAs) are at the forefront of that complexity, pushing the envelope with their massive irregularity and data dependency. We analyse the existing body of research on DGAs to assess how technical contributions are linked to experimental performance results in the field. We distinguish algorithm-level contributions related to graph problems from "runtime-level" concerns related to communication, scheduling, and other low-level features necessary to make distributed algorithms work. We show that the runtime is an integral part of DGAs' experimental results, but it is often ignored by the authors in favor of algorithm-level contributions. We argue that a DGA can only be fully understood as a combination of these two aspects and that detailed reporting of runtime details must become an integral part of the scientific standard in the field if results are to be truly understandable and interpretable. Based on our analysis of the field, we provide a template for reporting the runtime details of DGA results, and we further motivate the importance of these details by discussing in detail how seemingly minor runtime changes can make or break a DGA. -
Bartok Albert P. Poster
Poster
MAT-02 A Periodic Table of Molecules, Albert P. Bartok (University of Cambridge, United Kingdom)
Co-Authors: Albert P. Bartok (University of Cambridge, United Kingdom); Gabor Csanyi (University of Cambridge, United Kingdom); Michele Ceriotti (EPFL, Switzerland)
A graphical representation of a database [1] of more than 7000 molecules containing a non-homogeneous mix of C, N, O and S atoms has been generated using a non-linear dimensionality reduction algorithm (sketch-map [2]) so that molecules with similar composition and geometry are projected close to each other. The underlying metric is based on the REMatch-SOAP kernel [3], which is built upon a comparison of local environments [4], and treats the "alchemical" similarity between molecules, and the location of atoms in space, on the same footing. Much like in a periodic table, one can also recognize strong correlations between the positions of the molecules on the map and their physical-chemical properties, from formation energy to polarizability. References: [1] G. Montavon et al., NJP, 2013, 15, 095003; [2] M. Ceriotti, et al., JCTC, 2013, 9, 1521-1532; [3] S. De, et al., PCCP 2016, DOI: 10.1039/C6CP00415F; [4] A. P. Bartok et al., PRB, 2013, 87, 184115. -
Basermann Achim MS Presentation
Thursday, June 9, 2016
Garden 2BC, 11:00-11:30
MS Presentation
Highly Scalable Sparse Eigensolvers for Large Quantum Physics Problems on Heterogeneous Computing Systems, Achim Basermann (German Aerospace Center (DLR), Germany)
Co-Authors:
In the German Research Foundation project ESSEX (Equipping Sparse Solvers for Exascale), we develop scalable sparse eigensolver libraries for large quantum physics problems. Partners in ESSEX are the Universities of Erlangen, Greifswald, Wuppertal, Tokyo and Tsukuba as well as DLR. The project pursues a coherent co-design of all software layers where a holistic performance engineering process guides code development across the classic boundaries of application, numerical method and basic kernel library. Within ESSEX the numerical methods cover widely applicable solvers such as classic Krylov, Jacobi-Davidson or recent FEAST methods and domain specific iterative schemes relevant for the ESSEX quantum physics applications. Using the ESSEX software framework, we present recent work on sparse eigenvalue solvers for heterogeneous computing systems with a focus on the difficult problem of finding interior eigenvalues. In this context, we also discuss the application of the CARP-CG method as a preconditioner for applications such as Graphene simulation. -
Bauer Andreas MS Presentation
Friday, June 10, 2016
Garden 3A, 10:00-10:15
MS Presentation
Ligand Binding to the Human Adenosine Receptor hA2AR in Nearly Physiological Conditions, Andreas Bauer (Forschungszentrum Jülich, Germany)
Co-Authors: Ruyin Cao (Forschungszentrum Jülich, Germany); Andreas Bauer (Forschungszentrum Jülich, Germany); Paolo Carloni (Forschungszentrum Jülich, Germany)
Lipid composition may significantly affect membrane protein function, yet its impact on the protein structural determinants is not well understood. Here we present a comparative molecular dynamics (MD) study of the human adenosine receptor type 2A (hA2AR) in complex with caffeine, a system of high neuro-pharmacological relevance, within different membrane types: POPC, mixed POPC/POPE and cholesterol-rich membranes. 0.8-μs MD simulations unambiguously show that the helical folding of the amphipathic helix 8 depends on membrane contents. Most importantly, the distinct cholesterol binding into the cleft between helices 1 and 2 stabilizes a specific caffeine-binding pose against others visited during the simulation. Hence, cholesterol presence (approximately 33%-50% in synaptic membranes of the central nervous system), often neglected in X-ray determination of membrane proteins, affects the population of the ligand binding poses. We conclude that including a correct description of neuronal membranes may be very important for computer-aided design of ligands targeting hA2AR and possibly other GPCRs. -
Bauer Peter MS Presentation
Thursday, June 9, 2016
Garden 3B, 14:30-14:45
MS Presentation
Towards Exascale Computing with the ECMWF Model, Peter Bauer (ECMWF, United Kingdom)
Co-Authors: Nils Wedi (ECMWF, United Kingdom); George Mozdzynski (ECMWF, United Kingdom); Sami Saarinen (ECMWF, United Kingdom)
The European Centre for Medium-Range Weather Forecasts (ECMWF) is currently investing in a scalability programme that addresses the computing and data handling challenges of realising, on future high-performance computing environments, the scientific advances that will enhance predictive skill from medium to monthly time scales. A key component of this programme is the European Commission funded project Energy efficient SCalable Algorithms for weather Prediction at Exascale (ESCAPE), which develops numerical building blocks and compute-intensive algorithms of the forecast model, applies compute/energy efficiency diagnostics, designs implementations on novel architectures, and performs testing in operational configurations. The talk will report on the progress of the scalability programme with a special focus on ESCAPE.
Thursday, June 9, 2016
Garden 3B, 15:30-15:45
MS Presentation
Approaches to I/O Scalability Challenges in the ECMWF Forecasting System, Peter Bauer (ECMWF, United Kingdom)
Co-Authors: Tiago Quintino (ECMWF, United Kingdom); Baudouin Raoult (ECMWF, United Kingdom); Simon Smart (ECMWF, United Kingdom); Stephan Siemen (ECMWF, United Kingdom); Peter Bauer (ECMWF, United Kingdom)
As the resolution of the forecasts produced by ECMWF's Integrated Forecast System (IFS) is refined, the amount of data involved continues its geometric growth. Current peak loads already require an otherwise oversized parallel storage filesystem (Lustre). The data volume is expected to grow 6-fold by 2020, to reach 120 TB/day, concentrated in short one-hour bursts. Moreover, this data requires post-processing to create the final forecast products sent to end-users, introducing a further I/O bottleneck. Recognizing these challenges, ECMWF's Scalability Programme aims to redesign the data workflow to minimize I/O in the time-critical path, whilst retaining resilience to failures. The authors are investigating multiple solutions to tackle issues of data locality, data volume and overall resilience. Solutions range from a novel NVRAM hardware co-design effort inside the EU funded NEXTGenIO project, to the use of distributed object storage technologies and a new dynamic worker-broker solution for managing the post-processing workload. -
Bauer Simon Poster
Poster
CSM-02 A Novel Approach for Efficient Stencil Assembly in Curved Geometries, Simon Bauer (LMU Munich, Germany)
Co-Authors: Marcus Mohr (Ludwig Maximilian University of Munich, Germany); Ulrich Rüde (University of Erlangen-Nuremberg, Germany); Markus Wittmann (FAU Erlangen-Nürnberg / Erlangen Regional Computing Center (RRZE), Germany); Barbara Wohlmuth (Technical University of Munich, Germany)
In many scientific and engineering applications one has to deal with curved geometries. Such domains can be accurately approximated, e.g., by unstructured grids and iso-parametric finite elements. We present a novel approach here that is well-suited to our concept of hierarchical hybrid grids (HHG). The latter was shown to achieve excellent performance and scalability even for extreme numbers of DOFs by a matrix-free implementation and by exploiting the regularity of access patterns. In our approach FE stencils are not assembled exactly, but approximated by low-order polynomials and evaluated with an efficient incremental algorithm. We demonstrate the accuracy achieved as well as the computational efficiency using our prototypical HHG-based mantle convection solver, which operates on non-nested triangulations of a thick spherical shell. The implementation of our scheme is based on a systematic node-level performance analysis and maintains the high efficiency of the original HHG. -
Beckmann Andreas MS Presentation
Thursday, June 9, 2016
Garden 2BC, 11:30-12:00
MS Presentation
How to Do Nothing in Less Time, Andreas Beckmann (Juelich Supercomputing Centre, Germany)
Co-Authors: David Haensel (Juelich Supercomputing Centre, Germany); Andreas Beckmann (Juelich Supercomputing Centre, Germany)
The Fast Multipole Method is a generic toolbox algorithm for many important scientific applications, e.g., molecular dynamics. It enables us to compute all long-range O(N^2) pairwise interactions for N particles in O(N) for any given precision. Unfortunately, the runtime of such simulations is already communication bound. To increase the performance on modern HPC hardware, a more sophisticated parallelization scheme is required. In particular, reducing the number of MPI collectives is vital for improving strong scaling. In this talk we will focus exclusively on the internode communication via MPI. We will present a latency-avoiding communication scheme and its implementation for our C++11 FMM toolbox. The implementation consists of several layers of abstraction to hide/encapsulate low-level MPI calls and the specifics of the communication algorithm. We will also show examples of the scaling capabilities of the FMM on a BG/Q for small and medium size MD problems. -
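As a generic illustration of the latency-avoiding idea described above (the actual scheme is part of a C++11 toolbox and is not reproduced here), the sketch below overlaps non-blocking point-to-point exchanges with local work using mpi4py; the ring topology, buffer sizes and contents are placeholders.

```python
# Generic latency-hiding pattern (mpi4py): post non-blocking receives and sends
# for the data a rank needs from its neighbours, overlap them with local work,
# and avoid a global collective. Schematic analogue only, not the actual C++11
# communication scheme described above.
# Run with e.g.: mpirun -n 4 python fmm_exchange_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

left, right = (rank - 1) % size, (rank + 1) % size
send_l = np.full(1024, rank, dtype=np.float64)   # placeholder payloads
send_r = np.full(1024, rank, dtype=np.float64)
recv_l = np.empty(1024, dtype=np.float64)
recv_r = np.empty(1024, dtype=np.float64)

reqs = [comm.Irecv(recv_l, source=left,  tag=0),
        comm.Irecv(recv_r, source=right, tag=1),
        comm.Isend(send_r, dest=right,   tag=0),
        comm.Isend(send_l, dest=left,    tag=1)]

local = send_l.sum()            # overlap: do local (near-field) work here
MPI.Request.Waitall(reqs)       # complete communication only when needed
print(rank, local, recv_l[0], recv_r[0])
```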
Becsek Barna Errol Mario MS Presentation
Wednesday, June 8, 2016
Garden 3A, 14:00-14:15
MS Presentation
FD/FEM Coupling with the Immersed Boundary Method for the Simulation of Aortic Heart Valves, Barna Errol Mario Becsek (ARTORG Center for Biomedical Engineering, University of Bern, Switzerland)
Co-Authors: Maria Giuseppina Chiara Nestola (Università della Svizzera italiana, Switzerland); Hadi Zolfaghari (University of Bern, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Dominik Obrist (University of Bern, Switzerland)
The ever-increasing available computational power allows for solving more complex physical problems spanning multiple physical domains. We present a numerical tool for simulating fluid-structure interaction between blood flow and the soft tissue of heart valves. Using the basic concept of the Immersed Boundary Method, the interaction between the two physical domains (flow and structure) does not require mesh manipulation. We solve the governing equations of the fluid and the structure with domain-specific finite difference and finite element discretisations, respectively. We use a massively parallel algorithmic framework for handling the L2-projection transfer between the loosely coupled highly parallel solvers for fluid and solid. Our tool builds on a well-established and proven Navier-Stokes solver and a novel method for solving non-linear continuum solid mechanics.
Poster
LS-03 GPU-Accelerated Immersed Boundary Method with CUDA for the Efficient Simulation of Biomedical Fluid-Structure Interaction, Barna Errol Mario Becsek (ARTORG Center for Biomedical Engineering, University of Bern, Switzerland)
Co-Authors: Barna Errol Mario Becsek (University of Bern, Switzerland); Maria Giuseppina Chiara Nestola (Università della Svizzera italiana, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Dominik Obrist (University of Bern, Switzerland)
Immersed boundary methods have become widely used and useful tools for the simulation of biomedical fluid-structure interaction, e.g., in the aortic valve of the human heart. In such problems, the complex geometry and motion of the soft tissue impose significant computational cost on body-fitted-mesh methods. Resorting to a fixed Eulerian grid for the flow simulation, along with the immersed boundary method to model the interaction with the soft tissue, eliminates the expensive mesh generation and updating costs. Nevertheless, the computational cost of the geometry operations, including adaptive search algorithms, is still significant. Herein, we implemented the immersed boundary kernels with CUDA so that they can be transferred to and executed on thousands of parallel threads on a general-purpose GPU. Host-device memory optimisation along with optimal usage of the GPU multiprocessors results in a boosted performance in fluid-structure interaction simulations. -
Bekas Costas MS Presentation
Thursday, June 9, 2016
Garden 3B, 10:30-11:00
MS Presentation
Merging the Big Data and HPC Universes: Lessons to be Learned, Costas Bekas (IBM Research - Zurich, Switzerland)
Co-Authors:
Big Data is introducing a number of inescapable trends. First, we are creating data at an exponential rate. Second, we need to use complicated algorithms to unlock the potential of the data. This combination results in the realisation that Big Data workloads are fast crossing into the realm of HPC. Challenges that include how to merge the Big Data and HPC computing paradigms, how to virtualise HPC so as to serve the vast numbers of Big Data programmers and users, and how to keep up with performance expectations in a post-Moore's-law landscape are clearly hot topics. We will discuss these issues and give a glimpse of what we think are ways forward.
MS Summary
MS15 HPC for the Big Data Era, Costas Bekas (IBM Research - Zurich, Switzerland)
Co-Authors:
Big Data is rapidly expanding into the HPC domain. This is driven by the computational needs of Big Data applications, which render them natural candidates for HPC. At the same time, the ever increasing fidelity of models and simulations is turning traditional HPC applications into data intensive workloads as well. These trends pose immensely interesting new challenges that span all the way from low-level H/W design to programming environments and algorithms. How do we reconcile the need for massive I/O while bringing the computation as close as possible to the data? How do we merge the HPC and Big Data run-time and programming environments? What needs of cloud-like virtualization can be answered by HPC knowledge, and vice versa? And finally, how do we visualize immense amounts of data in meaningful ways? This minisymposium will attempt to introduce these trends and provide an overview of the landscape as well as a few directions for emerging designs, especially in view of exascale systems. -
Benedicic Lucas MS Presentation
Wednesday, June 8, 2016
Garden 3B, 13:30-14:00
MS Presentation
Translating Python into GridTools: Prototyping PDE Solvers Using Stencils, Lucas Benedicic (Swiss National Supercomputing Centre, Switzerland)
Co-Authors: Mauro Bianco (ETH Zurich / CSCS, Switzerland); Paolo Crosetto (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Thomas Schulthess (ETH Zurich, Switzerland)
The fast-paced environment of high-performance computing architectures has always been a challenge for complex codes. First, the effort to adapt the code to new processor architectures is significant compared to their typical release phase. Second, optimisations for one target often incur performance penalties on others. Third, such codes are generally developed by domain scientists, who typically lack expertise in the specific details of the target platform. Successful projects like STELLA have shown that a way out of this situation is to apply the concept of separation of concerns. GridTools is pushing this concept even further: The domain scientist's work is conducted within a prototyping environment using a domain-specific language (DSL), while the computer scientist profiles the automatically-generated code over diverse architectures, implemented by different hardware-specific backends. This talk will give an overview of the GridTools ecosystem, highlighting the use of the prototyping environment in combination with the automatic code-generation engine. -
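To give a flavour of the prototyping workflow described above, the sketch below shows the kind of plain-Python stencil a domain scientist might start from; it is deliberately not the GridTools DSL, whose syntax is not reproduced here.

```python
# Illustration of the prototyping idea: a horizontal Laplacian written as a
# plain NumPy stencil. This is not the GridTools DSL itself, only the kind of
# high-level specification a domain scientist might start from before a
# backend generates optimised architecture-specific code.
import numpy as np

def laplacian(phi, dx):
    """5-point Laplacian on the interior of a 2D field."""
    lap = np.zeros_like(phi)
    lap[1:-1, 1:-1] = (phi[2:, 1:-1] + phi[:-2, 1:-1] +
                       phi[1:-1, 2:] + phi[1:-1, :-2] -
                       4.0 * phi[1:-1, 1:-1]) / dx**2
    return lap

nx = ny = 64
dx = 1.0 / (nx - 1)
x, y = np.meshgrid(np.linspace(0, 1, nx), np.linspace(0, 1, ny), indexing="ij")
phi = np.sin(np.pi * x) * np.sin(np.pi * y)
print(laplacian(phi, dx)[nx // 2, ny // 2])   # close to -2*pi^2 at the centre
```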
Bercea Gheorghe-Teodor MS Presentation
Wednesday, June 8, 2016
Garden 3B, 13:00-13:30
MS Presentation
Firedrake: Automating the Finite Element Method by Composing Abstractions, Gheorghe-Teodor Bercea (Imperial College London, United Kingdom)
Co-Authors: David A. Ham (Imperial College London, United Kingdom); Andrew T. T. McRae (University of Bath, United Kingdom); Florian Rathgeber (ECMWF, United Kingdom); Gheorghe-Teodor Bercea (Imperial College London, United Kingdom); Miklós Homolya (Imperial College London, United Kingdom); Fabio Luporini (Imperial College London, United Kingdom); Paul H. J. Kelly (Imperial College London, United Kingdom)
The complexity inherent in the application of advanced numerics on modern hardware to coupled physical systems presents a critical barrier to simulation development. To overcome this, we must create simulation software which embodies the abstraction and composability of the underlying mathematics. In this way, a system is created in which mathematicians, computer scientists, and application specialists can each deploy their own expertise, benefiting from the expertise of the others. Critically, this approach minimises the extent to which individuals must become polymaths to share in these advances. In this talk I will present Firedrake and PyOP2, a composition of new and existing abstractions which creates a particularly complete separation of concerns. This enables the creation of high performance, sophisticated finite element models from a very high level mathematical specification and has enabled advances in computer science and numerics, while also facilitating the creation of simulation systems for a variety of applications. -
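A minimal sketch of the "very high level mathematical specification" mentioned above, modelled on Firedrake's documented Helmholtz example and assuming the standard UFL-style interface; a Firedrake installation is required to run it.

```python
# Minimal sketch of a Firedrake-style high-level problem specification
# (modelled on the documented Helmholtz example); requires Firedrake.
from firedrake import *

mesh = UnitSquareMesh(32, 32)
V = FunctionSpace(mesh, "CG", 1)

u = TrialFunction(V)
v = TestFunction(V)

x, y = SpatialCoordinate(mesh)
f = Function(V)
f.interpolate((1 + 8 * pi * pi) * cos(2 * pi * x) * cos(2 * pi * y))

a = (inner(grad(u), grad(v)) + u * v) * dx   # bilinear form
L = f * v * dx                               # linear form

uh = Function(V)
solve(a == L, uh, solver_parameters={"ksp_type": "cg"})
print(norm(uh))
```

The point of the abstraction stack is that everything below this specification, kernel generation, parallel assembly and solver configuration, is handled by PyOP2 and the underlying libraries.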
Berczik Peter MS Presentation
Thursday, June 9, 2016
Garden 2A, 12:00-12:15
MS Presentation
HPC Simulations of Complex Free-Surface Flow Problems with SPH-Flow Software, Peter Berczik (Heidelberg University, Germany)
Co-Authors: Matthieu De Leffe (Nextflow Software, France); David Guibert (Nextflow Software, France); Pierre Bigay (Nextflow Software, France)
SPH-Flow is a multi-purpose, multi-physics CFD software package based on the SPH method (smoothed particle hydrodynamics). It is developed by Ecole Centrale Nantes and Nextflow Software. The solver was first developed for fluid flow simulations dedicated to complex non-linear free surface problems and has since been extended to multi-fluid, multi-structure, fluid/structure interaction and viscous flows. It relies on state-of-the-art meshless algorithms. The SPH-Flow solver is parallelized with a distributed-memory parallelization relying on a domain decomposition. Today, the applicative computations usually performed involve 64 to 4000 processors, depending on the problem. New industrial problems can now be solved with this method and its efficient HPC implementation. This talk will address the description of the SPH-Flow solver and its parallelization. Massively parallel, complex and innovative simulations will then be discussed: tire aquaplaning, wave impacts on a ship, fluid flow in a car gearbox, and river-crossing simulations of a car.
Thursday, June 9, 2016
Garden 3C, 15:45-16:00
MS Presentation
Towards Multi-Million GPU Cores for Exascale Astrophysical High-Order Direct N-Body Simulations, Peter Berczik (Heidelberg University, Germany)
Co-Authors:
We present a set of high-accuracy direct N-body numerical simulations (up to N=6M particles) of large-scale collisions between galaxies of different masses. We use our own high-performance, high-order (4th/6th/8th) parallel individual-timestep Hermite direct N-body code (phi-GPU) with the maximum possible numerical resolution. For the simulations we use the largest astrophysical GPU clusters (including the TITAN GPU supercomputer in the US, which holds second place in the Top500 list). The code uses all possible levels of parallelization (large-scale MPI, local multi-thread CPU with OpenMP and local multi-thread GPU with CUDA). The central black holes (BHs) are simulated as special particles with Post-Newtonian force corrections (up to 1/c^7 terms) implemented for the BH-BH interactions, which allows us to follow the gravitational-wave-driven BH mergers up to the final seconds. -
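For orientation, the snippet below shows the textbook prediction step of a 4th-order Hermite integrator of the kind the abstract refers to; it is a schematic illustration, not the phi-GPU implementation.

```python
# Textbook prediction step of a 4th-order Hermite N-body integrator
# (not the phi-GPU code): positions and velocities are extrapolated from the
# acceleration a and its time derivative ("jerk") j; the corrector step would
# then use forces re-evaluated at the predicted positions.
import numpy as np

def hermite_predict(x, v, a, j, dt):
    """Predict positions and velocities at t + dt."""
    x_p = x + v * dt + a * dt**2 / 2.0 + j * dt**3 / 6.0
    v_p = v + a * dt + j * dt**2 / 2.0
    return x_p, v_p

rng = np.random.default_rng(1)
n = 1000
x, v = rng.normal(size=(n, 3)), rng.normal(size=(n, 3))
a, j = rng.normal(size=(n, 3)), rng.normal(size=(n, 3))   # placeholder values
x_p, v_p = hermite_predict(x, v, a, j, dt=1.0e-3)
print(x_p.shape, v_p.shape)
```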
Bernhardt Oliver Martin MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 17:00-17:15
MS Presentation
Working with Limited Resources: Large-Scale Proteomic Data-Analysis on Cheap Gaming Desktop PCs, Oliver Martin Bernhardt (Biognosys AG, Switzerland)
Co-Authors: Lukas Reiter (Biognosys AG, Switzerland); Tejas Gandhi (Biognosys AG, Switzerland); Roland Bruderer (Biognosys AG, Switzerland)
One of the major challenges in mass-spec driven proteomics research is data analysis. Many research facilities have the capacity to generate several gigabytes of data per hour. To process such data, though, software solutions for high-throughput data analysis often require a cluster computing infrastructure. Since many research facilities do not have the required IT infrastructure for large-scale data processing, this kind of proteomics research has been restricted to only a few proteomics groups. Here we present a software solution that is capable of processing terabytes of data from large proteomics experiments on a cheap desktop gaming PC setup. We will focus on how to overcome the issue of limited resources while still maintaining high-throughput data analysis and reasonable scalability.
Poster
LS-02 Generating Very Large Spectral Libraries for Targeted Proteomics Analysis Using Spectronaut, Oliver Martin Bernhardt (Biognosys AG, Switzerland)
Co-Authors: Roland Bruderer (Biognosys AG, Switzerland); Oliver Martin Bernhardt (Biognosys AG, Switzerland); Lukas Reiter (Biognosys AG, Switzerland)
Mass spectrometer (MS) based data-independent acquisition with targeted analysis offers new possibilities for highly multiplexed peptide and protein quantification experiments. This type of analysis often includes a spectral library as a prerequisite. In layman's terms, a spectral library is a collection of fingerprints that facilitates the identification of signals measured by the MS. Both the size and the quality of a spectral library, acting as a template for these target signals, can make a significant difference in the quality of the data analysis. Recently, the trend has been moving towards generating very large spectral libraries consisting of hundreds of thousands of peptides stemming from tens of thousands of proteins. From a software engineering perspective, the challenge then is to process and manage such large libraries in an efficient manner. Here we present our solution towards generating very large spectral libraries while using a standard gaming workstation. -
Bertoglio Cristóbal MS Presentation
Wednesday, June 8, 2016
Garden 3A, 15:30-16:00
MS Presentation
Hybridizable Discontinuous Galerkin Approximation of Cardiac Electrophysiology, Cristóbal Bertoglio (Center for Mathematical Modeling, Universidad de Chile, Chile)
Co-Authors: Cristóbal Bertoglio (Center for Mathematical Modeling, University of Chile, Chile); Martin Kronbichler (Technical University of Munich, Germany); Wolfgang A. Wall (Technical University of Munich, Germany)
Cardiac electrophysiology simulations are numerically extremely challenging, due to the propagation of the very steep electrochemical wave front during depolarization. Hence, in classical continuous Galerkin (CG) approaches, very small temporal and spatial discretisations are necessary to obtain physiological propagation. Until now, spatial discretisations based on discontinuous methods have received little attention for cardiac electrophysiology simulations. In particular, local discontinuous Galerkin (LDG) and hybridizable discontinuous Galerkin (HDG) methods have not been explored yet. Application of such methods, when taking advantage of their parallelisation features, would allow a speed-up of the computations. In this work we provide a detailed comparison among CG, LDG and HDG methods for electrophysiology equations based on the mono-domain model. We also study the effect of the numerical integration of the non-linear ionic current term. Furthermore, we plan to show the difference between classic CG methods and HDG methods on large three-dimensional simulations with patient-specific cardiac geometries. -
Beseda Martin Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 11:30-11:50
Contributed Talk
The Energy Consumption Optimization of the FETI Solver, Martin Beseda (IT4Innovations National Supercomputing Center, VSB-Technical University of Ostra, Czech Republic)
Co-Authors: Lubomir Riha (IT4Innovations National Supercomputing Center, Czech Republic); Radim Sojka (IT4Innovations National Supercomputing Center, Czech Republic); Jakub Kruzik (IT4Innovations National Supercomputing Center, Czech Republic); Martin Beseda (IT4Innovations National Supercomputing Center, Czech Republic)
The presentation deals with the energy consumption evaluation of the FETI method, which blends iterative and direct solvers, within the scope of the READEX project. The characteristics measured on a model cube benchmark illustrate the behaviour of the preprocessing and solve phases, mainly in relation to CPU frequency, different problem decompositions, compiler type and compiler parameters. In preprocessing it is necessary to factorize the stiffness and coarse problem matrices, which is among the most time- and energy-consuming operations. The solve phase employs the conjugate gradient algorithm and consists of sparse matrix-vector multiplications and vector dot products or AXPY functions. In each iteration we need to apply the direct solver twice, for the pseudo-inverse action and the coarse problem solution. Together, all these operations cover the basic sparse and dense BLAS Level 1, 2 and 3 routines, so we can explore their different dynamism; dynamically switching between various configurations can then provide significant energy savings. -
Bianco Mauro MS Presentation
Thursday, June 9, 2016
Auditorium C, 15:30-16:00
MS Presentation
Portability of Performance: The Cases of Kokkos and GridTools, Mauro Bianco (ETH Zurich / CSCS, Switzerland)
Co-Authors: Carter Edwards (Sandia National Laboratories, United States of America)
HPC libraries have provided abstractions for common and performance-critical operations for decades. When uniform memory architectures were predominant, the main focus of library implementers was algorithm implementation, while data structures and layout had a secondary role. As memory architectures became more diverse, it became necessary to adjust algorithmic needs and memory characteristics simultaneously. Several recent library approaches tackle this problem, especially as performance portability is now essential. In this talk we will describe two libraries that address these issues: Kokkos and GridTools. Kokkos provides two fundamental abstractions: one for dispatching work for parallel execution and one for managing multidimensional arrays with polymorphic layouts. GridTools' main abstraction allows a programmer to describe complex stencil applications in an architecture-agnostic way. Both libraries use the template mechanisms in C++ for high flexibility, thus avoiding source-to-source translators and proprietary annotations.
Wednesday, June 8, 2016
Garden 3B, 13:30-14:00
MS Presentation
Translating Python into GridTools: Prototyping PDE Solvers Using Stencils, Mauro Bianco (ETH Zurich / CSCS, Switzerland)
Co-Authors: Mauro Bianco (ETH Zurich / CSCS, Switzerland); Paolo Crosetto (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Thomas Schulthess (ETH Zurich, Switzerland)
The fast-paced environment of high-performance computing architectures has always been a challenge for complex codes. First, the effort to adapt the code to new processor architectures is significant compared to their typical release phase. Second, optimisations for one target often incur performance penalties on others. Third, such codes are generally developed by domain scientists, who typically lack expertise in the specific details of the target platform. Successful projects like STELLA have shown that a way out of this situation is to apply the concept of separation of concerns. GridTools is pushing this concept even further: The domain scientist's work is conducted within a prototyping environment using a domain-specific language (DSL), while the computer scientist profiles the automatically-generated code over diverse architectures, implemented by different hardware-specific backends. This talk will give an overview of the GridTools ecosystem, highlighting the use of the prototyping environment in combination with the automatic code-generation engine.
Thursday, June 9, 2016
Garden 1BC, 12:10-12:30
Contributed Talk
The GridTools Libraries for the Solution of PDEs Using Stencils, Mauro Bianco (ETH Zurich / CSCS, Switzerland)
Co-Authors: Mauro Bianco (ETH Zurich / CSCS, Switzerland); Paolo Crosetto (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Thomas Schulthess (ETH Zurich, Switzerland)
Numerical weather prediction and climate models like COSMO and ICON explicitly solve a large set of PDEs. The STELLA library was successfully used to port the dynamical core of COSMO, providing a performance-portable code across multiple platforms. A significant performance speedup was obtained for NVIDIA GPUs, as reported in doi:10.1145/2807591.2807676. However, its applicability was restricted to Cartesian structured grids and finite difference methods, and it is difficult to use outside the COSMO model. The GridTools project emerges as an effort to provide an ecosystem for developing portable and efficient grid applications for the explicit solution of PDEs. GridTools generalizes STELLA to a wider class of weather and climate models on multiple grids, Cartesian and spherical, and offers facilities for performing communication and setting boundary conditions. Here we present the GridTools API and show performance on NVIDIA GPUs and x86 platforms. -
Bigay Pierre MS Presentation
Thursday, June 9, 2016
Garden 2A, 12:00-12:15
MS Presentation
HPC Simulations of Complex Free-Surface Flow Problems with SPH-Flow Software, Pierre Bigay (Nextflow Software, France)
Co-Authors: Matthieu De Leffe (Nextflow Software, France); David Guibert (Nextflow Software, France); Pierre Bigay (Nextflow Software, France)
SPH-Flow is a multi-purpose, multi-physics CFD software package based on the SPH method (smoothed particle hydrodynamics). It is developed by Ecole Centrale Nantes and Nextflow Software. The solver was first developed for fluid flow simulations dedicated to complex non-linear free surface problems and has since been extended to multi-fluid, multi-structure, fluid/structure interaction and viscous flows. It relies on state-of-the-art meshless algorithms. The SPH-Flow solver is parallelized with a distributed-memory parallelization relying on a domain decomposition. Today, the applicative computations usually performed involve 64 to 4000 processors, depending on the problem. New industrial problems can now be solved with this method and its efficient HPC implementation. This talk will address the description of the SPH-Flow solver and its parallelization. Massively parallel, complex and innovative simulations will then be discussed: tire aquaplaning, wave impacts on a ship, fluid flow in a car gearbox, and river-crossing simulations of a car. -
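For readers new to SPH, the snippet below shows the basic kernel summation at the heart of such solvers, with a standard cubic spline kernel and a brute-force neighbour search; it is a generic illustration, not SPH-Flow code.

```python
# Basic SPH building block: density at particle i is a kernel-weighted sum over
# neighbours, rho_i = sum_j m_j W(|r_i - r_j|, h). Generic illustration only
# (brute-force neighbour search, standard 3D cubic spline kernel); production
# codes use cell lists / trees and MPI domain decomposition.
import numpy as np

def cubic_spline_W(r, h):
    q = r / h
    sigma = 1.0 / (np.pi * h**3)                       # 3D normalisation
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
        np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return sigma * w

rng = np.random.default_rng(0)
pos = rng.random((500, 3))          # particle positions in a unit box
m = np.full(500, 1.0 / 500)         # particle masses (total mass 1)
h = 0.1                             # smoothing length

r = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
rho = (m[None, :] * cubic_spline_W(r, h)).sum(axis=1)
print(rho.mean())                   # of order the mean density (~1)
```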
Bigot Julien Paper
Thursday, June 9, 2016
Auditorium C, 11:00-11:30
Paper
Benefits of SMT and of Parallel Transpose Algorithm for the Large-Scale GYSELA Application, Julien Bigot (CEA, France)
Co-Authors: Julien Bigot (CEA, France); Nicolas Bouzat (INRIA, France); Judit Gimenez (BSC, Spain); Virginie Grandgirard (CEA, France)
This article describes how we manage to increase the performance and extend the features of a large parallel application through the use of simultaneous multithreading (SMT) and by designing a robust parallel transpose algorithm. The semi-Lagrangian code Gysela typically performs large physics simulations using a few thousand cores, between 1k and 16k, on x86-based clusters. However, simulations with finer resolutions and with kinetic electrons increase those needs by a huge factor, providing a good example of applications requiring exascale machines. To improve Gysela compute times, we take advantage of efficient SMT implementations available on recent Intel architectures. We also analyze the cost of a transposition communication scheme that, in our case, involves a large number of cores. Adapting the code for load balance when using SMT, together with a good deployment strategy, led to a significant reduction of up to 38% in execution times. -
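The communication pattern at the centre of the analysis, a distributed transpose, can be sketched generically with MPI_Alltoall as below; block sizes and layout are placeholders, and this is not Gysela's actual implementation.

```python
# Generic sketch of the communication behind a parallel transpose: every rank
# exchanges one block with every other rank via MPI_Alltoall and then
# transposes the received blocks locally. Illustration of the idea analysed in
# the paper, not Gysela's scheme.
# Run with e.g.: mpirun -n 4 python transpose_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

b = 4                                          # block edge length (placeholder)
# Local slab: "size" blocks of shape (b, b), one destined for each rank.
local = np.arange(size * b * b, dtype=np.float64).reshape(size, b, b) + 1000 * rank

recv = np.empty_like(local)
comm.Alltoall(local, recv)                     # block (p -> q) lands on rank q

# Finish the global transpose by transposing each received block locally.
transposed = recv.transpose(0, 2, 1).copy()
print(rank, transposed.shape)
```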
Bilionis Ilias MS Presentation
Thursday, June 9, 2016
Garden 2A, 15:15-15:30
MS Presentation
Solving High-Dimensional Dynamic Stochastic Economies with Active Subspaces and Gaussian Processes, Ilias Bilionis (Purdue University, United States of America)
Co-Authors: Ilias Bilionis (Purdue University, United States of America)
We show how active subspace methods, in conjunction with Gaussian processes and parallel computing, can be used to approximate equilibria in heterogeneous-agent macro models. Using recent advances in approximation theory, we apply a combination of Gaussian processes and active subspace techniques to approximate policy functions in models with at least 100 continuous states. Moreover, we show that our method is perfectly suited for dynamic programming and time iteration on non-cubic geometries such as simplices. -
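A schematic version of the active-subspace construction mentioned above is sketched below on a toy model; in the economic application the gradients would come from the model's policy or value function rather than the analytic test function used here.

```python
# Schematic (Constantine-style) active-subspace construction: eigenvectors of
# the average outer product of gradients identify the few directions along
# which the model varies. Toy model only; all names and sizes are placeholders.
import numpy as np

def model_gradient(x):
    w = np.arange(1, x.size + 1, dtype=float)
    w /= np.linalg.norm(w)
    return np.exp(w @ x) * w          # gradient of f(x) = exp(w.x): 1D active subspace

d, n_samples = 100, 500               # 100 continuous states, 500 gradient samples
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(n_samples, d))

grads = np.array([model_gradient(x) for x in X])
C = grads.T @ grads / n_samples       # Monte Carlo estimate of E[grad f grad f^T]
eigvals, eigvecs = np.linalg.eigh(C)

k = 1                                 # large spectral gap -> 1D active subspace
W1 = eigvecs[:, -k:]                  # active directions
Y = X @ W1                            # reduced coordinates, e.g. for GP regression
print(eigvals[-3:], Y.shape)
```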
Blanchard Pierre MS Presentation
Thursday, June 9, 2016
Garden 2BC, 15:00-15:20
MS Presentation
An Efficient Interpolation Based Fast Multipole Method for Dislocation Dynamics Simulations, Pierre Blanchard (Inria, France)
Co-Authors: Arnaud Etcheverry (INRIA, France); Olivier Coulaud (INRIA, France); Laurent Dupuy (CEA, France); Eric Darve (Stanford University, United States of America)
Although the framework of Dislocation Dynamics (DD) provides powerful tools to model crystal plasticity, its efficient implementation is crucial in order to simulate very large ensembles of dislocations. Among all the steps involved in DD simulations, the computation of the internal elastic forces and energy is the most resource-consuming. However, since these are long-range interactions, they can be efficiently evaluated using the Fast Multipole Method (FMM). We propose a new FMM formulation based on polynomial interpolation that is optimised to reduce the memory footprint and the number of flops using fast Fourier transforms. These optimisations are necessary because of the tensorial nature of the kernel and the unusual memory requirements of this application. Regarding parallelism, our code benefits from a hybrid OpenMP/MPI paradigm and a cache-aware data structure. Numerical results will be presented to show the accuracy of this new approach and its parallel scalability.
Poster
CSM-08 Fast Randomized Algorithms for Covariance Matrix Computations, Pierre Blanchard (Inria, France)
Co-Authors: Olivier Coulaud (INRIA, France); Eric Darve (Stanford University, United States of America); Alain Franc (INRIA, France)
Covariance matrices arise in many fields of modern scientific computation, from geostatistics to data analysis, where they usually measure the correlation between grid points. Most algorithms involving such matrices have a superlinear cost in N, the size of the grid. We present an open-source library implementing efficient algorithms based on randomized low-rank approximations (LRA). The library can provide approximate decompositions of low-rank covariance matrices in O(N^2) operations instead of the usual O(N^3) cost of standard methods. In addition, low-rank covariance matrices given as kernels, e.g., Gaussian functions, and evaluated on 3D grids can be factorized in O(N) operations using randomized LRA and an FMM acceleration. The performance of the library is illustrated on two examples: the generation of Gaussian random fields on large heterogeneous spatial grids of O(10^6) points, and the computation of reduced-order maps from distances between DNA sequences using Multi-Dimensional Scaling for the classification of species on 10^5 samples. -
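The randomized low-rank approximation at the core of such libraries can be sketched in a few lines (a Halko-style randomized SVD); the kernel, grid and rank below are illustrative choices, not the library's interface.

```python
# Sketch of a randomized low-rank approximation (Halko et al.): a random sketch
# captures the range of the covariance matrix, after which a small SVD yields
# the factorisation. Illustrative only; not the library described above.
import numpy as np

def randomized_lra(A, rank, oversample=10):
    """Return U, s, Vt with A approximately U @ diag(s) @ Vt."""
    rng = np.random.default_rng(0)
    Omega = rng.normal(size=(A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)             # orthonormal basis for the range
    B = Q.T @ A                                # small (rank + p) x N matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

# Gaussian covariance kernel on a 1D grid (numerically low rank)
x = np.linspace(0.0, 1.0, 2000)
K = np.exp(-(x[:, None] - x[None, :])**2 / (2 * 0.1**2))

U, s, Vt = randomized_lra(K, rank=30)
err = np.linalg.norm(K - U @ np.diag(s) @ Vt) / np.linalg.norm(K)
print(err)      # small relative error, since the kernel's spectrum decays fast
```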
Bleuler Andreas Poster
Poster
CSM-09 Hash Tables on GPUs Using Lock-Free Linked Lists, Andreas Bleuler (University of Zurich, Switzerland)
Co-Authors: Andreas Bleuler (University of Zurich, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Romain Teyssier (University of Zurich, Switzerland)
Hash table implementations which resolve collisions by chaining with linked lists are very flexible with respect to the insertion of additional keys to an existing table and to the deletion of a part of the keys from it. For our implementation on GPUs, we use non-blocking linked lists based on atomic "compare and swap" operations. The deletion of list entries is done by declaring them as invalid and removing them. Typically, after a couple of deletion operations, our local heap is compacted. Using this approach, the initial build of the hash table and hash lookups perform comparably to the CUDPP library implementation. However, small modifications of the table are performed much faster in our implementation than the complete rebuild required by other implementations. We intend to use this novel hash table implementation for astrophysical GPU simulations with adaptive mesh particle-in-cell, which would benefit greatly from these new features. -
Blum Volker MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 13:00-13:20
MS Presentation
Scalable Algorithms for All-Electron Electronic Structure Theory, Volker Blum (Duke University, United States of America)
Co-Authors:
This talk outlines strategies to advance the scalability of algorithms intended for all-electron electronic structure theory. In the context of Kohn-Sham density functional theory, the key scalability bottleneck is the eigenvalue problem, addressed by the ELPA eigenvalue solver library and a new, broader "Electronic Structure Infrastructure" (ELSI) interface to scalable but general approaches that construct the density matrix directly, with reduced prefactors and/or scaling exponents. We also address a scalable, localized resolution-of-identity-based implementation of hybrid functionals that offers O(N) scalability for calculations of approximately 1,000-atom system sizes. Finally, we show first steps towards a load-balanced CPU-GPU all-electron electronic structure implementation in the framework of the FHI-aims all-electron code. -
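The density-matrix construction that such interfaces abstract away can be written down in its textbook dense form, as below; this is only a toy illustration of the mathematical step, not ELPA/ELSI usage.

```python
# Textbook step behind the eigenvalue bottleneck discussed above: obtain the
# occupied subspace of the generalized Kohn-Sham eigenproblem H C = S C E and
# build the density matrix D = C_occ C_occ^T (spin factor omitted). Dense toy
# example with random matrices; ELPA/ELSI target large distributed problems.
import numpy as np
from scipy.linalg import eigh

n, n_occ = 200, 20
rng = np.random.default_rng(0)
H = rng.normal(size=(n, n)); H = (H + H.T) / 2          # symmetric "Hamiltonian"
S = np.eye(n) + 0.01 * np.ones((n, n))                  # SPD "overlap" matrix

eps, C = eigh(H, S)                                     # generalized eigenproblem
C_occ = C[:, :n_occ]                                    # lowest n_occ states
D = C_occ @ C_occ.T                                     # density matrix

# Idempotency in the S metric: D S D = D for an S-orthonormal occupied set
print(np.allclose(D @ S @ D, D))
```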
Boehm Christian MS Presentation
Thursday, June 9, 2016
Garden 1A, 14:45-15:00
MS Presentation
Salvus: A Flexible Open-Source Package for Full-Waveform Modelling and Inversion, Christian Boehm (ETH Zurich, Switzerland)
Co-Authors: Christian Boehm (ETH Zurich, Switzerland); Martin van Driel (ETH Zurich, Switzerland); Lion Krischer (Ludwig Maximilian University of Munich, Germany); Dave A. May (ETH Zurich, Switzerland); Max Rietmann (ETH Zurich, Switzerland); Korbinian Sager (ETH Zurich, Switzerland); Andreas Fichtner (ETH Zurich, Switzerland)
Within all domain-specific software projects, finding the correct balance between flexibility and performance is often difficult. In the seismic imaging community, the trend has been to move towards codes which are heavily optimised, but which often sacrifice usability and flexibility. Here we introduce Salvus: an open-source HPC high-order finite element (FE) package focused on full-waveform modelling and inversion, which is designed to be both flexible and performant. Salvus was constructed by following modern software design practices, testing protocols, and by establishing its foundations upon existing open-source high-level scientific libraries. The FE framework is generalized over spatial dimensions, time-integrators, polynomial order and wave-propagation physics, and provides support for both hexahedral and tetrahedral meshes. Additionally, support is included for various numerical optimisation methods. We discuss our usage of existing open-source scientific libraries, our chosen level of abstraction, and quantitatively investigate the performance penalties associated with these abstractions.
Wednesday, June 8, 2016
Auditorium C, 13:00-13:30
Paper
Automatic Global Multiscale Seismic Inversion: Insights into Model, Data, and Workflow Management, Christian Boehm (ETH Zurich, Switzerland)
Co-Authors: Christian Boehm (ETH Zurich, Switzerland); Alexey Gokhberg (ETH Zurich, Switzerland); Andreas Fichtner (ETH Zurich, Switzerland)
Modern global seismic waveform tomography is formulated as a PDE-constrained nonlinear optimization problem, where the optimization variables are Earth's visco-elastic parameters. This particular problem has several defining characteristics. First, the solution to the forward problem, which involves the numerical solution of the elastic wave equation over continental to global scales, is computationally expensive. Second, the determinedness of the inverse problem varies dramatically as a function of data coverage. This is chiefly due to the uneven distribution of earthquake sources and seismometers, which in turn results in an uneven sampling of the parameter space. Third, the seismic wavefield depends nonlinearly on the Earth's structure. Sections of a seismogram which are close in time may be sensitive to structure greatly separated in space.
In addition to these theoretical difficulties, the seismic imaging community faces additional issues which are common across HPC applications. These include the storage of massive checkpoint files, the recovery from generic system failures, and the management of complex workflows, among others. While the community has access to solvers which can harness modern heterogeneous computing architectures, the computational bottleneck has fallen to these memory- and manpower-bounded issues.
We present a two-tiered solution to the above problems. To deal with the problems relating to computational expense, data coverage, and the increasing nonlinearity of waveform tomography with scale, we present the Collaborative Seismic Earth Model (CSEM). This model, and its associated framework, takes an open-source approach to global-scale seismic inversion. Instead of attempting to monolithically invert all available seismic data, the CSEM approach focuses on the inversion of specific geographic subregions, and then consistently integrates these subregions via a common computational framework. To deal with the workflow and storage issues, we present a suite of workflow management software, along with a custom designed optimization and data compression library. It is the goal of this paper to synthesize these above concepts, originally developed in isolation, into components of an automatic global-scale seismic inversion. -
Bolleman Jerven Poster
Poster
LS-08 The UniProt SPARQL Endpoint: 21 Billion Triples in Production, Jerven Bolleman (Swiss Institute of Bioinformatics, Switzerland)
Co-Authors: Sebastien Gehant (Swiss Institute of Bioinformatics, Switzerland); Thierry Lombardot (Swiss Institute of Bioinformatics, Switzerland); Lydie Bougueleret (Swiss Institute of Bioinformatics, Switzerland); Ioannis Xenarios (Swiss Institute of Bioinformatics, Switzerland); Nicole Redaschi (Swiss Institute of Bioinformatics, Switzerland)
The UniProt knowledgebase is a leading resource of protein sequences and functional information whose centerpiece is the expert-curated Swiss-Prot section. UniProt data is accessible at www.uniprot.org (via a user-friendly interface and a REST API) and at sparql.uniprot.org, a public SPARQL endpoint hosted and maintained by the Vital-IT and Swiss-Prot groups of SIB. With 21 billion RDF triples it is the largest free-to-use graph database in the sciences. SPARQL allows scientists to perform complex queries within UniProt and across datasets located on remote SPARQL endpoints. It provides a free data integration solution for users who cannot afford to create custom data warehouses, at a cost for the service providers. Here we discuss the challenges in maintaining the UniProt SPARQL endpoint, which is updated monthly in sync with the UniProt data releases. -
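A small example of the kind of query the endpoint serves is sketched below; it assumes the SPARQLWrapper Python package and the UniProt core vocabulary (up:Protein, up:mnemonic), so the property names should be checked against the schema documented at sparql.uniprot.org.

```python
# Example of querying the public endpoint; assumes the SPARQLWrapper package
# and the UniProt core vocabulary (up:Protein, up:mnemonic) -- check the
# current schema documented at sparql.uniprot.org before relying on it.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://sparql.uniprot.org/sparql")
endpoint.setQuery("""
    PREFIX up: <http://purl.uniprot.org/core/>
    SELECT ?protein ?mnemonic
    WHERE {
        ?protein a up:Protein ;
                 up:mnemonic ?mnemonic .
    }
    LIMIT 5
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["protein"]["value"], row["mnemonic"]["value"])
```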
Bolten Matthias MS Presentation
Thursday, June 9, 2016
Garden 2BC, 10:30-11:00
MS Presentation
Automatic Code Generation for Multigrid Methods on Structured Meshes, Matthias Bolten (Universität Kassel, Germany)
Co-Authors:
The solution of partial differential equations (PDEs) is needed in many applications in computational science and engineering. For an important class of PDEs, multigrid methods are optimal solvers; nevertheless, the choice of the right components can be difficult, especially in a parallel environment, as is the parallel implementation on current supercomputers with millions of cores, accelerators, etc. Within the project ExaStencils we work on the automatic generation of multigrid methods, to achieve higher productivity as well as performance portability through the automatic choice of the different components of multigrid methods and the associated parameters, optimised for a given hardware architecture. The core of ExaStencils is a hierarchy of domain-specific languages that provide different levels of abstraction. The languages address the needs of application scientists, mathematicians, and computer scientists and allow for different optimisations. In this talk an overview of the current state of the project will be provided.
MS Summary
MS13 Development, Adaption, and Implementation of Numerical Methods for Exascale, Matthias Bolten (Universität Kassel, Germany)
Co-Authors: Harald Köstler (University of Erlangen-Nuremberg, Germany)
Nowadays, supercomputers with millions of cores, specialized communication networks and, in many cases, accelerators pose big challenges for the development of scientific applications. In most cases, the efficiency of these applications is largely dominated by one or a few numerical methods. As a consequence, these underlying methods have to be able to use the available machine efficiently. The Priority Programme 1648 SPPEXA, funded by the German Research Foundation (DFG), aims at tackling the software challenges that arise on the way to exascale. One important part of these challenges is extremely scalable methods. SPPEXA's first year of its second three-year funding period started in January. In this minisymposium a subset of the projects specifically addressing the challenges in numerical methods will present their approaches towards exascale numerics. The topics include fault tolerance, latency avoidance and automatic code generation, as well as strategies to achieve the extreme scalability that will be needed in the future. -
Borsche Raul MS Presentation
Wednesday, June 8, 2016
Garden 3A, 13:00-13:30
MS Presentation
Junction-Generalized Riemann Problem for Stiff Hyperbolic Balance Laws in Networks of Blood Vessels, Raul Borsche (Technische Universitat Kaiserslautern, Germany)
Co-Authors: Eleuterio F. Toro (University of Trento, Italy); Gino I. Montecinos (Universidad de Chile, Chile); Raul Borsche (Technische Universität Kaiserslautern, Germany); Jochen Kall (Technische Universität Kaiserslautern, Germany)
We design a new implicit solver for the Junction-Generalized Riemann Problem (J-GRP), which is based on a recently proposed implicit method for solving the Generalized Riemann Problem (GRP) for systems of hyperbolic balance laws. We use the new J-GRP solver to construct an ADER scheme that is globally explicit, locally implicit and with no theoretical accuracy barrier, in both space and time. The resulting ADER scheme is able to deal with stiff source terms and can be applied to non-linear systems of hyperbolic balance laws in domains consisting of networks of one-dimensional sub-domains. Here we specifically apply the numerical techniques to networks of blood vessels. An application to a physical test problem consisting of a network of 37 compliant silicon tubes (arteries) and 21 junctions reveals that it is imperative to use high-order methods at junctions in order to preserve the desired high order of accuracy in the full computational domain. -
Bougueleret Lydie Poster
Poster
LS-08 The UniProt SPARQL Endpoint: 21 Billion Triples in Production, Lydie Bougueleret (Swiss Institute of Bioinformatics, Switzerland)
Co-Authors: Sebastien Gehant (Swiss Institute of Bioinformatics, Switzerland); Thierry Lombardot (Swiss Institute of Bioinformatics, Switzerland); Lydie Bougueleret (Swiss Institute of Bioinformatics, Switzerland); Ioannis Xenarios (Swiss Institute of Bioinformatics, Switzerland); Nicole Redaschi (Swiss Institute of Bioinformatics, Switzerland)
The UniProt knowledgebase is a leading resource of protein sequences and functional information whose centerpiece is the expert-curated Swiss-Prot section. UniProt data is accessible at www.uniprot.org (via a user-friendly interface and a REST API) and at sparql.uniprot.org, a public SPARQL endpoint hosted and maintained by the Vital-IT and Swiss-Prot groups of SIB. With 21 billion RDF triples it is the largest free-to-use graph database in the sciences. SPARQL allows scientists to perform complex queries within UniProt and across datasets located on remote SPARQL endpoints. It provides a free data integration solution for users who cannot afford to create custom data warehouses, at a cost for the service providers. Here we discuss the challenges in maintaining the UniProt SPARQL endpoint, which is updated monthly in sync with the UniProt data releases. -
Bouzat Nicolas Paper
Thursday, June 9, 2016
Auditorium C, 11:00-11:30
Paper
Benefits of SMT and of Parallel Transpose Algorithm for the Large-Scale GYSELA Application, Nicolas Bouzat (INRIA, France)
Co-Authors: Julien Bigot (CEA, France); Nicolas Bouzat (INRIA, France); Judit Gimenez (BSC, Spain); Virginie Grandgirard (CEA, France)
This article describes how we manage to increase the performance and extend the features of a large parallel application through the use of simultaneous multithreading (SMT) and by designing a robust parallel transpose algorithm. The semi-Lagrangian code Gysela typically performs large physics simulations using a few thousand cores, between 1k and 16k, on x86-based clusters. However, simulations with finer resolutions and with kinetic electrons increase those needs by a huge factor, providing a good example of applications requiring exascale machines. To improve Gysela compute times, we take advantage of efficient SMT implementations available on recent Intel architectures. We also analyze the cost of a transposition communication scheme that, in our case, involves a large number of cores. Adapting the code for load balance when using SMT, together with a good deployment strategy, led to a significant reduction of up to 38% in execution times. -
Broggini Filippo MS Presentation
Friday, June 10, 2016
Garden 1A, 10:30-10:45
MS Presentation
Dynamically Linking Seismic Wave Propagation at Different Scales, Filippo Broggini (ETH Zurich, Switzerland)
Co-Authors: Marlies Vasmel (ETH Zurich, Switzerland); Dirk-Jan van Manen (ETH Zurich, Switzerland); Johan Robertsson (ETH Zurich, Switzerland)
Numerical modelling of seismic wave propagation can be of great value at many scales, ranging from shallow applications in engineering geophysics to global scale seismology. Accurate modelling of the physics of wave propagation at different scales requires different spatial and temporal discretization and potentially also different numerical methods. We present a new method to dynamically link the waves propagating at these different scales. A finite-difference solver is used on a local grid, whereas the (much) larger background domain is represented by its (precomputed) Green's functions. At each time step of the simulation, the interaction between the events leaving the local domain and the medium outside is calculated using a Kirchhoff-type integral extrapolation and the extrapolated wavefield is applied as a boundary condition to the local domain. This results in a numerically exact hybrid modelling scheme, also after local updates of the model parameters.
MS Summary
MS23 Open Source Software (OSS) and High Performance Computing (HPC), Filippo Broggini (ETH Zurich, Switzerland)
Co-Authors: Johan Robertsson (ETH Zurich, Switzerland)
Open Source Software (OSS) plays a fundamental role in research-driven projects and, for this reason, it cannot be neglected by academia and industry. OSS is radically transforming how software is being developed by various scientific communities and it is likely to be central to future research activities in many more fields. The process of development has to reach beyond organizational boundaries to unleash new potentials and open paths to new collaborations. OSS scientific applications are required to solve complex and data-intensive research problems. These applications range from smaller scale simulations developed on a desktop machine to large, parallel simulations of the physical world using High Performance Computing (HPC) systems. The minisymposium is focused on identifying specific aspects of Open Source Software for the development of scientific software that exploits High Performance Computing (HPC) architectures. This class of OSS applications includes software developed to perform, for example, modelling of wave propagation in the Earth and real-time visualization of great volumes of data. This minisymposium will bring researchers from various environments together to exchange experience, findings, and ideas in the realm of Open Source Software. The speakers will demonstrate a practical working success related to OSS and HPC and present future directions for where we need to go. -
Bruderer Roland MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 17:00-17:15
MS Presentation
Working with Limited Resources: Large-Scale Proteomic Data-Analysis on Cheap Gaming Desktop PCs, Roland Bruderer (Biognosys AG, Switzerland)
Co-Authors: Lukas Reiter (Biognosys AG, Switzerland); Tejas Gandhi (Biognosys AG, Switzerland); Roland Bruderer (Biognosys AG, Switzerland)
One of the major challenges in mass-spec driven proteomics research is data analysis. Many research facilities have the capacity to generate several gigabytes of data per hour. To process such data, though, software solutions for high-throughput data analysis often require a cluster computing infrastructure. Since many research facilities do not have the required IT infrastructure for large-scale data processing, this kind of proteomics research has been restricted to only a few proteomics groups. Here we present a software solution that is capable of processing terabytes of data from large proteomics experiments on a cheap desktop gaming PC setup. We will focus on how to overcome the issue of limited resources while still maintaining high-throughput data analysis and reasonable scalability.
Poster
LS-02 Generating Very Large Spectral Libraries for Targeted Proteomics Analysis Using Spectronaut, Roland Bruderer (Biognosys AG, Switzerland)
Co-Authors: Roland Bruderer (Biognosys AG, Switzerland); Oliver Martin Bernhardt (Biognosys AG, Switzerland); Lukas Reiter (Biognosys AG, Switzerland)
Mass spectrometer (MS) based data-independent acquisition with targeted analysis offers new possibilities for highly multiplexed peptide and protein quantification experiments. This type of analysis often includes a spectral library as a prerequisite. In layman's terms, a spectral library is a collection of fingerprints that facilitates the identification of signals measured by the MS. Both the size and the quality of a spectral library, acting as a template for these target signals, can make a significant difference in the quality of the data analysis. Recently, the trend has been moving towards generating very large spectral libraries consisting of hundreds of thousands of peptides stemming from tens of thousands of proteins. From a software engineering perspective, the challenge then is to process and manage such large libraries in an efficient manner. Here we present our solution towards generating very large spectral libraries while using a standard gaming workstation. -
Brumm Johannes MS Presentation
Thursday, June 9, 2016
Garden 2A, 15:00-15:15
MS Presentation
Computing Equilibria in Dynamic Stochastic Macro-Models with Heterogeneous Agents, Johannes Brumm (University of Zurich, Switzerland)
Co-Authors: Felix Kubler (University of Zurich, Switzerland); Simon Scheidegger (University of Zurich & Stanford University, Switzerland)
We show how sparse grid interpolation methods in conjunction with parallel computing can be used to approximate equilibria in overlapping generations (OLG) models with aggregate uncertainty. In such models, the state of the economy can be characterized by the wealth distribution across generations/cohorts of the population. To approximate the function mapping this state into agents' investment decisions and market prices, we use piecewise multi-linear hierarchical basis functions on (adaptive) sparse grids. When solving for the recursive equilibrium function, we combine the adaptive sparse grid with a time iteration procedure, resulting in an algorithm that is massively parallelisable. Our implementation is hybrid-parallel and can solve OLG models with large (depreciation) shocks and with 60 continuous state variables.
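The hierarchical-basis idea behind adaptive sparse grids can be illustrated in one dimension, as below; the surpluses computed there are what an adaptive scheme inspects when deciding where to refine. This is a didactic sketch, not the authors' multi-dimensional implementation.

```python
# 1D illustration of the hierarchical (hat-function) basis behind adaptive
# sparse grids: each new grid point stores a "surplus", the difference between
# the function value and the interpolant built from coarser levels; large
# surpluses indicate where refinement pays off. Didactic sketch only.
import numpy as np

def hat(x, center, width):
    return np.maximum(0.0, 1.0 - np.abs(x - center) / width)

def hierarchize(f, max_level):
    nodes, surpluses = [], []
    for level in range(1, max_level + 1):
        h = 2.0 ** (-level)
        for i in range(1, 2 ** level, 2):          # new (odd-index) points
            x = i * h
            coarse = sum(s * hat(x, c, w) for (c, w), s in zip(nodes, surpluses))
            nodes.append((x, h))
            surpluses.append(f(x) - coarse)
    return nodes, surpluses

def interpolate(x, nodes, surpluses):
    return sum(s * hat(x, c, w) for (c, w), s in zip(nodes, surpluses))

f = lambda x: np.sin(np.pi * x)                    # vanishes on the boundary
nodes, surpluses = hierarchize(f, max_level=6)
print(abs(interpolate(0.3, nodes, surpluses) - f(0.3)))   # small interpolation error
```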
MS Summary
MS18 Computational Economics, Johannes Brumm (University of Zurich, Switzerland)
Co-Authors: Johannes Brumm (University of Zurich, Switzerland)
This minisymposium provides an overview of recent developments in how computational methods are applied to economic problems. These include, for instance, the inference of causality relations from large data-sets. Another example and focus of this symposium is the solution, estimation, and uncertainty quantification of dynamic stochastic economic models in fields like optimal taxation, asset pricing, or climate change. Solving such models is particularly challenging because of the feedback from the future that the expectations of the modeled economic agents create. This feature combined with the substantial heterogeneity that successful models of economic phenomena have to incorporate often results in dynamical systems with high-dimensional state spaces, confronting economists with the curse of dimensionality. Methods to alleviate this curse include adaptive sparse grids and active subspace methods. Moreover, such problems often require substantial computation time even if an efficient solution method is applied. Fortunately, the generic structure of many of these problems allows for massive parallelization and is thus a perfect application of modern high-performance computing techniques. This minisymposium brings together recent developments along those lines. -
Brunner Gilles Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 10:30-10:50
Contributed Talk
Space-Time Parallelism for Hyperbolic PDEs, Gilles Brunner (EPFL, Switzerland)
Co-Authors: Gilles Brunner (EPFL, Switzerland); Jan Hesthaven (EPFL, Switzerland)
Parallel-in-time integration techniques have been hailed as a potential path to exascale for the solution of evolution-type problems. Methods of time-parallel integration are intended to extend parallel scaling on compute clusters beyond what is possible using conventional domain decomposition techniques alone. In this talk we give a short introduction to space-time parallelism with emphasis on the parareal method. We then proceed to present recent advances in the construction of the coarse operator needed in the iterative correction scheme. The modifications allow for parallel-in-time acceleration of purely hyperbolic systems of partial differential equations, something previously widely considered impractical. The talk is concluded with a presentation of preliminary results on parallel-in-time integration of a two-dimensional shallow-water-wave equation that governs the underlying dynamics in a tsunami simulation application. -
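For reference, the parareal iteration itself can be written in a few lines; the sketch below uses a scalar test ODE and simple Euler propagators, not the modified coarse operator for hyperbolic systems that the talk is about.

```python
# Minimal parareal iteration on a scalar ODE u' = lam*u: the update
# U_{k+1}[n+1] = G(U_{k+1}[n]) + F(U_k[n]) - G(U_k[n]) combines a cheap coarse
# propagator G with an accurate fine propagator F. In practice the F
# evaluations over the time slices run in parallel; here they are sequential.
import numpy as np

lam, T, N = -1.0, 2.0, 10            # decay rate, horizon, number of time slices
dT = T / N

def G(u, dt):                         # coarse propagator: one explicit Euler step
    return u + dt * lam * u

def F(u, dt, substeps=100):           # fine propagator: many small Euler steps
    h = dt / substeps
    for _ in range(substeps):
        u = u + h * lam * u
    return u

U = np.empty(N + 1); U[0] = 1.0
for n in range(N):                    # initial guess from the coarse solver
    U[n + 1] = G(U[n], dT)

for k in range(5):                    # parareal corrections
    F_old = np.array([F(U[n], dT) for n in range(N)])   # parallel in principle
    G_old = np.array([G(U[n], dT) for n in range(N)])
    for n in range(N):                # sequential coarse sweep + correction
        U[n + 1] = G(U[n], dT) + F_old[n] - G_old[n]

print(abs(U[-1] - np.exp(lam * T)))   # converges towards the fine/exact solution
```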
Brunner Stephan Contributed Talk
Wednesday, June 8, 2016
Garden 3C, 14:15-14:30
Contributed Talk
A Portable Platform for Accelerated PIC Codes and its Application to Multi- and Many Integrated Core Architectures Using Hybrid MPI/OpenMP, Stephan Brunner (EPFL, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Noé Ohana (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Laurent Villard (EPFL, Switzerland)
With the aim of porting Particle-In-Cell (PIC) codes to modern parallel computers equipped with coprocessors, we have designed a testbed called PIC_ENGINE retaining the key elements of the PIC algorithm as applied to plasma physics simulations. A hybrid OpenMP/MPI implementation is used to explore the potential gain in performance on multi-core CPUs and Many Integrated Core (MIC) coprocessors. A bucket sort is added to increase data locality, and a vectorization algorithm is implemented showing an improvement in the overall performance. With the PIC_ENGINE, we show that the hybrid OpenMP/MPI approach allows a performance gain of approximately 60% compared to pure MPI. Furthermore, the sorting and vectorization increase the performance of the most time-consuming methods by up to a factor of 3.2. Finally, using the same code, hybrid runs are performed on MIC and lead to similar conclusions. However, due to inefficient vectorization, the overall performance is poor compared to CPU runs.
Wednesday, June 8, 2016
Garden 3C, 14:30-14:45
Contributed Talk
Towards Optimization of a Gyrokinetic Particle-in-Cell (PIC) Code on Large Scale Hybrid Architectures, Stephan Brunner (EPFL, Switzerland)
Co-Authors: Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Emmanuel Lanti (EPFL / Swiss Plasma Center, Switzerland); Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Laurent Villard (EPFL, Switzerland)
Refactoring large legacy codes to exploit the power of new multithreaded devices is not an easy task. For this purpose, we designed a platform embedding simplified basic features of PIC codes. It solves the drift-kinetic equations (a first step towards gyrokinetics) in a sheared plasma slab using B-spline finite elements up to fourth order. Multiple levels of parallelism have been implemented using MPI+OpenMP and MPI+OpenACC. It has been shown that sorting particles can improve performance by increasing data locality and enabling vectorization of the grid memory access. This paper focuses on the GPU implementation, which outperforms the CPU timing by up to a factor of 4 without requiring a major code rewrite. This gain increases with the spline order. Weak and strong scalability tests have been run successfully on the GPU-equipped Cray XC30 Piz Daint (CSCS) on up to 4,096 nodes. This performance will enable advanced studies of turbulent transport in magnetic fusion devices.
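The particle-sorting idea mentioned above can be sketched as follows (a generic NumPy illustration, not the authors' platform; the grid size, array names and nearest-grid-point deposition are assumptions): sorting particles by owning cell makes the subsequent grid access nearly sequential, which helps caching and vectorization.

    import numpy as np

    # Generic sketch: sort particles by grid-cell index to improve data locality
    # of the deposition step (illustration only).
    rng = np.random.default_rng(0)
    nx, nparticles = 64, 100_000
    x = rng.uniform(0.0, 1.0, nparticles)         # particle positions in [0, 1)
    v = rng.normal(size=nparticles)               # particle velocities

    cell = np.minimum((x * nx).astype(np.int64), nx - 1)   # owning cell of each particle
    order = np.argsort(cell, kind="stable")                # a bucket/counting sort also works
    x, v, cell = x[order], v[order], cell[order]

    rho = np.zeros(nx)                            # after sorting, deposition sweeps the
    np.add.at(rho, cell, 1.0)                     # grid almost sequentially (unit weights)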
MS Summary
MS02 Advanced Computing in Plasma, Particle and Astrophysics on Emerging HPC Architectures, Stephan Brunner (EPFL, Switzerland)
Co-Authors: Thomas Quinn (University of Washington, United States of America)
Non-traditional computing architectures such as general-purpose graphics processing units (GPGPUs) and many-integrated-core accelerators are providing leading-edge performance for advanced scientific computing. Their advantages are particularly evident when the power cost is considered: the top 10 entries of the "green500" list, where flops/watt is the ranking criterion, are all heterogeneous machines. Given that power costs are becoming more critical, future capability machines are likely to be dominated by these architectures. Non-traditional architectures may require non-traditional programming models, and the scientific community is still learning how to take full advantage of heterogeneous machines with reasonable programming effort. This difficulty is compounded by the need for sophisticated algorithms to handle the large dynamic ranges encountered in state-of-the-art physics and astrophysics simulations. This minisymposium provides a forum for researchers in the computational plasma, particle physics and astrophysics communities to share their techniques and findings. The presentation and discussion of findings and lessons learned will foster more effective use of these new resources for the advancement of physics and astrophysics. -
Brück Sascha Contributed Talk
Friday, June 10, 2016
Garden 1BC, 09:15-09:30
Contributed Talk
Ab-Initio Quantum Transport Simulation of Nano-Devices, Sascha Brück (ETH Zurich, Switzerland)
Co-Authors: Mauro Calderara (ETH Zurich, Switzerland); Mohammad Hossein Bani-Hashemian (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland); Mathieu Luisier (ETH Zurich, Switzerland)
To simulate advanced electronic devices such as nanoscale transistors or memory cells, whose functionality may depend on the position of single atoms, a quantum transport solver is needed that is not only capable of atomic-scale resolution but also able to deal with systems consisting of thousands to hundreds of thousands of atoms. The device simulator OMEN and the electronic structure code CP2K have been united to perform ab initio quantum transport calculations at the level of density functional theory. To take full advantage of modern hybrid supercomputer architectures, new algorithms have been developed and implemented. They allow for the simultaneous computation of open boundary conditions in parallel on the available CPUs and the solution of the Schrödinger equation in a scalable way on the GPUs. The main concepts behind the algorithms will be presented and results for realistic nanostructures will be shown.
Poster
MAT-01 A Generalized Poisson Solver for First-Principles Device Simulations, Sascha Brück (ETH Zurich, Switzerland)
Co-Authors: Sascha Brück (ETH Zurich, Switzerland); Mathieu Luisier (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland)
We present a Poisson solver with main applications in ab initio simulations of nanoelectronic devices. The solver employs a plane-wave (Fourier) based pseudospectral approach and is capable of solving the generalized Poisson equation with a position-dependent dielectric constant, subject to periodic or homogeneous Neumann conditions on the boundaries of the simulation cell and Dirichlet-type conditions imposed at arbitrary subdomains. Any sufficiently smooth function modelling the dielectric constant, including density-dependent dielectric continuum models, can be utilized. Furthermore, consistent derivatives are available for all boundary conditions, allowing for energy-conserving molecular dynamics simulations. -
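For illustration, a minimal pseudospectral solve of the constant-coefficient periodic case is sketched below (an editorial sketch, not the presented solver; the position-dependent dielectric and the Neumann/Dirichlet cases require additional machinery that is not shown):

    import numpy as np

    # Periodic 1D Poisson equation -phi'' = rho/eps0 solved in Fourier space.
    n, L, eps0 = 128, 1.0, 1.0
    x = np.linspace(0.0, L, n, endpoint=False)
    rho = np.sin(2.0 * np.pi * x / L)             # toy zero-mean charge density

    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)  # wavenumbers
    rho_hat = np.fft.fft(rho)
    phi_hat = np.zeros_like(rho_hat)
    nz = k != 0                                   # zero mode left out (potential defined up to a constant)
    phi_hat[nz] = rho_hat[nz] / (eps0 * k[nz] ** 2)
    phi = np.real(np.fft.ifft(phi_hat))           # equals sin(2*pi*x/L) / (eps0 * (2*pi/L)**2)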
Brzobohatý Tomáš Paper
Wednesday, June 8, 2016
Auditorium C, 16:30-17:00
Paper
Massively Parallel Hybrid Total FETI (HTFETI) Solver, Tomáš Brzobohatý (IT4Innovations National Supercomputing Center, Ostrava, Czech Republic)
Co-Authors: Tomáš Brzobohatý (IT4Innovations National Supercomputing Center, Czech Republic); Alexandros Markopoulos (IT4Innovations National Supercomputing Center, Czech Republic); Ondřej Meca (IT4Innovations National Supercomputing Center, Czech Republic); Tomáš Kozubek (IT4Innovations National Supercomputing Center, Czech Republic)
This paper describes the Hybrid Total FETI (HTFETI) method and its parallel implementation in the ESPRESO library. HTFETI is a variant of the FETI-type domain decomposition method in which a small number of neighboring subdomains is aggregated into clusters. This can also be viewed as a multilevel decomposition approach, which results in a smaller coarse problem, the main scalability bottleneck of the FETI and FETI-DP methods.
The efficiency of our implementation, which employs hybrid parallelization in the form of MPI and Cilk++, is evaluated using both weak and strong scalability tests. The weak scalability of the solver is shown on a 3-dimensional linear elasticity problem with up to 30 billion degrees of freedom (DOF) executed on 4096 compute nodes. The strong scalability is evaluated on a problem of 2.6 billion DOF scaled from 1000 to 4913 compute nodes. The results show super-linear scaling of the single-iteration time and linear scalability of the solver runtime. The latter combines both numerical and parallel scalability and shows the overall HTFETI solver performance. The large-scale tests use our own parallel synthetic benchmark generator, which is also described in the paper.
The last set of results shows that HTFETI is very efficient for problems of up to 1.7 billion DOF and provides better time to solution compared to the TFETI method. -
Bunjaku Teutë Contributed Talk
Friday, June 10, 2016
Garden 1BC, 09:45-10:00
Contributed Talk
Diffusion Mechanisms in Li0.5CoO2: A Computational Study, Teutë Bunjaku (ETHZ, Switzerland)
Co-Authors:
The diffusion coefficient of Li ions is one of the key parameters in the design of high-performance Li-ion batteries (LIBs) and defines a strict upper bound on how fast Li ions can be inserted into or extracted from the electrodes. In this talk, through accurate atomistic simulations, we will study the ordering effects occurring in half-filled layers of LiCoO2 and propose an explanation for the experimentally observed dip in the Li diffusivity at this concentration. Surprisingly, it is found that the lowest-energy phase of Li0.5CoO2 is a zig-zag pattern rather than the currently assumed linear arrangement, and that diffusion in this phase is highly anisotropic, thus explaining the observed dip in the diffusivity. The atomic interactions are modeled at the density-functional theory level of accuracy, and energy barriers for Li-ion diffusion are determined from searches for first-order saddle points on the resulting potential energy surface. -
Buongiorno Nardelli Marco MS Presentation
Thursday, June 9, 2016
Garden 1BC, 14:30-15:00
MS Presentation
Novel Tools for Accelerated Materials Discovery: Breakthroughs and Challenges in the Mapping of the Materials Genome, Marco Buongiorno Nardelli (University of North Texas, United States of America)
Co-Authors:
The high-throughput computation of materials properties by ab initio methods has become the foundation of an effective approach to materials design, discovery and characterization. This approach to materials science currently presents the most promising path to the development of advanced technological materials that could solve or mitigate important social and economic challenges of the 21st century. Enhanced repositories such as AFLOWLIB open novel opportunities for structure discovery and optimisation, including the uncovering of unsuspected compounds, metastable structures and correlations between various properties. However, the practical realization of these opportunities depends on the design of efficient algorithms for electronic structure simulations of realistic material systems beyond the limitations of the current standard theories. In this talk, I will review recent progress in theoretical and computational tools and, in particular, discuss the development and validation of novel functionals within Density Functional Theory and of local basis representations for effective ab initio tight-binding schemes. -
Burau Heiko MS Presentation
Wednesday, June 8, 2016
Garden 3C, 14:00-14:15
MS Presentation
Interactive Plasma Simulations on Next Generation Supercomputers for Everybody, Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, TU Dresden, Germany)
Co-Authors: Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Benjamin Worpitz (Citrix Systems GmbH, Germany); Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
We present the open-source ecosystem around PIConGPU, reportedly the fastest particle-in-cell code in the world (sustained Flop/s). Designed for modern clusters powered by manycore hardware, we argue that HPC plasma simulations should be able to estimate their systematic and random errors (e.g., by varying solvers and initial conditions). Our approach starts with an open-source, anytime fork-able development cycle as the basis for scrutable and reproducible simulations. To promote interoperability, we develop and propagate an open, self-describing, file-format-agnostic data markup (openPMD) that is suitable for extreme I/O load and in-situ processing, demonstrated in a live simulation. PIConGPU is built on top of C++ meta-programming libraries, providing single-source kernel acceleration (alpaka) to work asynchronously on distributed data containers (PMacc). Using compile-time optimisation techniques, we show that particle-mesh methods can be implemented for arbitrary high-performance hardware (GPGPUs, CPUs, OpenPOWER, Xeon Phi), featuring solver agility without negative implications for maintenance (rewrites) or runtime performance. -
Bussmann Michael MS Presentation
Wednesday, June 8, 2016
Garden 3C, 14:00-14:15
MS Presentation
Interactive Plasma Simulations on Next Generation Supercomputers for Everybody, Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
Co-Authors: Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Benjamin Worpitz (Citrix Systems GmbH, Germany); Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
We present the open-source ecosystem around PIConGPU, reportedly the fastest particle-in-cell code in the world (sustained Flop/s). Designed for modern clusters powered by manycore hardware, we argue that HPC plasma simulations should be able to estimate their systematic and random errors (e.g., by varying solvers and initial conditions). Our approach starts with an open-source, anytime fork-able development cycle as the basis for scrutable and reproducible simulations. To promote interoperability, we develop and propagate an open, self-describing, file-format-agnostic data markup (openPMD) that is suitable for extreme I/O load and in-situ processing, demonstrated in a live simulation. PIConGPU is built on top of C++ meta-programming libraries, providing single-source kernel acceleration (alpaka) to work asynchronously on distributed data containers (PMacc). Using compile-time optimisation techniques, we show that particle-mesh methods can be implemented for arbitrary high-performance hardware (GPGPUs, CPUs, OpenPOWER, Xeon Phi), featuring solver agility without negative implications for maintenance (rewrites) or runtime performance. -
Byshkin Maksym Poster
Poster
EMD-03 Parallel MCMC for Estimating Exponential Random Graph Models, Maksym Byshkin (Università della Svizzera italiana, Switzerland)
Co-Authors: Alex Stivala (University of Melbourne, Australia); Antonietta Mira (Università della Svizzera italiana, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Garry Robins (University of Melbourne, Australia); Alessandro Lomi (Università della Svizzera italiana, Switzerland)
As information and communication technologies continue to expand, the need arises to develop analytical strategies capable of accommodating new and larger sets of social network data. Considerable attention has recently been dedicated to the possibility of scaling exponential random graph models (ERGMs) - a well-established family of statistical models - for analyzing large social networks. Efficient computational methods would be highly desirable in order to extend the empirical scope of ERGM for the analysis of large social networks. We report preliminary results of a research project on the development of new sampling methods for ERGMs. We propose a new MCMC sampler and use it with Metropolis coupled Markov chain Monte Carlo, a typical scheme for MCMC parallelization. We show that, using this method, the CPU time for parameter estimation may be considerably reduced. *Generous support from the Swiss National Platform of Advanced Scientific Computing (PASC) is gratefully acknowledged. -
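The Metropolis-coupled scheme mentioned above can be illustrated on a toy target density (a generic parallel-tempering sketch, not the authors' ERGM sampler; the bimodal target, temperature ladder and proposal scale are assumptions): several chains run at different temperatures, and occasional state swaps let the cold chain escape local modes.

    import numpy as np

    # Generic Metropolis-coupled MCMC (parallel tempering) on a toy bimodal
    # density; only the cold chain samples the target (illustration only).
    rng = np.random.default_rng(1)

    def logp(x):                                  # mixture of two unit Gaussians
        return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

    temps = np.array([1.0, 2.0, 4.0, 8.0])        # temperature ladder
    x = np.zeros(len(temps))
    cold_samples = []
    for _ in range(20_000):
        for c, T in enumerate(temps):             # local Metropolis step in each chain
            prop = x[c] + rng.normal()
            if np.log(rng.uniform()) < (logp(prop) - logp(x[c])) / T:
                x[c] = prop
        c = rng.integers(len(temps) - 1)          # propose a swap between neighbours
        log_acc = (1.0 / temps[c] - 1.0 / temps[c + 1]) * (logp(x[c + 1]) - logp(x[c]))
        if np.log(rng.uniform()) < log_acc:
            x[c], x[c + 1] = x[c + 1], x[c]
        cold_samples.append(x[0])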
Bär Jeremia Poster
Poster
CSM-06 dCUDA: Hardware Supported Overlap of Computation and Communication, Jeremia Bär (ETH Zurich, Switzerland)
Co-Authors: Jeremia Bär (ETH Zurich, Switzerland); Torsten Hoefler (ETH Zurich, Switzerland)
In recent years, the CUDA programming model and the underlying GPU hardware architecture have gained a lot of popularity in various application domains such as climate modelling, computational chemistry, and machine learning. Today, GPU cluster programming typically requires two different programming models that separately deal with on-node computation and inter-node communication. With dCUDA we present a unified GPU cluster programming model that implements device-side remote memory access operations with target notification. To hide instruction pipeline latencies, CUDA programs over-subscribe the hardware with many more threads than there are execution units. Whenever a thread stalls, the hardware proceeds with another thread that is ready for execution. To make best use of the cluster interconnect, dCUDA applies the same latency-hiding technique to automatically overlap on-node computation with inter-node communication. Our experiments demonstrate good and perfect overlap for compute-bound and memory-bound tasks, respectively.
C
-
Calderara Mauro Contributed Talk
Friday, June 10, 2016
Garden 1BC, 09:15-09:30
Contributed Talk
Ab-Initio Quantum Transport Simulation of Nano-Devices, Mauro Calderara (ETH Zurich, Switzerland)
Co-Authors: Mauro Calderara (ETH Zurich, Switzerland); Mohammad Hossein Bani-Hashemian (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland); Mathieu Luisier (ETH Zurich, Switzerland)
To simulate advanced electronic devices such as nanoscale transistors or memory cells, whose functionality may depend on the position of single atoms, a quantum transport solver is needed that is not only capable of atomic-scale resolution but also able to deal with systems consisting of thousands to hundreds of thousands of atoms. The device simulator OMEN and the electronic structure code CP2K have been united to perform ab initio quantum transport calculations at the level of density functional theory. To take full advantage of modern hybrid supercomputer architectures, new algorithms have been developed and implemented. They allow for the simultaneous computation of open boundary conditions in parallel on the available CPUs and the solution of the Schrödinger equation in a scalable way on the GPUs. The main concepts behind the algorithms will be presented and results for realistic nanostructures will be shown. -
Calkins Michael A. Poster
Poster
EAR-04 Implicit Treatment of Inertial Waves in Dynamo Simulations, Michael A. Calkins (University of Colorado at Boulder, United States of America)
Co-Authors: Michael A. Calkins (University of Colorado at Boulder, United States of America); Keith Julien (University of Colorado at Boulder, United States of America)
The explicit treatment of inertial waves imposes a very small timestep in dynamo simulations at low Ekman number. We present a fully spectral Chebyshev tau method that allows us to treat the inertial waves implicitly. The large linear systems that need to be solved at each timestep remain affordable thanks to the sparsity of the formulation. The simulations are parallelised using a 2D data decomposition for the nonlinear calculations combined with a parallel linear solver for the timestepping. Despite the increased complexity, significant gains in wall-clock time are achieved thanks to larger timesteps. -
Canann Taylor J. MS Presentation
Thursday, June 9, 2016
Garden 2A, 14:00-14:30
MS Presentation
Solving Large Systems of Polynomial Equations from Economics on Supercomputers, Taylor J. Canann (University of Minnesota, United States of America)
Co-Authors: Taylor J. Canann (University of Minnesota, United States of America)
Many problems in economics and game theory can be formulated as systems of polynomials. Methods from commutative algebra and numerical algebraic geometry can be used to find all solutions but the computational costs rise significantly as the size increases. Mathematicians have developed methods that exploit parallelism but have deployed them only on small systems. We demonstrate the effectiveness and scalability of these methods on supercomputers, with a focus on solving research problems in economics. -
Cao Ruyin MS Presentation
Friday, June 10, 2016
Garden 3A, 10:00-10:15
MS Presentation
Ligand Binding to the Human Adenosine Receptor hA 2A R in Nearly Physiological Conditions, Ruyin Cao (Forschungszentrum Jülich, Germany)
Co-Authors: Ruyin Cao (Forschungszentrum Jülich, Germany); Andreas Bauer (Forschungszentrum Jülich, Germany); Paolo Carloni (Forschungszentrum Jülich, Germany)
Lipid composition may significantly affect membrane protein function, yet its impact on the protein structural determinants is not well understood. Here we present a comparative molecular dynamics (MD) study of the human adenosine receptor type 2A (hA2AR) in complex with caffeine, a system of high neuro-pharmacological relevance, within different membrane types: POPC, mixed POPC/POPE and cholesterol-rich membranes. 0.8-μs MD simulations unambiguously show that the helical folding of the amphipathic helix 8 depends on membrane content. Most importantly, the distinct cholesterol binding into the cleft between helices 1 and 2 stabilizes a specific caffeine-binding pose against others visited during the simulation. Hence, the presence of cholesterol (approximately 33%-50% in synaptic membranes of the central nervous system), often neglected in X-ray determination of membrane proteins, affects the population of the ligand binding poses. We conclude that including a correct description of neuronal membranes may be very important for computer-aided design of ligands targeting hA2AR and possibly other GPCRs. -
Capdeville Yann MS Presentation
Friday, June 10, 2016
Garden 1A, 10:15-10:30
MS Presentation
Non-Periodic Homogenization for Seismic Forward and Inverse Problems, Yann Capdeville (CNRS, France)
Co-Authors:
The modelling of full seismic elastic waveforms in a limited frequency band is now well established, and the constant increase of computing power now allows such waveforms to be used to image the elastic properties of the Earth. Nevertheless, inhomogeneities at scales much smaller than the minimum wavelength of the wavefield (associated with the maximum frequency of the limited frequency band) are still a challenge for both forward and inverse problems. In this work, we tackle the problem of elastic properties varying much faster than the minimum wavelength, making it possible to link small and large scales for elastic or acoustic waves. Using a non-periodic homogenization theory, we show how to compute effective elastic properties and local correctors. The implications of the homogenization theory for the inverse problem will be presented. -
Caravati Sebastiano Poster
Poster
MAT-09 Sparse Matrix Multiplication Library for Linear Scaling DFT Calculations in Electronic Structure Codes, Sebastiano Caravati (University of Zurich, Department of Chemistry, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Andreas Glöss (University of Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland)
The key operation for linear-scaling DFT as implemented in the CP2K quantum chemistry program is sparse matrix-matrix multiplication. For this task, the sparse matrix library DBCSR (Distributed Block Compressed Sparse Row) has been developed. DBCSR takes full advantage of the block-structured sparse nature of the matrices for efficient computation and communication. It is MPI and OpenMP parallelized and can exploit accelerators. We describe a strategy to improve DBCSR performance. DBCSR is available as a stand-alone library at http://dbcsr.cp2k.org/ to be employed in electronic structure codes. To this end, a streamlined API has been defined and a suite of tools has been developed to generate the full documentation of the library (API-DOC) by extracting the information provided directly in the source code. We give a flavour of the generated API-DOC by showing snapshots of selected HTML documentation pages, and we sketch the design of these tools. -
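The core idea of block-sparse multiplication can be illustrated with a toy Python sketch (an editorial illustration, not the DBCSR API; the dictionary storage and uniform block size are assumptions): only the block pairs that actually exist are multiplied, each as a small dense matrix product.

    import numpy as np
    from collections import defaultdict

    bs = 4                                        # uniform block size (assumption)

    def random_block_sparse(nblocks, fill, seed):
        # store a matrix as {(block_row, block_col): dense bs-by-bs block}
        rng = np.random.default_rng(seed)
        return {(i, j): rng.normal(size=(bs, bs))
                for i in range(nblocks) for j in range(nblocks)
                if rng.uniform() < fill}

    def block_sparse_matmul(A, B):
        rows_of_B = defaultdict(list)             # index B by its block row
        for (k, j), blk in B.items():
            rows_of_B[k].append((j, blk))
        C = defaultdict(lambda: np.zeros((bs, bs)))
        for (i, k), a in A.items():               # only existing block pairs contribute
            for j, b in rows_of_B[k]:
                C[(i, j)] += a @ b                # small dense multiply per pair
        return dict(C)

    A = random_block_sparse(8, 0.2, seed=0)
    B = random_block_sparse(8, 0.2, seed=1)
    C = block_sparse_matmul(A, B)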
Carloni Paolo MS Presentation
Friday, June 10, 2016
Garden 3A, 10:00-10:15
MS Presentation
Ligand Binding to the Human Adenosine Receptor hA 2A R in Nearly Physiological Conditions, Paolo Carloni (Forschungszentrum Juelich, Germany)
Co-Authors: Ruyin Cao (Forschungszentrum Jülich, Germany); Andreas Bauer (Forschungszentrum Jülich, Germany); Paolo Carloni (Forschungszentrum Jülich, Germany)
Lipid composition may significantly affect membrane protein function, yet its impact on the protein structural determinants is not well understood. Here we present a comparative molecular dynamics (MD) study of the human adenosine receptor type 2A (hA2AR) in complex with caffeine, a system of high neuro-pharmacological relevance, within different membrane types: POPC, mixed POPC/POPE and cholesterol-rich membranes. 0.8-μs MD simulations unambiguously show that the helical folding of the amphipathic helix 8 depends on membrane content. Most importantly, the distinct cholesterol binding into the cleft between helices 1 and 2 stabilizes a specific caffeine-binding pose against others visited during the simulation. Hence, the presence of cholesterol (approximately 33%-50% in synaptic membranes of the central nervous system), often neglected in X-ray determination of membrane proteins, affects the population of the ligand binding poses. We conclude that including a correct description of neuronal membranes may be very important for computer-aided design of ligands targeting hA2AR and possibly other GPCRs.
MS Summary
MS29 Molecular Neuromedicine: Recent Advances by Computer Simulation and Systems Biology, Paolo Carloni (Forschungszentrum Juelich, Germany)
Co-Authors: Giulia Rossetti (JSC and RWTH-UKA, Germany), Mercedes Alfonso-Prieto (University of Barcelona, Spain)
Innovative neuromedicine approaches require a detailed understanding of the molecular and systems-level causes of neurological diseases, their progression and the response to treatments. Ultimately, neuronal function and diseases are caused by exquisite molecular recognition processes during which specific biomolecules bind to each other, allowing neuronal signaling, metabolism, synaptic transmission, etc. The detailed understanding of these processes, as well as the rational design of molecules for technological advances in neuropharmacology, requires the characterization of the structure, function, dynamics and energetics of neuronal biomolecules. The biomolecules and processes under study are inherently highly complex in terms of their size (typically on the order of 10^5-10^6 atoms) and time scale (up to seconds), much longer than what can be simulated by standard molecular dynamics approaches (which, nowadays, can typically reach up to microseconds). This requires the development of methodologies in multiscale molecular simulation. Recent advances include coarse-grained (CG) approaches that allow the study of large systems on long timescales, as well as very accurate hybrid methods combining QM modelling with molecular mechanics (MM) that provide descriptions of key neuronal photoreceptors, such as rhodopsin. In addition, Brownian dynamics is used to study biomolecular recognition and macromolecular assembly processes towards in vivo conditions. Such computational tools are invaluable for the description, prediction and understanding of biological mechanisms in a quantitative and integrative way. This workshop will be an ideal forum to discuss both advances and future directions in multiscale methodologies and applications to key signaling pathways in neurotransmission, such as those based on neuronal G-protein coupled receptors (GPCRs). These novel methodologies might prove instrumental in understanding the underlying causes of brain diseases and in designing new drugs aimed at their treatment. -
Castelli Ivano E. MS Presentation
Thursday, June 9, 2016
Garden 1BC, 15:00-15:20
MS Presentation
High-Throughput Prediction of Novel Two-Dimensional Materials, Ivano E. Castelli (EPFL, Switzerland)
Co-Authors: Philippe Schwaller (EPFL, Switzerland); Andrea Cepellotti (EPFL, Switzerland); Andrius Merkys (EPFL, Switzerland); Ivano E. Castelli (EPFL, Switzerland); Marco Gibertini (EPFL, Switzerland); Giovanni Pizzi (EPFL, Switzerland); Nicola Marzari (EPFL, Switzerland)
As a crucial step towards the identification of novel and promising 2D materials, we provide here a large-scale first-principles exploration and characterization of such compounds. From a combination of 480,000 structures harvested from the ICSD and COD databases, three-dimensional crystals are screened systematically by checking the absence of chemical bonds between adjacent layers, identifying more than 6,000 layered systems. DFT calculations of the van der Waals interlayer bonding are then performed with automatic workflows, while the metallic, insulating or magnetic character of the materials obtained is assessed systematically. Following full atomic and cell relaxations, phonon dispersions are computed as a first step towards the assessment of thermodynamic properties. Thanks to the AiiDA materials informatics platform [1], and in particular its automatic workflow engine, database structure, sharing capabilities, and pipelines to/from crystallographic repositories, the systematic and reproducible calculation of these properties becomes straightforward, together with seamless accessibility and sharing. [1] http://aiida.net -
Cavazzoni Carlo MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 16:00-16:30
MS Presentation
Quantum-ESPRESSO Open Source Community Code: The Challenge of Continuous Software Innovation for High-End High Performance Computing, Carlo Cavazzoni (CINECA, Italy)
Co-Authors:
Quantum-ESPRESSO builds upon electronic-structure codes that have been developed by some of the original authors of electronic-structure algorithms and is used by leading materials modelling groups worldwide. Innovation and efficiency are its main focus, with special attention paid to massively parallel architectures. Quantum-ESPRESSO has thousands of active users worldwide, represents a large fraction of the workload of many HPC centers, and is widely used for benchmarking and co-design. Keeping pace with the evolution of HPC architectures without giving up the possibility to improve the algorithms and the physics implemented, while maintaining compatibility with low-end systems regardless of the hardware, the OS and the user interface, is a real challenge for a community effort of volunteer developers. In this talk, the development strategy, the human-network aspects and the technological challenges involved in this community effort will be presented and discussed. -
Chalk Aidan B. G. Paper
Wednesday, June 8, 2016
Auditorium C, 13:30-14:00
Paper
SWIFT: Using Task-Based Parallelism, Fully Asynchronous Communication, and Graph Partition-Based Domain Decomposition for Strong Scaling on more than 100 000 Cores, Aidan B. G. Chalk (Durham University, United Kingdom)
Co-Authors: Pedro Gonnet (Durham University, United Kingdom); Aidan B. G. Chalk (Durham University, United Kingdom); Peter Draper (Durham University, United Kingdom)
We present a new open-source cosmological code, called SWIFT, designed to solve the equations of hydrodynamics using a particle-based approach (Smoothed Particle Hydrodynamics) on hybrid shared/distributed-memory architectures. SWIFT was designed from the bottom up to provide excellent strong scaling on both commodity clusters (Tier-2 systems) and Top100 supercomputers (Tier-0 systems), without relying on architecture-specific features or specialized accelerator hardware. This performance is due to three main computational approaches: (i) task-based parallelism for shared-memory parallelism, which provides fine-grained load balancing and thus strong scaling on large numbers of cores; (ii) graph-based domain decomposition, which uses the task graph to decompose the simulation domain such that the work, and not just the data (as is the case with most partitioning schemes), is equally distributed across all nodes; (iii) fully dynamic and asynchronous communication, in which communication is modelled as just another task in the task-based scheme, sending data whenever it is ready and deferring tasks that rely on data from other nodes until it arrives.
In order to use these approaches, the code had to be re-written from scratch, and the algorithms therein adapted to the task-based paradigm. As a result, we can show upwards of 60% parallel efficiency for moderate-sized problems when increasing the number of cores 512-fold, on both x86-based and Power8-based architectures. -
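The task-based idea described above can be illustrated with a small Python sketch (an editorial illustration, not SWIFT's scheduler; the per-cell "density" and pairwise "force" tasks are placeholders): each task starts as soon as the tasks it depends on have produced their results, so independent work proceeds concurrently.

    import concurrent.futures as cf

    def density(particles):                  # placeholder for a per-cell density loop
        return sum(particles) / len(particles)

    def force(rho_a, rho_b):                 # placeholder for a pair-interaction task
        return rho_a - rho_b

    cells = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0], "c": [0.5, 1.5, 2.5, 3.5]}

    with cf.ThreadPoolExecutor() as pool:
        # one density task per cell; these run concurrently
        rho = {name: pool.submit(density, p) for name, p in cells.items()}
        # force tasks depend on two density tasks; waiting on .result() encodes the edge
        f_ab = pool.submit(lambda: force(rho["a"].result(), rho["b"].result()))
        f_bc = pool.submit(lambda: force(rho["b"].result(), rho["c"].result()))
        print(f_ab.result(), f_bc.result())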
Charara Ali Paper
Thursday, June 9, 2016
Auditorium C, 10:30-11:00
Paper
Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs, Ali Charara (KAUST, Saudi Arabia)
Co-Authors: Hatem Ltaief (KAUST, Saudi Arabia); Damien Gratadour (L'Observatoire de Paris, France); Eric Gendron (L'Observatoire de Paris, France)
We present a high performance comprehensive implementation of a multi-object adaptive optics (MOAO) simulation on multicore architectures with hardware accelerators in the context of computational astronomy. This implementation will be used as an operational testbed for simulating the design of new instruments for the European Extremely Large Telescope project (E-ELT), the world's biggest eye and one of Europe's highest priorities in ground-based astronomy. The simulation corresponds to a multi-step multi-stage procedure, which is fed, near real-time, by system and turbulence data coming from the telescope environment. Based on the PLASMA library powered by the OmpSs dynamic runtime system, our implementation relies on a task-based programming model to permit an asynchronous out-of-order execution. Using modern multicore architectures associated with the enormous computing power of GPUs, the resulting data-driven compute-intensive simulation of the entire MOAO application, composed of the tomographic reconstructor and the observing sequence, is capable of coping with the aforementioned real-time challenge and stands as a reference implementation for the computational astronomy community. -
Charpilloz Christophe MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:15-15:30
MS Presentation
Refactoring and Virtualizing a Mesoscale Model for GPUs, Christophe Charpilloz (Federal Office of Meteorology and Climatology MeteoSwiss, Zürich, Switzerland)
Co-Authors: Andrea Arteaga (MeteoSwiss, Switzerland); Christophe Charpilloz (MeteoSwiss, Switzerland); Salvatore Di Girolamo (ETH Zurich, Switzerland); Torsten Hoefler (ETH Zurich, Switzerland)
Our aim is to adapt the COSMO limited-area model to enable kilometer-scale resolution in climate simulation mode. As the resolution of climate simulations increases, storing the large amount of generated data becomes infeasible. To enable high-resolution models, we seek a good compromise between the disk I/O costs and the need to access the output data for post-processing and analysis. We propose a data-virtualization layer that re-runs simulations on demand and transparently manages the data for the analytics applications. To achieve this goal, we developed a bit-reproducible version of the dynamical core of the COSMO model that runs on different architectures (e.g., CPUs and GPUs). An ongoing project is working on the reproducibility of the full COSMO code. We will discuss the strategies adopted to develop the data-virtualization layer, the challenges associated with the reproducibility of simulations performed on different hardware architectures, and the first promising results of our project. -
Chen Yuxi MS Presentation
Wednesday, June 8, 2016
Garden 3C, 16:00-16:30
MS Presentation
Decoupling and Coupling in iPIC3D, a Particle-in-Cell Code for Exascale, Yuxi Chen (University of Michigan, United States of America)
Co-Authors: Stefano Markidis (Royal Institute of Technology, Sweden); Erwin Laure (Royal Institute of Technology, Sweden); Yuxi Chen (University of Michigan, United States of America); Gabor Toth (University of Michigan, United States of America); Tamas Gombosi (University of Michigan, United States of America)
iPIC3D is a massively parallel three-dimensional implicit particle-in-cell code used for the study of the interactions between the solar wind and Earth's magnetosphere. Because of the scalability potential and scientific impact demonstrated by iPIC3D, it has been selected in many European HPC projects to prepare for future exascale machines. In this talk, we present new algorithmic changes to iPIC3D in preparation for the coming exascale era. In particular, we will present decoupled computation, communication and I/O operations in iPIC3D to address the challenges of irregular operations on large numbers of processes. Our evaluation results show that the performance benefit from this model increases as the scale of the simulation increases. We also present a two-way coupled kinetic-fluid model with multiple implicit PIC domains (by the iPIC3D code) embedded in MHD (by the BATS-R-US code) under the Space Weather Modeling Framework (SWMF). -
Cho Jaehyun MS Presentation
Friday, June 10, 2016
Garden 2A, 10:20-10:40
MS Presentation
Multiscale Modeling of Frank-Read Source, Jaehyun Cho (EPFL, Switzerland)
Co-Authors: Guillaume Anciaux (EPFL, Switzerland)
The strength of materials is mainly controlled by dislocations. Their dynamics include nucleation and multiplication at grain boundaries. Nonlinear atomistic forces have to be considered to quantify nucleation, while far-field forces are required to correctly model dislocation pile-up. Atomistic (MD) and discrete dislocation (DD) simulations have been performed to study these dynamics. In MD, nucleation is naturally included, but the domain sizes are limited. In DD, dislocation interactions, including pile-up, are well described; however, ad hoc approaches are required for nucleation. These limitations motivate the coupling of the two methods: CADD3D. In this talk, we present a multiscale model of a Frank-Read source using CADD3D. Several dislocations will be nucleated from the Frank-Read source in the MD zone and develop into complete closed loops. As they approach the DD domain, they will be transformed into DD dislocations. An observable consequence will be work-hardening effects due to the pile-up back stresses.
Friday, June 10, 2016
Garden 2A, 09:30-09:55
MS Presentation
Concurrent Coupling of Particles with a Continuum for Dynamical Motion of Solids, Jaehyun Cho (EPFL, Switzerland)
Co-Authors: J. F. Molinari (EPFL, Switzerland); Till Junge (Karlsruhe Institute of Technology, Germany); Jaehyun Cho (EPFL, Switzerland)
There are many situations where the discrete nature of matter needs to be accounted for by numerical models. For instance, with crystalline materials, friction and ductile fracture modelling can benefit from the Molecular Dynamics formalism. However, capturing these processes requires domain sizes involving large numbers of particles, often out of reach of modern computers. Thus, concurrent multiscale approaches have emerged to reduce the computational cost by using a coarser continuum model. The difference between particles and continuum leads to several challenging problems. In this presentation, finite temperatures, numerical stability and dislocation passing will be addressed. The software framework LibMultiScale will also be presented, with its associated parallel computation design choices. -
Chopard Bastien MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 15:30-16:00
MS Presentation
Heterogeneous Computations on HPC Infrastructures: Theoretical Framework, Bastien Chopard (University of Geneva, Switzerland)
Co-Authors: Jean-Luc Falcone (University of Geneva, Switzerland)
During the last decades, computational scientists have experienced a sustained and continuous increase in model complexity. Current models are not only more detailed and accurate but also span multiple scales and scientific domains. Instead of writing complicated and monolithic models ex novo, we have explored the coupling of existing single-scale and single-science applications in order to produce multi-scale and multi-science models. We have proposed a theoretical formalism that describes how submodels are coupled (the Multiscale Modeling Language, MML), as well as a coupling library (MUSCLE) that allows arbitrary workflows to be built from the submodels. Currently, we are exploring the execution of such models across several computing resources in order to increase the available CPU power. To properly deploy an execution across several clusters, we have developed a discrete event simulator able to predict the relevance of a given allocation scheme.
MS Summary
MS27 CADMOS: HPC Simulations, Modeling and Large Data, Bastien Chopard (University of Geneva, Switzerland)
Co-Authors: Nicolas Salamin (University of Lausanne, Switzerland), Jan Hesthaven (EPFL, Switzerland)
CADMOS (Center for ADvanced MOdelling Science) is a partnership between UNIGE, UNIL and EPFL whose goal is to promote HPC, modelling and simulation techniques, and data science for a broad range of relevant applications. New scientific results for well-established HPC problems, or new methodological approaches to problems usually not addressed by computer modelling or HPC resources, are especially considered. In this minisymposium we will have presentations from each of the three partners, highlighting the above goals. We will also invite two external keynote speakers. Contributions reporting on the link between HPC and data science, or opening the door to new interdisciplinary applications within the scope of CADMOS, are welcome. -
Chrust Marcin MS Presentation
Wednesday, June 8, 2016
Garden 3B, 16:30-17:00
MS Presentation
Scalability and Performance of the NEMOVAR Variational Ocean Data Assimilation Software, Marcin Chrust (ECMWF, United Kingdom)
Co-Authors: Anthony Weaver (CERFACS, France); Magdalena Balmaseda (ECMWF, United Kingdom); Kristian Mogensen (ECMWF, United Kingdom)
The scalability and performance of the variational data assimilation software NEMOVAR for the NEMO ocean model are presented. NEMOVAR is a key component of the ECMWF operational Ocean analysis System 4 (Ocean S4) and future System 5 (Ocean S5). It is designed as a four-dimensional variational assimilation (4D-Var) algorithm, which can also support three-dimensional (3D-Var) assimilation using the First-Guess at Appropriate Time (FGAT) approach. Central to the code performance is the implementation of the correlation operator used for modelling the background-error covariance matrix. In NEMOVAR this is achieved using a diffusion operator. A new implicit formulation of the diffusion operator has been introduced recently, which solves the underlying linear system using the Chebyshev iteration. The technique is more flexible and better suited to massively parallel machines than the method currently used operationally at ECMWF, but further improvements will be necessary for future high-resolution applications. -
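For illustration, the Chebyshev iteration for a symmetric positive definite system with known spectral bounds can be sketched as follows (a generic textbook-style sketch, not the NEMOVAR implementation; the toy 1D Laplacian test problem is an assumption). Part of its appeal on massively parallel machines is that, unlike conjugate gradients, it needs no global inner products.

    import numpy as np

    def chebyshev(A, b, lmin, lmax, iters=100):
        # Chebyshev iteration for SPD A with eigenvalues in [lmin, lmax]
        theta = 0.5 * (lmax + lmin)          # centre of the spectrum
        delta = 0.5 * (lmax - lmin)          # half-width of the spectrum
        sigma = theta / delta
        rho = 1.0 / sigma
        x = np.zeros_like(b)
        r = b - A @ x
        d = r / theta
        for _ in range(iters):
            x = x + d
            r = r - A @ d
            rho_new = 1.0 / (2.0 * sigma - rho)
            d = rho_new * rho * d + (2.0 * rho_new / delta) * r
            rho = rho_new
        return x

    # Toy SPD test: 1D Laplacian with known extreme eigenvalues.
    n = 30
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    lmin = 2.0 - 2.0 * np.cos(np.pi / (n + 1))
    lmax = 2.0 + 2.0 * np.cos(np.pi / (n + 1))
    x = chebyshev(A, np.ones(n), lmin, lmax)
    print(np.linalg.norm(A @ x - np.ones(n)))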
Clément Valentin MS Presentation
Wednesday, June 8, 2016
Garden 3B, 14:45-15:00
MS Presentation
CLAW Code Manipulation for Performance Portability, Valentin Clément (C2SM, Switzerland)
Co-Authors:
As hybrid HPC architectures become more common, scientific applications are being adapted to take advantage of their computing power. Programming standards like OpenACC have been applied successfully to offload the parallelism present in existing code onto accelerators. However, achieving optimal performance on various architectures with a single source code is not always possible; restructuring the code and applying architecture-specific optimisations is often needed to increase performance. This situation has been observed while porting the COSMO weather model, as well as the CAM-SE model, to a hybrid CPU/GPU architecture. In order to help with the code transformation and keep a single source code efficient on multiple architectures, we are developing a directive language as well as a tool named CLAW that allows the developer to specify and apply the necessary code transformations to generate both optimal GPU and CPU code from a single Fortran source code.
Poster
CLI-01 CLAW Provides Language Abstractions for Weather and Climate Models, Valentin Clément (C2SM, Switzerland)
Co-Authors: Valentin Clément (Center for Climate Systems Modeling (C2SM), ETH Zurich, Switzerland)
Achieving near optimal performance on different architectures (e.g., CPUs and GPUs) with a single source code is sometimes not possible and requires refactoring of the most performance critical code towards a desired architecture. This is the essence of 'performance portability' and this situation has been observed in several cases involving numerical weather and climate models. To help alleviate this situation, we are developing a tool named CLAW whose function is to allow developers to encode architecture-specific transformations into a single Fortran source. Our tool utilizes source-to-source compiler techniques to extend the Fortran grammar in a simple manner that allows the developer to activate the necessary code transformations automatically before compilation. To generalize the capabilities of the CLAW tool, we currently consider its use in the COSMO weather model and HAMMOZ and ICON climate models. -
Colciago Claudia MS Presentation
Friday, June 10, 2016
Garden 3C, 10:00-10:30
MS Presentation
Fluid-Structure Interaction for Vascular Flows: From Supercomputers to Laptops, Claudia Colciago (EPFL, Switzerland)
Co-Authors: Claudia Colciago (EPFL, Switzerland); Davide Forti (EPFL, Switzerland)
Can we simulate haemodynamics in a vascular district in real time? One single heartbeat still takes several hours on an HPC platform; how can we reduce the computational complexity by 2-3 orders of magnitude? The key ingredients are model order reduction and numerical reduction combined with pre-processing on supercomputers. Blood flow in arteries needs to take into account the incompressibility of the fluid, the compliant vessel, and patient-specific data. After reducing the complexity of the model, i.e. by assuming a fixed fluid computational domain and a thin membrane structure, it is possible to use Proper Orthogonal Decomposition and the Reduced Basis Method to split the computational effort into an offline and an online part. The former runs on an HPC system in 5 hours on 1000 processors, while the latter runs in real time, i.e. 1 second of simulation in less than 1 second of CPU time, on a notebook. -
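The offline/online splitting can be sketched on a toy parametrised linear system (an editorial sketch of POD-Galerkin reduction, not the haemodynamics solver; the operator, parameter range and basis size are assumptions): the expensive snapshot computation and basis extraction happen once, after which each online query solves only a tiny projected system.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 400
    A0 = np.diag(np.arange(1.0, n + 1))            # toy affine operator A(mu) = A0 + mu*A1
    A1 = rng.normal(size=(n, n)) / n
    A1 = A1 + A1.T
    b = np.ones(n)

    # Offline (expensive, done once): snapshots -> POD basis -> reduced operators.
    snapshots = np.column_stack(
        [np.linalg.solve(A0 + mu * A1, b) for mu in np.linspace(0.1, 1.0, 20)])
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    V = U[:, :5]                                   # keep the 5 dominant POD modes
    A0r, A1r, br = V.T @ A0 @ V, V.T @ A1 @ V, V.T @ b

    # Online (cheap, per query): assemble and solve a 5x5 system only.
    mu = 0.37
    xr = V @ np.linalg.solve(A0r + mu * A1r, br)
    x = np.linalg.solve(A0 + mu * A1, b)           # full-order reference
    print(np.linalg.norm(x - xr) / np.linalg.norm(x))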
Contarino Christian MS Presentation
Wednesday, June 8, 2016
Garden 3A, 13:00-13:30
MS Presentation
Junction-Generalized Riemann Problem for Stiff Hyperbolic Balance Laws in Networks of Blood Vessels, Christian Contarino (University of Trento, Italy)
Co-Authors: Eleuterio F. Toro (University of Trento, Italy); Gino I. Montecinos (Universidad de Chile, Chile); Raul Borsche (Technische Universität Kaiserslautern, Germany); Jochen Kall (Technische Universität Kaiserslautern, Germany)
We design a new implicit solver for the Junction-Generalized Riemann Problem (J-GRP), which is based on a recently proposed implicit method for solving the Generalized Riemann Problem (GRP) for systems of hyperbolic balance laws. We use the new J-GRP solver to construct an ADER scheme that is globally explicit, locally implicit and has no theoretical accuracy barrier, in both space and time. The resulting ADER scheme is able to deal with stiff source terms and can be applied to non-linear systems of hyperbolic balance laws in domains consisting of networks of one-dimensional sub-domains. Here we specifically apply the numerical techniques to networks of blood vessels. An application to a physical test problem consisting of a network of 37 compliant silicone tubes (arteries) and 21 junctions reveals that it is imperative to use high-order methods at junctions in order to preserve the desired high order of accuracy in the full computational domain. -
Cooper Wilfred Contributed Talk
Thursday, June 9, 2016
Garden 3A, 11:10-11:30
Contributed Talk
Self-Consistent Modelling of Plasma Heating and Fast Ion Generation Using Ion-Cyclotron Range of Frequency Waves in 2D and 3D Devices, Wilfred Cooper (EPFL, Switzerland)
Co-Authors: Wilfred Cooper (EPFL, Switzerland); Jonathan Graves (EPFL, Switzerland); David Pfefferlé (EPFL, Switzerland); Joachim Geiger (Max Planck Institute of Plasma Physics, Germany)
Ion-Cyclotron Range of Frequency (ICRF) waves are an efficient source of plasma heating in tokamaks and stellarators. In ICRF-heated plasmas, the phase-space distribution function of the resonating particles displays significant distortion. A significant consequence is a noticeable modification of the plasma properties that dictate the propagation of the ICRF wave. The self-consistent modelling tool SCENIC was built in order to solve this highly non-linear problem. It is one of the few ICRF modelling tools able to tackle both 2D and 3D plasma configurations. The computational resources, in particular the amount of shared memory required to resolve the plasma equilibrium and the wave propagation, increase significantly for simulations of strongly 3D equilibria such as stellarators compared to 2D tokamak calculations. We present some applications of SCENIC to tokamak and stellarator plasmas. Particular focus is given to simulations of the recently started Wendelstein 7-X stellarator experiment, which will use ICRF waves for fast-particle generation. -
Coulaud Olivier MS Presentation
Thursday, June 9, 2016
Garden 2BC, 15:00-15:20
MS Presentation
An Efficient Interpolation Based Fast Multipole Method for Dislocation Dynamics Simulations, Olivier Coulaud (INRIA, France)
Co-Authors: Arnaud Etcheverry (INRIA, France); Olivier Coulaud (INRIA, France); Laurent Dupuy (CEA, France); Eric Darve (Stanford University, United States of America)
Although the framework of Dislocation Dynamics (DD) provides powerful tools to model crystal plasticity, an efficient implementation is crucial in order to simulate very large ensembles of dislocations. Among all the steps involved in DD simulations, the computation of the internal elastic forces and energy is the most resource-consuming. However, since these are long-range interactions, they can be evaluated efficiently using the Fast Multipole Method (FMM). We propose a new FMM formulation based on polynomial interpolation that is optimised to reduce the memory footprint and the number of flops using fast Fourier transforms. These optimisations are necessary because of the tensorial nature of the kernel and the unusual memory requirements of this application. Regarding parallelism, our code benefits from a hybrid OpenMP/MPI paradigm and a cache-aware data structure. Numerical results will be presented to show the accuracy of this new approach and its parallel scalability.
Poster
CSM-08 Fast Randomized Algorithms for Covariance Matrix Computations, Olivier Coulaud (INRIA, France)
Co-Authors: Olivier Coulaud (INRIA, France); Eric Darve (Stanford University, United States of America); Alain Franc (INRIA, France)
Covariance matrices arise in many fields of modern scientific computation, from geostatistics to data analysis, where they usually measure the correlation between grid points. Most algorithms involving such matrices have a superlinear cost in N, the size of the grid. We present an open-source library implementing efficient algorithms based on randomized low-rank approximations (LRA). The library can provide approximate decompositions of low-rank covariance matrices in O(N^2) operations instead of the usual O(N^3) cost of standard methods. In addition, low-rank covariance matrices given as kernels, e.g., Gaussian functions, and evaluated on 3D grids can be factorized in O(N) operations using randomized LRA and an FMM acceleration. The performance of the library is illustrated on two examples: the generation of Gaussian random fields on large heterogeneous spatial grids of O(10^6) points, and the computation of reduced-order maps given distances between DNA sequences, using Multi-Dimensional Scaling for the classification of species on 10^5 samples. -
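A minimal randomized low-rank factorization of a smooth covariance kernel can be sketched as follows (a generic randomized range-finder in the spirit of such methods, not the library's API; the point set, length scale and target rank are assumptions): a random sketch captures the dominant range of the matrix, after which a small eigendecomposition yields the approximate factorization.

    import numpy as np

    rng = np.random.default_rng(0)
    pts = rng.uniform(size=(1500, 3))                      # scattered 3D points
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    C = np.exp(-d2 / (2.0 * 0.5 ** 2))                     # Gaussian covariance, length scale 0.5

    k, p = 40, 10                                          # target rank and oversampling
    Omega = rng.normal(size=(C.shape[1], k + p))           # random test matrix
    Q, _ = np.linalg.qr(C @ Omega)                         # orthonormal basis of the range
    B = Q.T @ C @ Q                                        # small symmetric core
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]
    U = Q @ V[:, idx]                                      # C is approximately U diag(w[idx]) U^T
    err = np.linalg.norm(C - U @ np.diag(w[idx]) @ U.T) / np.linalg.norm(C)
    print(err)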
Crosetto Paolo MS Presentation
Wednesday, June 8, 2016
Garden 3B, 13:30-14:00
MS Presentation
Translating Python into GridTools: Prototyping PDE Solvers Using Stencils, Paolo Crosetto (ETH Zurich, Switzerland)
Co-Authors: Mauro Bianco (ETH Zurich / CSCS, Switzerland); Paolo Crosetto (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Thomas Schulthess (ETH Zurich, Switzerland)
The fast-paced environment of high-performance computing architectures has always been a challenge for complex codes. First, the effort to adapt the code to new processor architectures is significant compared to their typical release cycle. Second, optimisations for one target often incur performance penalties on others. Third, such codes are generally developed by domain scientists, who typically lack expertise in the specific details of the target platform. Successful projects like STELLA have shown that a way out of this situation is to apply the concept of separation of concerns. GridTools pushes this concept even further: the domain scientist's work is conducted within a prototyping environment using a domain-specific language (DSL), while the computer scientist profiles the automatically generated code over diverse architectures, implemented by different hardware-specific backends. This talk will give an overview of the GridTools ecosystem, highlighting the use of the prototyping environment in combination with the automatic code-generation engine.
Thursday, June 9, 2016
Garden 1BC, 12:10-12:30
Contributed Talk
The GridTools Libraries for the Solution of PDEs Using Stencils, Paolo Crosetto (ETH Zurich, Switzerland)
Co-Authors: Mauro Bianco (ETH Zurich / CSCS, Switzerland); Paolo Crosetto (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Thomas Schulthess (ETH Zurich, Switzerland)
Numerical weather prediction and climate models like COSMO and ICON explicitly solve a large set of PDEs. The STELLA library was successfully used to port the dynamical core of COSMO, providing a performance-portable code across multiple platforms. A significant performance speedup was obtained for NVIDIA GPUs, as reported in doi:10.1145/2807591.2807676. However, its applicability was restricted to Cartesian structured grids and finite-difference methods, and it is difficult to use outside the COSMO model. The GridTools project emerged as an effort to provide an ecosystem for developing portable and efficient grid applications for the explicit solution of PDEs. GridTools generalizes STELLA to a wider class of weather and climate models on multiple grids (Cartesian and spherical) and offers facilities for performing communication and setting boundary conditions. Here we present the GridTools API and show performance on NVIDIA GPUs and x86 platforms. -
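For readers unfamiliar with stencil codes, the kind of kernel such libraries abstract away from the target architecture is, in its simplest form, the following (a plain NumPy 5-point Laplacian, purely illustrative and unrelated to the GridTools API):

    import numpy as np

    def laplacian(phi, dx):
        # 5-point Laplacian applied to the interior points of a 2D field
        lap = np.zeros_like(phi)
        lap[1:-1, 1:-1] = (phi[2:, 1:-1] + phi[:-2, 1:-1] +
                           phi[1:-1, 2:] + phi[1:-1, :-2] -
                           4.0 * phi[1:-1, 1:-1]) / dx ** 2
        return lap

    nx = 64
    x = np.linspace(0.0, 1.0, nx)
    X, Y = np.meshgrid(x, x, indexing="ij")
    phi = np.sin(np.pi * X) * np.sin(np.pi * Y)
    lap = laplacian(phi, x[1] - x[0])        # interior approximates -2*pi^2*phi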
Csanyi Gabor Poster
Poster
MAT-02 A Periodic Table of Molecules, Gabor Csanyi (University of Cambridge, United Kingdom)
Co-Authors: Albert P. Bartok (University of Cambridge, United Kingdom); Gabor Csanyi (University of Cambridge, United Kingdom); Michele Ceriotti (EPFL, Switzerland)
A graphical representation of a database [1] of more than 7000 molecules containing a non-homogeneous mix of C, N, O and S atoms has been generated using a non-linear dimensionality reduction algorithm (sketch-map [2]), so that molecules with similar composition and geometry are projected close to each other. The underlying metric is based on the REMatch-SOAP kernel [3], which is built upon a comparison of local environments [4] and treats the "alchemical" similarity between molecules and the location of atoms in space on the same footing. Much like in a periodic table, one can also recognize strong correlations between the positions of the molecules on the map and their physical-chemical properties, from formation energy to polarizability. References: [1] G. Montavon et al., NJP, 2013, 15, 095003; [2] M. Ceriotti et al., JCTC, 2013, 9, 1521-1532; [3] S. De et al., PCCP 2016, DOI: 10.1039/C6CP00415F; [4] A. P. Bartok et al., PRB, 2013, 87, 184115. -
Curtin William MS Presentation
Thursday, June 9, 2016
Garden 2BC, 15:40-16:00
MS Presentation
Atomistic Modelings of Dislocation Cross-Slips in HCP Metals, William Curtin (EPFL, Switzerland)
Co-Authors: William Curtin (EPFL, Switzerland)
HCP metals such as Mg, Ti and Zr are a class of lightweight and/or highly durable metals with critical structural applications in the automotive, aerospace and nuclear industries. However, the fundamental mechanisms of deformation, strengthening and ductility in these metals are not well understood, resulting in significant challenges for their plasticity models at all scales. We present dislocation cross-slip in Mg using a DFT-validated interatomic potential and very large scale NEB calculations on HPC systems. We reveal a unique dislocation cross-slip mechanism and quantify the cross-slip energy barrier and its stress dependence, which leads to tension-compression asymmetry and a switch in the absolute stability of slip planes. All of these are generic to HCP metals but very different from the mechanisms well established for cubic metals. Our results provide mechanistic insights into the cross-slip behaviour, rationalize the pyramidal I/II slip stability and enable the prediction of slip trends across the family of HCP metals.
Friday, June 10, 2016
Garden 2A, 09:55-10:20
MS Presentation
A Parallel Algorithm for Multiscale Atomistic/Continuum Simulations, William Curtin (EPFL, Switzerland)
Co-Authors: W. Curtin (EPFL, Switzerland)
Deformation and fracture processes in engineering materials often require simultaneous descriptions over a range of length and time scales, with each scale using a different computational technique. Here we discuss some of the available multiscale coupling algorithms and their most interesting features from an academic and a corporate research perspective. We then present a high-performance parallel 3D computing framework for executing large multiscale studies that couple an atomic domain, modeled using molecular dynamics and a continuum domain, modeled using explicit finite elements. The main purpose of this work is to provide a multiscale implementation within an existing large-scale parallel molecular dynamics code (LAMMPS) that enables use of all the tools associated with this popular open-source code, while extending the robust CADD-type displacement coupling to 3D. Our implementation allows us to reproduce results of extremely large atomistic simulations using fewer than 1,000,000 atoms, thus at a much lower computational cost.
MS Summary
MS17 Applications and Algorithms for HPC in Solid Mechanics I: Plasticity, William Curtin (EPFL, Switzerland)
Co-Authors: Guillaume Anciaux (EPFL, Switzerland)
Plastic deformation is made possible by the motion of interacting crystalline defects. In the study of the mechanisms controlling the macroscopic and effective plastic laws, it is of particular importance to understand the collective behaviour of dislocations. Available models generally represent dislocations as nodes connected by segments, which can form a complex network structure. Within these forests, dislocations can nucleate, interact, join, and annihilate. This poses an important challenge because of the many defects present in the crystals, such as impurities, grain boundaries and free surfaces. In order to capture the correct physics of the described processes, the employed (self-)interaction laws and the mobility laws have to be correctly accounted for. Furthermore, the number of dislocation segments needs to be large if one wants to achieve calculations comparable with experimental scales. In parallel, full atomistic simulations can provide insights into detailed mechanistic aspects of dislocation nucleation, transformation, and reactions that occur at the nanoscale, below the capabilities of mesoscale dislocation models. Consequently, the numerical calculations are challenging and call for HPC strategies. This minisymposium aims at fostering discussion on the newest advancements of the numerical models, accompanying algorithms, and applications. Also, we encourage researchers working in the field of dislocation plasticity to present analytic models and experimental results that complement studies performed with parallel algorithms.
MS Summary
MS25 Applications and Algorithms for HPC in Solid Mechanics II: Multiscale Modelling, William Curtin (EPFL, Switzerland)
Co-Authors: Guillaume Anciaux (EPFL, Switzerland), J. F. Molinari (EPFL, Switzerland)
In all observable phenomena, a full understanding of the operative physical mechanisms leads to extremely complicated models. One natural analysis path is to decompose the problem into simpler sub-problems yet transferring part of the complexity to the issue of coupling the sub-problems. In the study of materials and solids, considerable progress has been made in the numerical methods to couple scales. For instance, atomic, discrete-defect, meso-scale, and structural scales can now be coupled together under various assumptions. In this minisymposium, talks are solicited to present new work on coupling strategies, their mathematical description, and/or their implementation details for possible HPC machines. Such multiscale methods could deal with multi-grid, FE², concurrent methods, particle-continuum coupling, among others. Other multiscale themes are also welcome in this minisymposium.
D
-
Darve Eric MS Presentation
Thursday, June 9, 2016
Garden 2BC, 15:00-15:20
MS Presentation
An Efficient Interpolation Based Fast Multipole Method for Dislocation Dynamics Simulations, Eric Darve (Stanford University, United States of America)
Co-Authors: Arnaud Etcheverry (INRIA, France); Olivier Coulaud (INRIA, France); Laurent Dupuy (CEA, France); Eric Darve (Stanford University, United States of America)
Although the framework of Dislocation Dynamics (DD) provides powerful tools to model crystal plasticity, an efficient implementation is crucial in order to simulate very large ensembles of dislocations. Among all the steps involved in DD simulations, the computation of the internal elastic forces and energy is the most resource consuming. However, since these are long-ranged interactions, they can be efficiently evaluated using the Fast Multipole Method (FMM). We propose a new FMM formulation based on polynomial interpolation that is optimised to reduce the memory footprint and the number of flops using fast Fourier transforms. These optimisations are necessary because of the tensorial nature of the kernel and the unusual memory requirements of this application. Regarding parallelism, our code benefits from a hybrid OpenMP/MPI paradigm and a cache-aware data structure. Numerical results will be presented to show the accuracy of this new approach and its parallel scalability.
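As a point of reference for the interpolation-based approach described in this abstract, the display below sketches the generic "black-box" FMM idea, in which a far-field kernel is approximated by interpolation on a small number of nodes per box. This is the textbook form only; the talk's specific tensorial kernel and FFT-based optimisations are not reproduced here.

```latex
% Generic interpolation-based (black-box) FMM far-field approximation:
% \bar{x}_m, \bar{y}_l are interpolation nodes in the source and target boxes,
% S_p denotes the associated interpolation polynomials (e.g. Chebyshev).
K(x, y) \;\approx\; \sum_{m=1}^{p} \sum_{l=1}^{p}
    S_p(x, \bar{x}_m)\, K(\bar{x}_m, \bar{y}_l)\, S_p(y, \bar{y}_l)
```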
Poster
CSM-08 Fast Randomized Algorithms for Covariance Matrix Computations, Eric Darve (Stanford University, United States of America)
Co-Authors: Olivier Coulaud (INRIA, France); Eric Darve (Stanford University, United States of America); Alain Franc (INRIA, France)
Covariance matrices arise in many fields of modern scientific computation, from geostatistics to data analysis, where they usually measure the correlation between grid points. Most algorithms involving such matrices have a superlinear cost in N, the size of the grid. We present an open-source library implementing efficient algorithms based on randomized low-rank approximations (LRA). The library can provide approximate decompositions of low-rank covariance matrices in O(N^2) operations instead of the usual O(N^3) cost of standard methods. In addition, low-rank covariance matrices given as kernels, e.g., Gaussian functions, and evaluated on 3D grids can be factorized in O(N) operations using randomized LRA and an FMM acceleration. The performance of the library is illustrated on two examples: the generation of Gaussian random fields on large heterogeneous spatial grids of O(10^6) points, and the computation of reduced-order maps from distances between DNA sequences using Multi-Dimensional Scaling for the classification of species on 10^5 samples. -
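For orientation, the standard randomized range-finder (Halko et al.) illustrates how a randomized LRA reaches the O(N^2) regime mentioned above; it is shown as a generic sketch and may differ from the library's actual algorithms.

```latex
% Randomized low-rank approximation of an N x N covariance matrix A, target rank k << N:
Y = A\,\Omega, \qquad \Omega \in \mathbb{R}^{N \times k} \ \text{Gaussian test matrix},
\qquad Y = QR \ \ (\text{thin QR}),
\qquad A \;\approx\; Q\,(Q^{\mathsf{T}} A),
% at a dense cost of O(N^2 k) instead of the O(N^3) of a full factorization.
```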
Daverio David Contributed Talk
Thursday, June 9, 2016
Garden 3C, 15:25-15:45
Contributed Talk
Gevolution: A Cosmological N-Body Code Based on General Relativity, David Daverio (African Institute for Mathematical Sciences, South Africa)
Co-Authors: Martin Kunz (University of Geneva, Switzerland)
Cosmological structure formation is a highly non-linear process that can only be studied with the help of numerical simulations. This process is mainly governed by gravity, which is the dominant force on large scales. A century after the formulation of general relativity, numerical codes for structure formation still use Newton's law of gravitation. In my talk I will present results from the first simulations of cosmic structure formation using equations consistently derived from general relativity. Our particle-mesh N-body code gevolution computes all six degrees of freedom of the metric and consistently solves the geodesic equation for particles, taking into account the relativistic potentials and the frame-dragging force. Thanks to this, we were able to study in detail for a standard ΛCDM cosmology the small relativistic effects that cannot be obtained within a purely Newtonian framework. -
Davydov Iakov MS Presentation
Thursday, June 9, 2016
Garden 3A, 14:30-15:00
MS Presentation
Large-Scale Analyses of Positive Selection Using Efficient Models of Codon Evolution, Iakov Davydov (University of Lausanne, Switzerland)
Co-Authors: Marc Robinson-Rechavi (University of Lausanne, Switzerland); Nicolas Salamin (Swiss Institute of Bioinformatics, Switzerland)
Models of codon evolution are widely used to identify signatures of positive selection in protein coding genes. While the analysis of a single gene family usually takes less than an hour on an average computer, the detection of positive selection on genomic data becomes a computationally intensive problem. In order to support our full genome database of positive selection 'Selectome' (http://selectome.unil.ch/) we develop a series of high-performance computing tools to analyse positive selection. These improvements allow us to develop new and more realistic, but computationally tractable, models of codon evolution. -
De Sandip Poster
Poster
MAT-02 A Periodic Table of Molecules, Sandip De (EPFL, Switzerland)
Co-Authors: Albert P. Bartok (University of Cambridge, United Kingdom); Gabor Csanyi (University of Cambridge, United Kingdom); Michele Ceriotti (EPFL, Switzerland)
A graphical-representation of a database[1] of more than 7000 molecules containing non-homogeneous mix of C, N, O and S atoms has been generated using a non-linear dimensionality reduction algorithm (sketch-map[2]) so that molecules with similar composition and geometry are projected close to each other. The underlying metric is based on the REMatch-SOAP kernel[3], which is built upon a comparison of local environments[4], and treats the "alchemical" similarity between molecules, and the location of atoms in space, on the same footings. Much like in a periodic table, one can also recognize strong correlations between the positions of the molecules on the map and their physical-chemical properties, from formation energy to polarizability. References: [1] G. Montavon et al., NJP, 2013, 15, 095003; [2] M. Ceriotti, et al., JCTC, 2013, 9, 1521-1532; [3] S. De, et al., PCCP 2016, DOI: 10.1039/C6CP00415F; [4] A. P. Bartok et al., PRB, 2013, 87, 184115. -
Debus Alexander MS Presentation
Wednesday, June 8, 2016
Garden 3C, 14:00-14:15
MS Presentation
Interactive Plasma Simulations on Next Generation Supercomputers for Everybody, Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
Co-Authors: Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Benjamin Worpitz (Citrix Systems GmbH, Germany); Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
We present the open-source ecosystem around the reportedly fastest particle-in-cell code in the world (sustained Flop/s), PIConGPU. Designed for modern clusters powered by manycore hardware, we motivate that HPC plasma simulations should be able to estimate their systematic and random error (e.g., by varying solvers and initial conditions). Our approach starts with an open-source, anytime fork-able development cycle as the basis for scrutable and reproducible simulations. To promote interoperability, we develop and propagate an open, self-describing, file-format agnostic data markup (openPMD) that is suitable for extreme I/O load and in-situ processing, demonstrated in a live simulation. PIConGPU is built on top of C++ meta-programming libraries, providing single-source kernel acceleration (alpaka) to work asynchronously on distributed data containers (PMacc). Using compile-time optimisation techniques, we show that particle-mesh methods can be implemented for arbitrary high-performance hardware (GPGPUs, CPUs, OpenPOWER, Xeon Phi) featuring solver agility without negative implications on maintenance (rewrite) or runtime performance. -
Decaix Jean MS Presentation
Thursday, June 9, 2016
Garden 3A, 11:50-12:10
MS Presentation
URANS Computations of an Unstable Cavitating Vortex Rope, Jean Decaix (HES-SO Valais//Wallis, Switzerland)
Co-Authors: Andres Müller (EPFL, Switzerland); François Avellan (EPFL / LMH, Switzerland); Cécile Münch (HES-SO Valais-Wallis, Switzerland)
Due to the massive penetration of alternative renewable energies, hydraulic power plants are key energy conversion technologies to stabilize the electrical power network, using hydraulic machines at off-design operating conditions. For a flow rate larger than the one at the best efficiency point, a cavitating vortex rope occurs, leading to strong pressure surges in the entire hydraulic system. To better understand the mechanisms responsible for the pressure surges, URANS simulations of a reduced-scale Francis turbine are performed. Several sigma values are investigated, corresponding to stable and unstable cavitating vortex ropes. The results are compared with the experimental measurements. The main challenge of the computations is the long physical time, compared to the time step, required to capture the beginning of the instability. -
Dehnen Walter MS Presentation
Thursday, June 9, 2016
Garden 3C, 14:00-14:20
MS Presentation
N-Body Time Integration: Towards Time Reversibility, Walter Dehnen (Leicester University, United Kingdom)
Co-Authors:
Astrophysical N-body problems range from planetary systems to galaxies and the whole universe. The dynamical times of individual particles in such systems vary by many orders of magnitude. Therefore, N-body simulations employ individual adaptive time steps, which for reasons of efficiency are discretised and hierarchically organised (block-step). The N-body problem itself satisfies time reversibility, and it is therefore desirable that the simulation codes do so too, to avoid artificial dissipation. However, all current methods to adapt individual time steps with the block-step fail to preserve time reversibility. I investigate the seriousness of this violation and discuss possible ways to alleviate the situation. It appears that it is impossible to adapt the time steps reversibly with the block-step in an explicit and efficient way, but nearly time-reversible methods, better than current practice, are possible.
Thursday, June 9, 2016
Garden 3C, 14:50-15:05
MS Presentation
New Time Step Criterion in Gravitational N-Body Simulations, Walter Dehnen (Leicester University, United Kingdom)
Co-Authors: Walter Dehnen (University of Leicester, United Kingdom)
We present a new time step criterion for gravitational N-body simulations based on the norm of the gradient tensor of the acceleration rather than on the acceleration itself, which is not gauge invariant in all circumstances, and which is related to the orbital time of the particle. We have tested this time step criterion on the simulation of a single orbit with high eccentricity, as well as on an N-body problem using direct-summation force calculation and a Plummer model as the initial condition. The new criterion requires fewer force evaluations than other time step criteria.
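Purely as a schematic of the type of criterion described above (not the exact formula from the talk), a dimensionally consistent time step built from the acceleration gradient tensor can be written as follows, with eta a dimensionless accuracy parameter introduced here for illustration.

```latex
% Since \nabla\otimes a has units of (time)^{-2}, its norm directly yields
% an inverse squared orbital time scale for particle i:
\tau_i \;=\; \eta \,\big\| \nabla \otimes \boldsymbol{a}_i \big\|^{-1/2}
```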
MS Summary
MS22 N-Body Simulations Techniques, Walter Dehnen (Leicester University, United Kingdom)
Co-Authors: Joachim Stadel (University of Zurich, Switzerland)
Many astrophysical systems are modelled by N-body simulations, where the simulated particles either correspond directly to physical bodies (such as in studies of the Solar system and star cluster) or are representative of a much larger number of collisionlessly moving physical objects (such as in studies of galaxies and large-scale structure). N-body systems are Hamiltonian systems and simulating their long-term evolution is a difficult numerical challenge because of shot noise, errors of the approximated forces, and strong inhomogeneities of the systems in both space and time scales. This latter point is a major difficulty for the development of efficient and scalable algorithms for HPC architectures. -
Demidov Denis MS Presentation
Thursday, June 9, 2016
Auditorium C, 14:30-15:00
MS Presentation
VexCL: Experiences in Developing a C++ Wrapper Library for OpenCL, Denis Demidov (Kazan Federal University, Russia)
Co-Authors:
VexCL is a C++ vector expression template library for fast GPGPU prototyping and development. It provides convenient notation for various vector operations and generates OpenCL/CUDA kernels on the fly from C++ expressions. Among the challenges met during the library's development were the need to support multiple backends (OpenCL/CUDA), the fact that the OpenCL compute kernel language is C99 and thus lacks some conveniences of C++, and others. The talk provides a brief overview of the library's implementation and the C++ techniques used to overcome these difficulties. -
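The minimal sketch below follows VexCL's documented usage (context and vector construction, kernel generation from an expression); it is an illustrative example assembled for this text, not code from the talk.

```cpp
#include <vexcl/vexcl.hpp>
#include <iostream>
#include <stdexcept>

int main() {
    // Select all compute devices that support double precision.
    vex::Context ctx(vex::Filter::DoublePrecision);
    if (!ctx) throw std::runtime_error("No devices available.");

    const size_t n = 1 << 20;
    vex::vector<double> x(ctx, n), y(ctx, n), z(ctx, n);
    x = 1.0;
    y = 2.0;

    // The whole expression is turned into a single compute kernel,
    // generated and compiled on the fly at run time.
    z = 2 * x + sin(y);

    std::cout << z[0] << std::endl;  // element access copies from the device
    return 0;
}
```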
Deparis Simone MS Presentation
Friday, June 10, 2016
Garden 3C, 10:00-10:30
MS Presentation
Fluid-Structure Interaction for Vascular Flows: From Supercomputers to Laptops, Simone Deparis (EPFL, Switzerland)
Co-Authors: Claudia Colciago (EPFL, Switzerland); Davide Forti (EPFL, Switzerland)
Can we simulate haemodynamics in a vascular district in real time? One single heartbeat still takes several hours on an HPC platform; how can we reduce the computational complexity by 2-3 orders of magnitude? The key ingredients are model order reduction and numerical reduction combined with pre-processing on supercomputers. Blood flow in arteries needs to take into account the incompressibility of the fluid, the compliant vessel, and the patient-specific data. After reducing the complexity of the model, i.e. by assuming a fixed fluid computational domain and a thin membrane structure, it is possible to use Proper Orthogonal Decomposition and the Reduced Basis Method to split the computational effort into an offline and an online part. The former runs on an HPC system in 5 hours on 1000 processors, while the latter runs in real time, i.e. 1 second of simulation in less than 1 second of CPU time, on a notebook.
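To make the offline/online split concrete, the reduced-basis approximation takes the generic form below, where the basis functions are computed offline (e.g. by POD or a greedy procedure on HPC snapshots) and only a small coefficient system is solved online; this is the standard formulation, not an equation taken from the talk.

```latex
% Reduced-basis expansion of the high-fidelity solution u_h for parameter \mu:
u_h(\mu) \;\approx\; \sum_{n=1}^{N} c_n(\mu)\, \zeta_n,
\qquad N \ll \dim(V_h),
% \zeta_n : basis functions precomputed offline,
% c_n(\mu): coefficients obtained online from a small N x N system.
```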
MS Summary
MS01 Advanced Computational Methods for Applications to the Cardiovascular System I, Simone Deparis (EPFL, Switzerland)
Co-Authors: Dominik Obrist (University of Bern, Switzerland), Christian Vergara (Politecnico di Milano, Italy)
Cardiac and Cardiovascular Mathematics represents nowadays a challenging topic in view of the emerging and growing collaborations between clinicians and mathematicians. In this respect, the numerical solution of problems arising in modelling cardiac and systemic phenomena opens new and interesting perspectives which need to be properly addressed. From the cardiac side, a fully integrated heart model represents a complex multiphysics problem, which is in turn composed of several submodels describing cardiac electrophysiology, mechanics, and fluid dynamics. On the system circulation side, although it has been studied for a longer time, several mathematical and numerical aspects still need to be addressed (e.g., tissue remodelling, atherosclerotic plaque formation, aneurysms development, transitional and turbulence phenomena in blood flows). This minisymposium aims at gathering researchers and experts in computational and numerical modelling of the heart and the systemic circulation.
MS Summary
MS07 Advanced Computational Methods for Applications to the Cardiovascular System II, Simone Deparis (EPFL, Switzerland)
Co-Authors: Dominik Obrist (University of Bern, Switzerland), Christian Vergara (Politecnico di Milano, Italy)
Cardiac and Cardiovascular Mathematics represents nowadays a challenging topic in view of the emerging and growing collaborations between clinicians and mathematicians. In this respect, the numerical solution of problems arising in modelling cardiac and systemic phenomena opens new and interesting perspectives which need to be properly addressed. From the cardiac side, a fully integrated heart model represents a complex multiphysics problem, which is in turn composed of several submodels describing cardiac electrophysiology, mechanics, and fluid dynamics. On the system circulation side, although it has been studied for a longer time, several mathematical and numerical aspects still need to be addressed, as e.g. tissue remodelling, atherosclerotic plaque formation, aneurysms development, transitional and turbulence phenomena in blood flows. This minisymposium aims at gathering researchers and experts in computational and numerical modelling of the heart and the systemic circulation. -
Deriaz Erwan MS Presentation
Wednesday, June 8, 2016
Garden 3C, 16:45-17:00
MS Presentation
Adaptive Mesh Refinement and Other Adaptive Strategies for Vlasov Simulation, Erwan Deriaz (CNRS, France)
Co-Authors:
Simulating the full six-dimensional phase-space Vlasov equations represents a challenge due to the Curse of Dimensionality. In the struggle to reduce the number of degrees of freedom needed in Eulerian simulations, several adaptive methods have been developed. Adaptive Mesh Refinement techniques allow the grid to be adapted dynamically in an isotropic way. Both Block-Structured [Hittinger, Colella] and Fully-Threaded Tree [Khokhlov, Zanotti, Dumbser] implementations succeed in lower-than-three-dimensional kinetic simulations. For higher dimensionality, the authors of adaptive methods tend to favour tensor-product structures such as the Tree-of-Tree method [Kolobov], the Sparse Grids [Griebel, Bungartz] or the Tensor-Train method [Oseledets, Kormann]. We propose to discuss and compare their respective advantages and drawbacks. -
Dessimoz Christophe MS Presentation
Thursday, June 9, 2016
Garden 3A, 15:45-16:00
MS Presentation
Speeding Up All-Against-All Sequence Alignment Among Thousands of Genomes, Christophe Dessimoz (University of Lausanne, Switzerland)
Co-Authors:
Orthology inference and other sequence analyses across multiple genomes typically start by performing exhaustive pairwise sequence comparisons, a process referred to as "all-against-all". As this process scales quadratically with the number of sequences analysed, this step can become a bottleneck, thus limiting the number of genomes that can be simultaneously analysed. For the OMA database, developed in our laboratory, we have performed the all-against-all between 2000 complete genomes at a computational cost of several million CPU hours. In this talk, I will report on some of the strategies we have explored to speed up the all-against-all step while maintaining its sensitivity.
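The sketch below only illustrates why the all-against-all step scales quadratically; the scoring function is a named placeholder standing in for a real alignment routine (e.g. Smith-Waterman), and none of this code is taken from the OMA pipeline.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Placeholder scoring function: a stand-in for a real pairwise alignment.
double align_score(const std::string& a, const std::string& b) {
    return static_cast<double>(a.size() < b.size() ? a.size() : b.size());
}

// Naive all-against-all driver: n*(n-1)/2 pairwise comparisons, i.e. cost
// quadratic in the number of sequences, which is why this step dominates
// for thousands of genomes and has to be distributed over many CPUs.
std::vector<double> all_against_all(const std::vector<std::string>& seqs) {
    const std::size_t n = seqs.size();
    std::vector<double> scores;
    scores.reserve(n * (n - 1) / 2);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)   // symmetry halves the work
            scores.push_back(align_score(seqs[i], seqs[j]));
    return scores;
}

int main() {
    std::vector<std::string> seqs = {"MKT", "MKV", "MRT", "MKTA"};
    return all_against_all(seqs).size() == 6 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```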
MS Summary
MS19 Harnessing Big Data for Biological Discovery: Scientific Computing at the SIB Swiss Institute of Bioinformatics, Christophe Dessimoz (Swiss Institute of Bioinformatics, Switzerland)
Co-Authors: Christophe Dessimoz (University of Lausanne, Switzerland)
Technological advances in sequencing, mass spectrometry, and other high-throughput technologies are transforming biological sciences into a quantitative discipline. The unabated, exponential growth of data poses acute modelling and computational challenges, but also unprecedented opportunities to comprehensively and accurately interrelate biological entities and elucidate their inner workings. This minisymposium will showcase ongoing scientific computing projects and activities at the SIB Swiss Institute of Bioinformatics. The talks will address problems pertaining to a wide variety of biological data and models, including genes, genomes, proteomes, phylogenies, and networks. SIB is an independent, not-for-profit foundation recognized of public utility. SIB includes some 60 research and service groups, which bring together some 750 scientists in the fields of genomics, transcriptomics, proteomics, evolution, population genetics, systems biology, structural biology, biophysics and clinical bioinformatics, located in the Swiss cantons of Basel, Bern, Fribourg, Geneva, Ticino, Vaud and Zurich. SIB provides life scientists and clinicians in academia and industry with a state-of-the-art bioinformatics infrastructure, including resources, expertise and services, federates world-class researchers, and delivers training in bioinformatics. It has a longstanding tradition of producing state-of-the-art software and carefully annotated databases. SIB also provides leading educational services, data analysis support and bioinformatics research. -
Dettmers Tim Poster
Poster
EMD-02 Large Scale Xeon Phi Parallelization of a Deep Learning Language Model, Tim Dettmers (Università della Svizzera italiana, Switzerland)
Co-Authors: Tim Dettmers (Università della Svizzera italiana, Switzerland); Olaf Schenk (Università della Svizzera italiana, Switzerland)
Deep learning is a recent predictive modelling approach which yields near-human performance on a range of tasks. Deep learning language models have gained popularity as they achieved state-of-the-art results in many language tasks, such as language translation, but are computationally intensive thus requiring computers with accelerators and weeks of computation time. Here we propose a parallel algorithm for running deep learning language models on hundreds of nodes equipped with Xeon Phis to reduce the computation time to mere hours. We use MPI for the parallelization among nodes and use Xeon Phis to accelerate the matrix multiplications which make up more than 75% of the total computation. With our algorithm experimentation can be done much faster thus enabling rapid progress in the sparsely explored domain of natural language understanding. -
Deutsch Thierry MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 14:20-14:40
MS Presentation
BigDFT: Flexible DFT Approach to Large Systems Using Adaptive and Localized Basis Functions, Thierry Deutsch (CEA, France)
Co-Authors: Luigi Genovese (CEA/INAC, France); Stefan Mohr (BSC, Spain); Laura Ratcliff (Argonne National Laboratory, United States of America); Stefan Goedecker (University of Basel, Switzerland)
Since 2008, the BigDFT project consortium has developed an ab initio DFT code based on Daubechies wavelets. In recent articles, we presented the linear-scaling version of the BigDFT code[1], where a minimal set of localized support functions is optimised in situ for systems in various boundary conditions. We will present how the flexibility of this approach helps in providing a basis set that is optimally tuned to the chemical environment surrounding each atom. In addition to providing a basis useful for projecting Kohn-Sham orbital information such as atomic charges and partial densities of states, it can also be reused as-is, without re-optimisation, for charge-constrained DFT calculations within a fragment approach[2]. We will demonstrate the interest of this approach for highly precise and efficient calculations of systems in complex environments[3]. [1] JCP 140, 204110 (2014), PCCP 17, 31360 (2015) [2] JCP 142, 23, 234105 (2015) [3] JCTC 11, 2077 (2015) -
Di Marino Daniele Contributed Talk
Thursday, June 9, 2016
Garden 3A, 10:30-10:50
Contributed Talk
A Comprehensive Description of the Homo and Heterodimerization Mechanism of the Chemokine Receptors CCR5 and CXCR4, Daniele Di Marino (Department of Informatics, Institute of Computational Science, Università della Svizzera italiana, Switzerland)
Co-Authors: Vittorio Limongelli (Università della Svizzera italiana, Switzerland)
Signal transduction across cellular membranes is controlled by G protein coupled receptors (GPCRs). It is widely accepted that members of the GPCR family self-assemble as dimers or higher-order structures that act as functional units in the plasma membrane. The chemokine receptors are GPCRs implicated in a wide range of physiological and non-physiological cell processes. These receptors represent prime targets for therapeutic intervention in a wide spectrum of inflammatory and autoimmune diseases, heart diseases, cancer and HIV. The CXCR4 and CCR5 receptors are two of the most studied, playing crucial roles in different pathologies. In this scenario, the use of computational techniques able to describe complex biological processes such as protein dimerization acquires great importance. Combining coarse-grained (CG) molecular dynamics and well-tempered metadynamics (MetaD), we are able to describe the mechanism of dimer formation, capturing multiple association and dissociation events and allowing us to compute a detailed free energy landscape of the process.
Poster
LS-01 A Comprehensive Description of the Homo and Heterodimerization Mechanism of the Chemokine Receptors CCR5 and CXCR4, Daniele Di Marino (Department of Informatics, Institute of Computational Science, Università della Svizzera italiana, Switzerland)
Co-Authors: Vittorio Limongelli (Università della Svizzera italiana, Switzerland)
Signal transduction across cellular membranes is controlled by G protein coupled receptors (GPCRs). It is widely accepted that members of the GPCR family self-assemble as dimers or higher-order structures that act as functional units in the plasma membrane. The chemokine receptors are GPCRs implicated in a wide range of physiological and non-physiological cell processes. These receptors represent prime targets for therapeutic intervention in a wide spectrum of inflammatory and autoimmune diseases, heart diseases, cancer and HIV. The CXCR4 and CCR5 receptors are two of the most studied, playing crucial roles in different pathologies. In this scenario, the use of computational techniques able to describe complex biological processes such as protein dimerization acquires great importance. Combining coarse-grained (CG) molecular dynamics and well-tempered metadynamics (MetaD), we are able to describe the mechanism of dimer formation, capturing multiple association and dissociation events and allowing us to compute a detailed free energy landscape of the process. -
Diesmann Markus MS Presentation
Friday, June 10, 2016
Garden 2BC, 09:00-09:20
MS Presentation
Technology for Brain-Scale Simulation at Cellular Resolution, Markus Diesmann (Forschungszentrum Juelich, Germany)
Co-Authors:
At cellular resolution, the entities of neural networks are neurons and their connections, the synapses. Already early parallel simulation codes stored synapses in a distributed fashion such that a synapse solely consumes memory on the compute node harboring the target neuron. As petascale computers with some 100,000 nodes become increasingly available for neuroscience, new challenges arise: each neuron contacts on the order of 10,000 other neurons but for any given source neuron, at most a single synapse is typically created on a compute node. Here we present data structures taking advantage of this specific sparseness. After introducing the relevant scaling scenario for brain-scale simulations, we quantitatively discuss the performance on two supercomputers: JUQUEEN and the K computer. The contribution discusses the largest general neuronal network simulation carried out to date, comprising more than a billion neurons, and provides the evidence that neuroscience can exploit petascale machines.
MS Summary
MS28 Level of Detail in Brain Modeling: Common Abstractions and their Scientific Use, Markus Diesmann (Forschungszentrum Juelich, Germany)
Co-Authors: Felix Schuermann (EPFL, Switzerland)
Brain simulation has been a modelling challenge to the same degree as it has been a simulation challenge, in the sense that depending on the scope of the question the actual mathematical formalisms vary profoundly. At the same time, the decision for a certain scope and formalism is taken in light of the available data and computational tractability. Traditionally, researchers wanting to understand the function of brains have accordingly chosen a more "top-down" approach, trying to keep complexity to the minimum. Researchers interested in understanding the brain as a system (e.g., needed for diseases) have little choice other than to embrace more "bottom-up" approaches that incorporate the biophysical and even the biochemical diversity found in brain tissue. More recently, the steady increase of computational capabilities as described in Moore's law has reached levels at which large scale and fine detail are achievable at the same time. Modern informatics workflows and technologies help us to make complex scientific team efforts more tractable and reproducible. Together with high-quality, brain-wide data sets at increasing resolution and specificity, brain simulation is finally on a journey that should make it possible to overcome the divide. The minisymposium highlights this exciting convergence and the two prominent abstractions for brain modelling and simulation. The first one stops at the resolution of individual nerve cells (point neuron modelling), whereas the second takes the detailed morphology of neurons and their circuitry into account. We present two major simulation tools, NEST and NEURON, which are open source and have been community standards for their respective abstractions for many years. For about a decade, these tools have been capable of running on massively parallel computers. Recently, they have been shown to be ready to exploit the class of petascale HPC machines. The minisymposium presents the computational characteristics and requirements of the codes. For both abstractions, we showcase specific neuroscience applications using the respective tools and representing the cutting edge in in silico neuroscientific research. The presenting researchers are members of the European Human Brain Project. The presented simulators are partially supported by that effort in order to integrate them into a novel research infrastructure for brain research. One contributed talk on point neuron modelling and simulation and another on detailed neuron modelling and simulation are presented. -
Donners John MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:30-14:45
MS Presentation
AFiD-GPU: A Versatile Navier-Stokes Solver for Turbulent Flows, John Donners (surfSARA, Netherlands)
Co-Authors: Richard J.A.M. Stevens (University of Twente, Netherlands); Everett Phillips (NVIDIA, United States of America); Vamsi Spandan (University of Twente, Netherlands); John Donners (SURFsara, Netherlands); Rodolfo Ostilla-Monico (Harvard University, United States of America); Massimiliano Fatica (NVIDIA, United States of America); Yantao Yang (University of Twente, Netherlands); Detlef Lohse (University of Twente, Netherlands); Roberto Verzicco (University of Twente & Università degli Studi di Roma "Tor Vergata, Netherlands, Italy)
The AFiD code, an open-source solver for the Navier-Stokes equations (www.afid.eu), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU port has been carried out in CUDA Fortran with extensive use of kernel loop directives (CUF kernels) in order to keep the source code as close as possible to the original CPU version. Only a few routines have been written manually. On Piz Daint (CSCS), the current GPU version can solve a 2048x3072x3072 mesh on 640 K20x GPUs in 2.4s per time step, while with 2048 GPUs we measured 0.89s per time step. Thanks to the increased performance and efficient use of memory, this tool allows for simulations in a parameter range that is unprecedented in Rayleigh-Benard convection. -
Donzis Diego MS Presentation
Thursday, June 9, 2016
Garden 2A, 10:30-11:00
MS Presentation
Turbulence Simulations at Extreme Scales: A Path Towards Exascale, Diego Donzis (Texas A&M University, United States of America)
Co-Authors:
Turbulence is the most common state of fluid motion in nature and engineering and is critical in environmental, astrophysical and engineering flows. However, the complexity of the governing equations leads to wide ranges of temporal and spatial scales and render the problem almost intractable analytically. Thus, simulations, in particular direct numerical simulations (DNS) which resolve the entire range of scales from the exact governing equations, have become an indispensable tool to advance the field. While very accurate spectral methods have been used extensively up to petascale levels, they typically require collective communications and synchronizations, two well-known potential bottlenecks at exascale. We present our recent work on novel asynchronous numerical schemes that virtually remove computational obstacles at a mathematical level and present a path towards exascale DNS of turbulent flows. We will highlight implications, challenges and opportunities in terms of numerical issues, parallel performance, and implementation issues on future exascale systems. -
Draper Peter Paper
Wednesday, June 8, 2016
Auditorium C, 13:30-14:00
Paper
SWIFT: Using Task-Based Parallelism, Fully Asynchronous Communication, and Graph Partition-Based Domain Decomposition for Strong Scaling on more than 100 000 Cores, Peter Draper (Durham University, United Kingdom)
Co-Authors: Pedro Gonnet (Durham University, United Kingdom); Aidan B. G. Chalk (Durham University, United Kingdom); Peter Draper (Durham University, United Kingdom)
We present a new open-source cosmological code, called SWIFT, designed to solve the equations of hydrodynamics using a particle-based approach (Smooth Particle Hydrodynamics) on hybrid shared / distributed-memory architectures. SWIFT was designed from the bottom up to provide excellent strong scaling on both commodity clusters (Tier-2 systems) and Top100-supercomputers (Tier-0 systems), without relying on architecture-specific features or specialized accelerator hardware. This performance is due to three main computational approaches: (i) Task-based parallelism for shared-memory parallelism, which provides fine-grained load balancing and thus strong scaling on large numbers of cores; (ii) Graph-based domain decomposition, which uses the task graph to decompose the simulation domain such that the work, as opposed to just the data, as is the case with most partitioning schemes, is equally distributed across all nodes; (iii) Fully dynamic and asynchronous communication, in which communication is modelled as just another task in the task-based scheme, sending data whenever it is ready and deferring on tasks that rely on data from other nodes until it arrives.
In order to use these approaches, the code had to be re-written from scratch, and the algorithms therein adapted to the task-based paradigm. As a result, we can show upwards of 60% parallel efficiency for moderate-sized problems when increasing the number of cores 512-fold, on both x86-based and Power8-based architectures. -
Dueben Peter D. MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:45-16:00
MS Presentation
The Use of Inexact Hardware to Improve Weather and Climate Predictions, Peter D. Dueben (University of Oxford, United Kingdom)
Co-Authors: Tim N. Palmer (Oxford University, United Kingdom)
In weather and climate models values of relevant physical parameters are often uncertain by more than 100%. Still, numerical operations are typically calculated in double precision with 15 significant decimal digits. If we reduce numerical precision, we can reduce power consumption and increase computational performance significantly. If savings in computing power are reinvested, this will allow an increase in resolution in weather and climate models and an improvement of weather and climate predictions. I will discuss approaches to reduce numerical precision beyond single precision in HPC and in weather and climate modelling. I will present results that show that precision can be reduced significantly in atmosphere models and that potential savings are huge. Finally, I will discuss how to reduce precision in weather and climate models most efficiently and how rounding errors will impact on model dynamics and predictability. I will also outline implications for data assimilation and data storage. -
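As a generic illustration of the rounding errors being traded against savings in memory traffic, energy and run time, the toy program below accumulates the same value in single and double precision; the example is assembled for this text and is not taken from any weather or climate model.

```cpp
#include <cstdio>

int main() {
    const int n = 10000000;
    float  sum_f = 0.0f;   // single-precision accumulator
    double sum_d = 0.0;    // double-precision accumulator
    for (int i = 0; i < n; ++i) {
        sum_f += 0.1f;
        sum_d += 0.1;
    }
    // The single-precision sum drifts visibly from the exact value 1.0e6,
    // while the double-precision sum stays close to it.
    std::printf("float: %.2f   double: %.2f   exact: %.1f\n",
                sum_f, sum_d, n * 0.1);
    return 0;
}
```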
Dunne Fionn MS Presentation
Thursday, June 9, 2016
Garden 2BC, 15:20-15:40
MS Presentation
Multiscale Modelling of Dwell Fatigue in Polycrystalline Titanium Alloys, Fionn Dunne (Imperial College London, United Kingdom)
Co-Authors: Daniel Balint (Imperial College London, United Kingdom); Fionn Dunne (Imperial College London, United Kingdom)
Titanium alloys are used for manufacturing highly stressed components of gas turbine engine such as discs and blades due to their low density, excellent corrosion resistance and high fatigue strength. However, it has been reported that these alloys exhibit a significant fatigue life reduction, called dwell debit, under cyclic loading that includes a hold at the peak stress. In this study, a rate-dependent crystal plasticity framework was first used to reproduce the experimentally observed macroscopic response of Ti624x (x = 2 and 6) alloys under low-cycle fatigue and low-cycle dwell fatigue loading, which enabled relevant material constants for the two alloys to be determined. These were then utilized in a discrete dislocation plasticity model using the same thermally activated rate controlling mechanism to examine the dwell behaviour of the alloys. -
Dupuy Laurent MS Presentation
Thursday, June 9, 2016
Garden 2BC, 15:00-15:20
MS Presentation
An Efficient Interpolation Based Fast Multipole Method for Dislocation Dynamics Simulations, Laurent Dupuy (CEA, France)
Co-Authors: Arnaud Etcheverry (INRIA, France); Olivier Coulaud (INRIA, France); Laurent Dupuy (CEA, France); Eric Darve (Stanford University, United States of America)
Although the framework of Dislocation Dynamics (DD) provides powerful tools to model crystal plasticity, an efficient implementation is crucial in order to simulate very large ensembles of dislocations. Among all the steps involved in DD simulations, the computation of the internal elastic forces and energy is the most resource consuming. However, since these are long-ranged interactions, they can be efficiently evaluated using the Fast Multipole Method (FMM). We propose a new FMM formulation based on polynomial interpolation that is optimised to reduce the memory footprint and the number of flops using fast Fourier transforms. These optimisations are necessary because of the tensorial nature of the kernel and the unusual memory requirements of this application. Regarding parallelism, our code benefits from a hybrid OpenMP/MPI paradigm and a cache-aware data structure. Numerical results will be presented to show the accuracy of this new approach and its parallel scalability. -
Durinx Christine MS Summary
MS Summary
MS19 Harnessing Big Data for Biological Discovery: Scientific Computing at the SIB Swiss Institute of Bioinformatics, Christine Durinx (SIB Swiss Institute of Bioinformatics, Switzerland)
Co-Authors: Christophe Dessimoz (University of Lausanne, Switzerland)
Technological advances in sequencing, mass spectrometry, and other high-throughput technologies are transforming biological sciences into a quantitative discipline. The unabated, exponential growth of data poses acute modelling and computational challenges, but also unprecedented opportunities to comprehensively and accurately interrelate biological entities and elucidate their inner workings. This minisymposium will showcase ongoing scientific computing projects and activities at the SIB Swiss Institute of Bioinformatics. The talks will address problems pertaining to a wide variety of biological data and models, including genes, genomes, proteomes, phylogenies, and networks. SIB is an independent, not-for-profit foundation recognized of public utility. SIB includes some 60 research and service groups, which bring together some 750 scientists in the fields of genomics, transcriptomics, proteomics, evolution, population genetics, systems biology, structural biology, biophysics and clinical bioinformatics, located in the Swiss cantons of Basel, Bern, Fribourg, Geneva, Ticino, Vaud and Zurich. SIB provides life scientists and clinicians in academia and industry with a state-of-the-art bioinformatics infrastructure, including resources, expertise and services, federates world-class researchers, and delivers training in bioinformatics. It has a longstanding tradition of producing state-of-the-art software and carefully annotated databases. SIB also provides leading educational services, data analysis support and bioinformatics research.
E
-
Eckert Carlchristian MS Presentation
Wednesday, June 8, 2016
Garden 3C, 14:00-14:15
MS Presentation
Interactive Plasma Simulations on Next Generation Supercomputers for Everybody, Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, TU Dresden, Germany)
Co-Authors: Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Benjamin Worpitz (Citrix Systems GmbH, Germany); Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
We present the open-source ecosystem around the reportedly fastest particle-in-cell code in the world (sustained Flop/s), PIConGPU. Designed for modern clusters powered by manycore hardware, we motivate that HPC plasma simulations should be able to estimate their systematic and random error (e.g., by varying solvers and initial conditions). Our approach starts with an open-source, anytime fork-able development cycle as the basis for scrutable and reproducible simulations. To promote interoperability, we develop and propagate an open, self-describing, file-format agnostic data markup (openPMD) that is suitable for extreme I/O load and in-situ processing, demonstrated in a live simulation. PIConGPU is built on top of C++ meta-programming libraries, providing single-source kernel acceleration (alpaka) to work asynchronously on distributed data containers (PMacc). Using compile-time optimisation techniques, we show that particle-mesh methods can be implemented for arbitrary high-performance hardware (GPGPUs, CPUs, OpenPOWER, Xeon Phi) featuring solver agility without negative implications on maintenance (rewrite) or runtime performance. -
Edwards Carter MS Presentation
Thursday, June 9, 2016
Auditorium C, 15:30-16:00
MS Presentation
Portability of Performance: The Cases of Kokkos and GridTools, Carter Edwards (Sandia National Laboratories, United States of America)
Co-Authors: Carter Edwards (Sandia National Laboratories, United States of America)
HPC libraries have provided abstractions for common and performance-critical operations for decades. When uniform memory architectures were predominant, the main focus of library implementers was algorithm implementation, while data structures and layout had a secondary role. As memory architectures became more diverse, it became necessary to balance algorithmic needs and memory characteristics simultaneously. Several recent library approaches tackle this problem, especially as performance portability is now essential. In this talk we will describe two libraries that address these issues: Kokkos and GridTools. Kokkos provides two fundamental abstractions: one for dispatching work for parallel execution and one for managing multidimensional arrays with polymorphic layouts. GridTools' main abstraction allows a programmer to describe complex stencil applications in an architecture-agnostic way. Both libraries use the template mechanisms of C++ for high flexibility, thus avoiding source-to-source translators and proprietary annotations. -
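The minimal sketch below uses standard Kokkos idioms to illustrate the two abstractions named in the abstract, parallel dispatch (Kokkos::parallel_for) and multidimensional arrays with backend-dependent layout (Kokkos::View); it is illustrative only and not code from the talk.

```cpp
#include <Kokkos_Core.hpp>

int main(int argc, char* argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1 << 20;
        // Views own their data; the memory layout is chosen per backend.
        Kokkos::View<double*> x("x", n), y("y", n);

        // The lambda is dispatched to whatever execution space Kokkos was
        // built for (OpenMP threads, CUDA, ...) without changing this code.
        Kokkos::parallel_for("axpy", n, KOKKOS_LAMBDA(const int i) {
            y(i) = 2.0 * x(i) + y(i);
        });
        Kokkos::fence();
    }
    Kokkos::finalize();
    return 0;
}
```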
Eftekhari Aryan Poster
Poster
EMD-01 Parallelized Dimensional Decomposition for Dynamic Stochastic Economic Models, Aryan Eftekhari (USI, Switzerland)
Co-Authors: Olaf Schenk (Università della Svizzera italiana, Switzerland); Simon Scheidegger (University of Zurich & Stanford University, Switzerland)
This project explores a technique called High-Dimensional Model Representation (HDMR), which allows for the decomposition of a function into a finite number of lower-dimensional component functions. HDMR leverages the lack of input correlation to effectively reduce dimensionality of the problem in exchange for accuracy. Due to the intrinsic separability and hierarchical construction, HDMR provides the opportunity for both embarrassingly parallel model estimation and adaptive selection of active or significant inputs. An application of HDMR in conjunction with Adaptive Sparse Grids is shown in the context of computational economics, in which we provide an efficient solution method for high-dimensional dynamic stochastic models. Our results show that HDMR can effectively capture model dynamics with relatively low-dimensional component functions, thus mitigating the so-called "curse of dimensionality" and allowing for computability of larger systems. -
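For reference, the HDMR expansion referred to above has the standard form below; truncating after low-order component functions is what mitigates the curse of dimensionality, and each component can be estimated independently.

```latex
% High-Dimensional Model Representation (HDMR) of f : R^d -> R:
f(x_1,\dots,x_d) \;=\; f_0
  \;+\; \sum_{i} f_i(x_i)
  \;+\; \sum_{i<j} f_{ij}(x_i,x_j)
  \;+\; \cdots \;+\; f_{1\cdots d}(x_1,\dots,x_d)
% In practice the series is truncated after first- or second-order terms,
% which can be fitted in an embarrassingly parallel fashion.
```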
Emonts Patrick Poster
Poster
PHY-03 Parallelization on a Hybrid Architecture of GBS, a Simulation Code for Plasma Turbulence at the Edge of Fusion Devices, Patrick Emonts (Ecole Polytechnique Fédérale de Lausanne (EPFL), Swiss Plasma Center (SPC), CH-1, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Patrick Emonts (EPFL / Swiss Plasma Center, Switzerland); Federico David Halpern (EPFL / Swiss Plasma Center, Switzerland); Rogério Jorge (EPFL / Swiss Plasma Center, Switzerland); Jorge Morales (EPFL / Swiss Plasma Center, Switzerland); Paola Paruta (EPFL / Swiss Plasma Center, Switzerland); Paolo Ricci (EPFL / Swiss Plasma Center, Switzerland); Fabio Riva (EPFL / Swiss Plasma Center, Switzerland)
We present recent developments of GBS, a simulation code used to evolve plasma turbulence in the edge of fusion devices. GBS solves a set of 3D fluid equations, the Poisson and the Ampere equation, and a kinetic equation for the neutral atoms. Investigations carried out with GBS have significantly advanced our understanding of the plasma dynamics at the edge of fusion devices. For example, GBS simulations allowed the identification of the turbulent regimes and the saturation mechanisms of the linearly unstable modes. In GBS, a 3D Cartesian MPI communicator is employed, leading to excellent parallel scalability up to 8192 cores. To efficiently exploit many-core and hybrid architectures, new schemes using MPI+OpenMP and MPI+OpenACC have been recently implemented. We show the implementation of the new parallelization schemes, their scalability, and their efficiency. The new parallelization allows the efficient use of advanced hybrid supercomputers, such as Piz Daint at CSCS. -
Etcheverry Arnaud MS Presentation
Thursday, June 9, 2016
Garden 2BC, 15:00-15:20
MS Presentation
An Efficient Interpolation Based Fast Multipole Method for Dislocation Dynamics Simulations, Arnaud Etcheverry (INRIA, France)
Co-Authors: Arnaud Etcheverry (INRIA, France); Olivier Coulaud (INRIA, France); Laurent Dupuy (CEA, France); Eric Darve (Stanford University, United States of America)
Although the framework of Dislocation Dynamics (DD) provides powerful tools to model crystal plasticity, an efficient implementation is crucial in order to simulate very large ensembles of dislocations. Among all the steps involved in DD simulations, the computation of the internal elastic forces and energy is the most resource consuming. However, since these are long-ranged interactions, they can be efficiently evaluated using the Fast Multipole Method (FMM). We propose a new FMM formulation based on polynomial interpolation that is optimised to reduce the memory footprint and the number of flops using fast Fourier transforms. These optimisations are necessary because of the tensorial nature of the kernel and the unusual memory requirements of this application. Regarding parallelism, our code benefits from a hybrid OpenMP/MPI paradigm and a cache-aware data structure. Numerical results will be presented to show the accuracy of this new approach and its parallel scalability.
É
-
Éthier Stéphane MS Presentation
Wednesday, June 8, 2016
Garden 3C, 13:30-14:00
MS Presentation
Gyrokinetic Particle-in-Cell Codes at Exascale: Challenges and Opportunities, Stéphane Éthier (Princeton Plasma Physics Lab, United States of America)
Co-Authors:
Particle-in-Cell (PIC) codes will be among the first applications to run at the exascale due to their high scalability and excellent data locality. This is especially true for relativistic PIC codes that simulate very fast phenomena where fields are time-advanced along with the particles. However, the time scale covered by these simulations is extremely short and of limited usefulness for magnetic fusion research. The gyrokinetic approximation of the Vlasov equation, on the other hand, allows for much larger time scales by mathematically integrating out the fast cyclotron motion of ions in magnetized plasmas. Global gyrokinetic PIC codes, which implement this method, have been very successful but also more challenging to scale than their "classical" counterparts due to the gyrokinetic formulation and the need to solve a Poisson equation. This talk describes several computational schemes that have allowed the GK-PIC codes to overcome some of these issues and scale to petascale.
F
-
Falcone Jean-Luc MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 15:30-16:00
MS Presentation
Heterogeneous Computations on HPC Infrastructures: Theoretical Framework, Jean-Luc Falcone (University of Geneva, Switzerland)
Co-Authors: Jean-Luc Falcone (University of Geneva, Switzerland)
During the last decades, computational scientists have experienced a sustained and continuous increase in model complexity. Current models are not only more detailed and accurate but also span multiple scales and scientific domains. Instead of writing complicated and monolithic models ex novo, we have explored the coupling of existing single-scale and single-science applications in order to produce multi-scale and multi-science models. We have proposed a theoretical formalism that describes how submodels are coupled (Multiscale Modeling Language - MML), as well as a coupling library (MUSCLE) that allows arbitrary workflows to be built from the submodels. Currently, we are exploring the execution of such models across several computing resources in order to increase the available CPU power. In order to properly deploy an execution across several clusters, we have developed a discrete event simulator able to predict the relevance of a given allocation scheme. -
Fatica Massimiliano MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:30-14:45
MS Presentation
AFiD-GPU: A Versatile Navier-Stokes Solver for Turbulent Flows, Massimiliano Fatica (NVIDIA, United States of America)
Co-Authors: Richard J.A.M. Stevens (University of Twente, Netherlands); Everett Phillips (NVIDIA, United States of America); Vamsi Spandan (University of Twente, Netherlands); John Donners (SURFsara, Netherlands); Rodolfo Ostilla-Monico (Harvard University, United States of America); Massimiliano Fatica (NVIDIA, United States of America); Yantao Yang (University of Twente, Netherlands); Detlef Lohse (University of Twente, Netherlands); Roberto Verzicco (University of Twente & Università degli Studi di Roma "Tor Vergata, Netherlands, Italy)
The AFiD code, an open-source solver for the Navier-Stokes equations (www.afid.eu), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU port has been carried out in CUDA Fortran with extensive use of kernel loop directives (CUF kernels) in order to keep the source code as close as possible to the original CPU version. Only a few routines have been written manually. On Piz Daint (CSCS), the current GPU version can solve a 2048x3072x3072 mesh on 640 K20x GPUs in 2.4s per time step, while with 2048 GPUs we measured 0.89s per time step. Thanks to the increased performance and efficient use of memory, this tool allows for simulations in a parameter range that is unprecedented in Rayleigh-Benard convection. -
Faustin Jonathan Contributed Talk
Thursday, June 9, 2016
Garden 3A, 11:10-11:30
Contributed Talk
Self-Consistent Modelling of Plasma Heating and Fast Ion Generation Using Ion-Cyclotron Range of Frequency Waves in 2D and 3D Devices, Jonathan Faustin (EPFL-SPC, Switzerland)
Co-Authors: Wilfred Cooper (EPFL, Switzerland); Jonathan Graves (EPFL, Switzerland); David Pfefferlé (EPFL, Switzerland); Joachim Geiger (Max Planck Institute of Plasma Physics, Germany)
Ion-Cyclotron Range of Frequency (ICRF) waves are an efficient source of plasma heating in tokamaks and stellarators. In ICRF-heated plasmas, the phase-space distribution function of the resonating particles displays significant distortion. A significant consequence is a noticeable modification of the plasma properties which dictate the propagation of the ICRF wave. The self-consistent modelling tool SCENIC was built in order to solve this highly non-linear problem. It is one of the few ICRF modelling tools able to tackle both 2D and 3D plasma configurations. The computational resources, in particular the amount of shared memory required to resolve the plasma equilibrium and the wave propagation, increase significantly for simulations of strongly 3D equilibria such as stellarators compared to 2D tokamak calculations. We present some applications of SCENIC to tokamak and stellarator plasmas. Particular focus is given to simulations of the recently started Wendelstein 7-X stellarator experiment, which will use ICRF waves for fast particle generation. -
Favre Jean M. MS Presentation
Thursday, June 9, 2016
Garden 1A, 14:00-14:30
MS Presentation
Open-Source Visualization Based on VTK: Application Case Studies, Jean M. Favre (CSCS, Switzerland)
Co-Authors:
HPC systems offer tremendous computing resources that are often difficult to harness for visualization applications. Even though they are not often "massively" parallel, these applications can stress the system in numerous ways, as they try to execute in parallel on all subsystems (disk I/O, CPU, and GPU rendering). We will talk about VTK, an indisputably leading open-source visualization effort which spread to the HPC world very early and which today embraces new forms of parallelism, advanced computing and data analysis methods, and rendering. Through two end-user applications, VisIt and ParaView, we will give examples of scientific visualizations and discuss the impact of VTK's latest release (its seventh major release) on the HPC world. -
Ferretti Marco MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 16:30-17:00
MS Presentation
Configuration, Profiling and Tuning of a Complex Biomedical Application: Analysis of ISA Extensions for Floating Point Processing, Marco Ferretti (University of Pavia, Italy)
Co-Authors: Mirto Musci (University of Pavia, Italy)
This contribution describes an instance of the practitioner approach to application tuning for HPC deployment, starting on a small-size server and applying a black-box approach to performance tuning. The bio-medical iCardioCloud project aims at establishing a computational framework to perform complete patient-specific numerical analyses, specifically oriented to aortic diseases. The application is based on CFD. The work reported here shows the basic software-engineering approach to tuning within a black-box setting: optimisation of the build process, analysis of module dependencies, assessment of possible compiler-based optimisations, final targeting of a specific architecture, and generation of various scripts for optimised builds. The experience gained with this case study shows to what extent a skilled IT professional is actually required for a domain-specific application in order to move the process from a naive to an advanced solution. -
Fichtner Andreas MS Presentation
Thursday, June 9, 2016
Garden 1A, 14:45-15:00
MS Presentation
Salvus: A Flexible Open-Source Package for Full-Waveform Modelling and Inversion, Andreas Fichtner (ETH Zurich, Switzerland)
Co-Authors: Christian Boehm (ETH Zurich, Switzerland); Martin van Driel (ETH Zurich, Switzerland); Lion Krischer (Ludwig Maximilian University of Munich, Germany); Dave A. May (ETH Zurich, Switzerland); Max Rietmann (ETH Zurich, Switzerland); Korbinian Sager (ETH Zurich, Switzerland); Andreas Fichtner (ETH Zurich, Switzerland)
Within all domain-specific software projects, finding the correct balance between flexibility and performance is often difficult. In the seismic imaging community, the trend has been to move towards codes which are heavily optimised, but which often sacrifice usability and flexibility. Here we introduce Salvus: an open-source HPC high-order finite element (FE) package focused on full-waveform modelling and inversion, which is designed to be both flexible and performant. Salvus was constructed by following modern software design practices, testing protocols, and by establishing its foundations upon existing open-source high-level scientific libraries. The FE framework is generalized over spatial dimensions, time-integrators, polynomial order and wave-propagation physics, and provides support for both hexahedral and tetrahedral meshes. Additionally, support is included for various numerical optimisation methods. We discuss our usage of existing open-source scientific libraries, our choice level of abstraction, and quantitatively investigate the performance penalties associated with these abstractions.
Wednesday, June 8, 2016
Auditorium C, 13:00-13:30
Paper
Automatic Global Multiscale Seismic Inversion: Insights into Model, Data, and Workflow Management, Andreas Fichtner (ETH Zurich, Switzerland)
Co-Authors: Christian Boehm (ETH Zurich, Switzerland); Alexey Gokhberg (ETH Zurich, Switzerland); Andreas Fichtner (ETH Zurich, Switzerland)
Modern global seismic waveform tomography is formulated as a PDE-constrained nonlinear optimization problem, where the optimization variables are Earth's visco-elastic parameters. This particular problem has several defining characteristics. First, the solution to the forward problem, which involves the numerical solution of the elastic wave equation over continental to global scales, is computationally expensive. Second, the determinedness of the inverse problem varies dramatically as a function of data coverage. This is chiefly due to the uneven distribution of earthquake sources and seismometers, which in turn results in an uneven sampling of the parameter space. Third, the seismic wavefield depends nonlinearly on the Earth's structure. Sections of a seismogram which are close in time may be sensitive to structure greatly separated in space.
In addition to these theoretical difficulties, the seismic imaging community faces additional issues which are common across HPC applications. These include the storage of massive checkpoint files, the recovery from generic system failures, and the management of complex workflows, among others. While the community has access to solvers which can harness modern heterogeneous computing architectures, the computational bottleneck has fallen to these memory- and manpower-bounded issues.
We present a two-tiered solution to the above problems. To deal with the problems relating to computational expense, data coverage, and the increasing nonlinearity of waveform tomography with scale, we present the Collaborative Seismic Earth Model (CSEM). This model, and its associated framework, takes an open-source approach to global-scale seismic inversion. Instead of attempting to monolithically invert all available seismic data, the CSEM approach focuses on the inversion of specific geographic subregions, and then consistently integrates these subregions via a common computational framework. To deal with the workflow and storage issues, we present a suite of workflow management software, along with a custom designed optimization and data compression library. It is the goal of this paper to synthesize these above concepts, originally developed in isolation, into components of an automatic global-scale seismic inversion. -
Filbet Francis MS Presentation
Wednesday, June 8, 2016
Garden 3C, 15:30-16:00
MS Presentation
Asymptotically Stable Particle-in-Cell Methods for the Vlasov-Poisson System with a Strong External Magnetic Field, Francis Filbet (University of Toulouse, France)
Co-Authors:
This work deals with the numerical resolution of the Vlasov-Poisson system with a strong external magnetic field by Particle-In-Cell (PIC) methods. In this regime, classical PIC methods are subject to stability constraints on the time and space steps related to the small Larmor radius and the plasma frequency. Here, we propose an asymptotic-preserving PIC scheme which is not subject to these limitations. Our approach is based on first and higher order semi-implicit numerical schemes already validated on dissipative systems. Additionally, when the magnitude of the external magnetic field becomes large, this method provides a consistent PIC discretization of the guiding-center equation, that is, the incompressible Euler equation in vorticity form. We propose several numerical experiments which provide a solid validation of the method and its underlying concepts. -
Fillot Nicolas MS Presentation
Friday, June 10, 2016
Garden 2A, 09:00-09:30
MS Presentation
On the Use of Thermal Boundary Conditions in a Lubricated Contact Multiscale Framework, Nicolas Fillot (INSA Lyon, France)
Co-Authors:
Numerical simulations of lubricated contacts (as in roller bearings, cam-follower systems, gears) will be discussed. Finite-Element models can reach a quantitative prediction of friction and lubricant film thickness, as soon as heat exchanges are properly described both in the lubricant and in a sufficiently large portion of the bounding solids, whose heat diffusion can have drastic consequences on both friction and wear. But the temperature must be fixed at some locations (where? at which value?). We will show that a multiscale analysis, taking into account the whole mechanism surrounding the contact, would be required. Conversely, a local description of the molecules flowing between the surfaces (e.g. using molecular dynamics simulations) would require thermostating at least part of the molecular domain. How can temperature be imposed in this tiny space without disrupting the flow? Again, this multiscale thermal problem in tribology calls for adequate modelling. -
Firoz Jesun Sahariar Paper
Thursday, June 9, 2016
Auditorium C, 12:00-12:30
Paper
Context Matters: Distributed Graph Algorithms and Runtime Systems, Jesun Sahariar Firoz (Indiana University, United States of America)
Co-Authors: Jesun Sahariar Firoz (Indiana University, United States of America); Thejaka Amila Kanewala (Indiana University, United States of America); Marcin Zalewski (Indiana University, United States of America); Martina Barnas (Indiana University, United States of America)
The increasing complexity of the software/hardware stack of modern supercomputers makes understanding the performance of modern massive-scale codes difficult. Distributed graph algorithms (DGAs) are at the forefront of that complexity, pushing the envelope with their massive irregularity and data dependency. We analyse the existing body of research on DGAs to assess how technical contributions are linked to experimental performance results in the field. We distinguish algorithm-level contributions related to graph problems from "runtime-level" concerns related to communication, scheduling, and other low-level features necessary to make distributed algorithms work. We show that the runtime is an integral part of DGAs' experimental results, but it is often ignored by the authors in favor of algorithm-level contributions. We argue that a DGA can only be fully understood as a combination of these two aspects and that detailed reporting of runtime details must become an integral part of the scientific standard in the field if results are to be truly understandable and interpretable. Based on our analysis of the field, we provide a template for reporting the runtime details of DGA results, and we further motivate the importance of these details by discussing in detail how seemingly minor runtime changes can make or break a DGA. -
Fisher Mike MS Presentation
Wednesday, June 8, 2016
Garden 3B, 17:00-17:30
MS Presentation
Numerical Solution of the Time-Parallelized Weak-Constraint 4DVAR, Mike Fisher (ECMWF, United Kingdom)
Co-Authors: Serge Gratton (Institut de Recherche en Informatique de Toulouse / CERFACS, France); Mike Fisher (ECMWF, United Kingdom)
This study will address the numerical solution of the saddle point system arising from four-dimensional variational (4D-Var) data assimilation, including a study of preconditioning and its convergence properties. This saddle point formulation of 4D-Var allows parallelization in the time dimension. Therefore, it represents a crucial step towards higher computational efficiency, since 4D-Var approaches otherwise require many sequential computations. In recent years, there has been increasing interest in saddle point problems, which arise in many other applications such as constrained optimisation, computational fluid dynamics, optimal control and so forth. The key issue in solving saddle point systems with Krylov subspace methods is to find efficient preconditioners. This work focuses on preconditioners obtained by using limited-memory low-rank updates and presents numerical results obtained from the Object Oriented Prediction System (OOPS) developed by ECMWF.
Wednesday, June 8, 2016
Garden 3B, 15:30-16:00
MS Presentation
Improving the Scalability of 4D-Var with a Weak Constraint Formulation, Mike Fisher (ECMWF, United Kingdom)
Co-Authors: Mike Fisher (ECMWF, United Kingdom)
In 4D-Var, the forecast model is used to propagate the initial state to the time of the observations and is assumed to be perfect. As most aspects of the data assimilation system have improved over the years, this assumption becomes less realistic. There are theoretical benefits in using a weak-constraint formulation and a long assimilation window in 4D-Var, and recent experiments have shown benefits in using overlapping assimilation windows even with strong-constraint 4D-Var. The weak-constraint formulation writes the optimisation problem as a function of the four-dimensional state over the length of the assimilation window. In addition to its theoretical advantages, it increases the potential for parallelisation and better scalability. Using a saddle point method makes it possible to take full advantage of the potential for additional parallelism. We will show how it can benefit future operational systems and reduce the time to solution in the critical path. -
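As background for the two abstracts above, the saddle-point system they refer to can be written schematically as follows; this is a simplified sketch in our own notation, not ECMWF's exact formulation.

```latex
% Schematic linearized weak-constraint 4D-Var saddle-point system (simplified notation).
\[
\begin{pmatrix}
  \mathbf{D} & \mathbf{0} & \mathbf{L} \\
  \mathbf{0} & \mathbf{R} & \mathbf{H} \\
  \mathbf{L}^{T} & \mathbf{H}^{T} & \mathbf{0}
\end{pmatrix}
\begin{pmatrix} \boldsymbol{\lambda} \\ \boldsymbol{\mu} \\ \delta\mathbf{x} \end{pmatrix}
=
\begin{pmatrix} \mathbf{b} \\ \mathbf{d} \\ \mathbf{0} \end{pmatrix},
\qquad
\mathbf{D} = \operatorname{blockdiag}(\mathbf{B},\mathbf{Q}_1,\dots,\mathbf{Q}_N),\quad
\mathbf{R} = \operatorname{blockdiag}(\mathbf{R}_0,\dots,\mathbf{R}_N),
\]
% B, Q_i, R_i: background, model-error and observation-error covariances;
% H collects the linearized observation operators; L is block bidiagonal and
% couples consecutive subwindow states through the linearized model, which is
% why the matrix-vector products can be applied in parallel across subwindows;
% lambda, mu are the Lagrange multipliers. Krylov methods on this indefinite
% system are only practical with a good preconditioner, as discussed above.
```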
Fisicaro Giuseppe Poster
Poster
MAT-03 Complex Wet-Environments in Electronic-Structure Calculations, Giuseppe Fisicaro (Department of Physics - University of Basel, Switzerland)
Co-Authors: Luigi Genovese (CEA/INAC, France); Oliviero Andreussi (Università della Svizzera italiana, Switzerland); Nicola Marzari (EPFL, Switzerland); Stefan Goedecker (University of Basel, Switzerland)
The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions at the ab-initio level in the presence of the proper electrochemical environment. In this work we present a continuum solvation library able to handle both neutral and ionic solutions, solving the Generalized Poisson and the Poisson-Boltzmann problem. Two different recipes have been implemented to build up the continuum dielectric cavity (one using atomic coordinates, the other mapping the solute electronic density). A preconditioned conjugate gradient method has been implemented for the Generalized Poisson equation, whilst a self-consistent procedure enables us to solve the Poisson-Boltzmann problem. Both solvers and continuum dielectric cavities have been integrated into the BigDFT electronic-structure package. We benchmarked the whole library on several atomistic systems including small neutral molecules, large proteins, solvated surfaces and reactions in solution to demonstrate its efficiency and performance. -
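A minimal generic sketch of a preconditioned conjugate gradient solve of the kind mentioned above, on a toy variable-coefficient 1D Poisson problem with a Jacobi preconditioner. This is illustrative only, not the BigDFT solver; the "dielectric" profile and problem size are made up.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg, LinearOperator

# Toy stand-in for a generalized Poisson problem: 1D Laplacian with a spatially
# varying coefficient (a crude analogue of a dielectric cavity profile).
n = 200
eps = 1.0 + np.linspace(0.0, 77.0, n)            # illustrative "dielectric" profile
main = eps[:-1] + eps[1:]                        # diagonal of the SPD tridiagonal operator
A = diags([main, -eps[1:-1], -eps[1:-1]], [0, -1, 1]).tocsr()
b = np.random.default_rng(0).standard_normal(n - 1)

# Jacobi (diagonal) preconditioner wrapped as a LinearOperator.
M = LinearOperator(A.shape, matvec=lambda r: r / A.diagonal())

x, info = cg(A, b, M=M)
print("cg info:", info, " residual norm:", np.linalg.norm(A @ x - b))
```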
Folini Doris Poster
Poster
CLI-03 From Code to Climate: Adjusting Free Parameters in a Global Climate Model, Doris Folini (ETH Zurich, Institute for Atmospheric and Climate Science, Switzerland)
Co-Authors: Martin Wild (ETH Zurich, Switzerland)
The discretization of global climate models (GCMs) is too coarse to resolve a number of climate-relevant processes. For example, the deep convection associated with tropical thunderstorms is of key relevance for the global atmospheric circulation, yet it enters GCMs only via sub-grid-scale parameterization of thunderstorms. Typically, such parameterizations come with some free parameters that need adjusting in order to obtain a 'physically meaningful climate', a process referred to as 'model tuning'. We illustrate this process with the example of MPI-ESM-HAM, the Max Planck Earth System Model (MPI-ESM) coupled to the Hamburg Aerosol Module (HAM), and discuss how we cope with three associated computational challenges: the high dimensionality of the parameter space, the substantial year-to-year variability of the model climate as compared to the long-term mean climate, and response time scales to changes in tuning parameters that range from under a year to several centuries or longer. -
Fomel Sergey MS Presentation
Thursday, June 9, 2016
Garden 1A, 14:30-14:45
MS Presentation
Data-Parallel Processing Using Madagascar Open-Source Software Package, Sergey Fomel (University of Texas at Austin, United States of America)
Co-Authors:
Many data analysis tasks in geophysics are data-parallel: the same process is applied in parallel on different parts of the input dataset. The Madagascar open-source software package provides a number of general-purpose tools to simplify data-parallel processing. These tools allow the user to develop serial code and easily run it in a data-parallel fashion on a multicore computer or a computer cluster. Different tools take advantage of different modes of parallelization (multi-threading, shared memory with OpenMP, distributed memory with MPI, etc.). They employ Unix-style data encapsulation to wrap serial code using forks and pipes. I will describe the current collection of data-parallel tools and will show examples of their usage in solving geophysical data analysis problems. I will also make suggestions for further development inviting input from the open-source community. -
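The data-parallel pattern described above can be sketched generically in plain Python; this is not Madagascar's actual tooling, and process_trace is a made-up placeholder for user-written serial code.

```python
import numpy as np
from multiprocessing import Pool

def process_trace(trace):
    """Serial, user-written processing applied to one piece of the data
    (placeholder: simple moving-average smoothing of a seismic trace)."""
    kernel = np.ones(5) / 5.0
    return np.convolve(trace, kernel, mode="same")

if __name__ == "__main__":
    # Toy dataset: 1000 traces of 2048 samples each.
    data = np.random.default_rng(1).standard_normal((1000, 2048))
    # The same serial code is applied data-parallel over traces on all local cores.
    with Pool() as pool:
        out = np.vstack(pool.map(process_trace, data))
    print(out.shape)
```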
Formaggia Luca MS Presentation
Wednesday, June 8, 2016
Garden 3A, 14:15-14:30
MS Presentation
Simulation of Fluid-Structure Interaction with a Thick Structure via an Extended Finite Element Approach, Luca Formaggia (Politecnico di Milano, Italy)
Co-Authors: Luca Formaggia (Politecnico di Milano, Italy); Christian Vergara (Politecnico di Milano, Italy)
In this talk, we present an eXtended Finite Element Method (XFEM) to simulate the fluid-structure interaction arising from a 3D flexible thick structure immersed in a fluid. Both fluid and solid domains are discretized independently by generating two overlapping unstructured meshes. Due to the unfitted nature of the considered meshes, this method avoids the technical problems related to an ALE approach while maintaining an accurate description of the fluid-structure interface. The coupling between the fluid and solid is taken into account by means of a Discontinuous Galerkin approach, which allows the interface conditions to be imposed. A possible application is the study of the interaction arising between blood and aortic valve leaflets, since this interaction is important for understanding the valve's functional behaviour, for developing prosthetic valve devices and for post-surgery feedback. -
Fornari Marco MS Presentation
Thursday, June 9, 2016
Garden 1BC, 14:00-14:30
MS Presentation
Descriptors that Work: Taming Complexity a Piece at a Time, Marco Fornari (Central Michigan University, United States of America)
Co-Authors:
First principles methodologies have grown in accuracy and applicability to the point where large databases can be built and analysed to predict novel compositions. The discovery process is rooted in the knowledge of descriptors: quantities that link selected microscopic features of the materials to macroscopic functional properties. To be useful, descriptors should be as simple as possible and, at the same time, reliable and transferable across chemical compositions and structural themes. Within the AFLOW consortium we have developed a simple framework to expand, validate, and mine data repositories: the MTFrame. Our minimalistic approach complements AFLOW and other existing high-throughput infrastructures and aims to integrate data generation with data analysis. I will present examples from our work on novel materials for electromechanical energy conversion, thermoelectrics, and transparent conductors. My intent is to pinpoint the usefulness of our descriptors in guiding the discovery process and quantitatively structuring scientific intuition. -
Forti Davide MS Presentation
Wednesday, June 8, 2016
Garden 3A, 16:30-16:45
MS Presentation
Coupled Mathematical and Numerical Models for Integrated Simulations of the Left Ventricle, Davide Forti (EPFL, Switzerland)
Co-Authors: Luca Dede' (EPFL, Switzerland); Davide Forti (EPFL, Switzerland); Alfio Quarteroni (EPFL, Switzerland)
In this talk, we focus on the coupling of electrophysiology and mechanical models to realize an integrated model of the left ventricle by considering the active contraction of the muscle and the feedback on the electrophysiology. For the latter, we consider the mono-domain equations with the Bueno-Orovio ionic model. As for the mechanics, we consider the Holzapfel-Ogden model together with an active strain approach with a transmurally variable activation parameter. We spatially approximate the model by means of the Finite Element method and discuss the properties of different coupling strategies and time discretization schemes. Among these, we consider a fully coupled strategy with a semi-implicit scheme for the time discretization. In order to solve the linear system arising from such discretization, we use a preconditioner based on the FaCSI (Factorized Condensed SIMPLE) concept. We present and discuss numerical results obtained in the HPC framework, including patient-specific left ventricle geometries.
Friday, June 10, 2016
Garden 3C, 10:00-10:30
MS Presentation
Fluid-Structure Interaction for Vascular Flows: From Supercomputers to Laptops, Davide Forti (EPFL, Switzerland)
Co-Authors: Claudia Colciago (EPFL, Switzerland); Davide Forti (EPFL, Switzerland)
Can we simulate haemodynamics in a vascular district in real time? One single heartbeat still takes several hours on an HPC platform; how can we reduce the computational complexity by 2-3 orders of magnitude? The key ingredients are model order reduction and numerical reduction combined with pre-processing on supercomputers. Blood flow in arteries needs to take into account the incompressibility of the fluid, the compliant vessel, and patient-specific data. After reducing the complexity of the model, i.e. by assuming a fixed fluid computational domain and a thin membrane structure, it is possible to use Proper Orthogonal Decomposition and the Reduced Basis Method to split the computational effort into an offline part and an online part. The former runs on an HPC system in 5 hours on 1000 processors, while the latter runs in real time, i.e. 1 second of simulation in less than 1 second of CPU time, on a notebook. -
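The offline/online splitting described above can be illustrated with a toy proper orthogonal decomposition (POD) sketch; the synthetic snapshot matrix and the diagonal full-order operator are placeholders, not the actual haemodynamics pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline stage (expensive, done once, e.g. on an HPC system): collect
# high-fidelity snapshots and extract a reduced basis by POD (truncated SVD).
n_dof, n_snap = 2000, 200
snapshots = rng.standard_normal((n_dof, 5)) @ rng.standard_normal((5, n_snap))  # toy low-rank data
U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
r = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), 0.9999) + 1   # keep 99.99% of the energy
V = U[:, :r]                                                      # reduced basis

# Online stage (cheap, e.g. on a laptop): solve a tiny r x r system per query
# and lift the result back to the full space.
A = np.diag(np.linspace(1.0, 2.0, n_dof))   # stand-in full-order operator
b = rng.standard_normal(n_dof)
A_r = V.T @ (A @ V)                         # Galerkin projection (precomputable offline)
x_r = np.linalg.solve(A_r, V.T @ b)
x_approx = V @ x_r
print("reduced dimension:", r)
```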
Franc Alain Poster
Poster
CSM-08 Fast Randomized Algorithms for Covariance Matrix Computations, Alain Franc (INRIA, France)
Co-Authors: Olivier Coulaud (INRIA, France); Eric Darve (Stanford University, United States of America); Alain Franc (INRIA, France)
Covariance matrices arise in many fields of modern scientific computation, from geostatistics to data analysis, where they usually measure the correlation between grid points. Most algorithms involving such matrices have a superlinear cost in N, the size of the grid. We present an open-source library implementing efficient algorithms based on randomized low-rank approximations (LRA). The library can provide approximate decompositions of low-rank covariance matrices in O(N^2) operations instead of the usual O(N^3) cost of standard methods. In addition, low-rank covariance matrices given as kernels, e.g., Gaussian functions, and evaluated on 3D grids can be factorized in O(N) operations using randomized LRA and an FMM acceleration. The performance of the library is illustrated on two examples: the generation of Gaussian Random Fields on large heterogeneous spatial grids of O(10^6) points, and the computation of reduced-order maps from distances between DNA sequences, using Multi-Dimensional Scaling for the classification of species on 10^5 samples. -
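A generic sketch of the randomized low-rank approximation idea on a Gaussian-kernel covariance matrix (a standard randomized range-finder followed by an SVD); this is illustrative only, not the library's API, and the grid, kernel width and rank are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariance (kernel) matrix on a small spatial point set: Gaussian kernel,
# numerically close to low rank because the kernel is smooth.
pts = rng.random((2000, 3))
d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / (2 * 0.5**2))

def randomized_lra(A, rank, oversample=10):
    """Randomized range-finder + small SVD (Halko-Martinsson-Tropp style)."""
    G = rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ G)              # orthonormal basis of the approximate range
    B = Q.T @ A                             # small (rank+p) x N matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

U, s, Vt = randomized_lra(K, rank=50)
err = np.linalg.norm(K - (U * s) @ Vt) / np.linalg.norm(K)
print("relative Frobenius error of the rank-50 approximation:", err)
```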
Fuhrer Oliver MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:15-15:30
MS Presentation
Refactoring and Virtualizing a Mesoscale Model for GPUs, Oliver Fuhrer (MeteoSwiss, Switzerland)
Co-Authors: Andrea Arteaga (MeteoSwiss, Switzerland); Christophe Charpilloz (MeteoSwiss, Switzerland); Salvatore Di Girolamo (ETH Zurich, Switzerland); Torsten Hoefler (ETH Zurich, Switzerland)
Our aim is to adopt the COSMO limited-area model to enable kilometer-scale resolution in climate simulation mode. As the resolution of climate simulations increases, storing the large amount of generated data becomes infeasible. To enable high-resolution models, we find a good compromise between the disk I/O costs and the need to access the output data for post-processing and analysis. We propose a data-virtualization layer that re-runs simulations on demand and transparently manages the data for the analytics applications. To achieve this goal, we developed a bit-reproducible version of the dynamical core of the COSMO model that runs on different architectures (e.g., CPUs and GPUs). An ongoing project is working on the reproducibility of the full COSMO code. We will discuss the strategies adopted to develop the data virtualization layer, the challenges associated with the reproducibility of simulations performed on different hardware architectures and the first promising results of our project.
Wednesday, June 8, 2016
Garden 3B, 13:30-14:00
MS Presentation
Translating Python into GridTools: Prototyping PDE Solvers Using Stencils, Oliver Fuhrer (MeteoSwiss, Switzerland)
Co-Authors: Mauro Bianco (ETH Zurich / CSCS, Switzerland); Paolo Crosetto (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Thomas Schulthess (ETH Zurich, Switzerland)
The fast-paced environment of high-performance computing architectures has always been a challenge for complex codes. First, the effort to adapt the code to new processor architectures is significant compared to their typical release phase. Second, optimisations for one target often incur performance penalties on others. Third, such codes are generally developed by domain scientists, who typically lack expertise in the specific details of the target platform. Successful projects like STELLA have shown that a way out of this situation is to apply the concept of separation of concerns. GridTools is pushing this concept even further: the domain scientist's work is conducted within a prototyping environment using a domain-specific language (DSL), while the computer scientist profiles the automatically generated code over diverse architectures, implemented by different hardware-specific backends. This talk will give an overview of the GridTools ecosystem, highlighting the use of the prototyping environment in combination with the automatic code-generation engine.
Thursday, June 9, 2016
Garden 3B, 14:45-15:00
MS Presentation
A Decade-Long European-Scale Convection-Resolving Climate Simulation on GPUs: Computation, Validation and Analyses, Oliver Fuhrer (MeteoSwiss, Switzerland)
Co-Authors: Stefan Rüdisühli (ETH Zurich, Switzerland); Nikolina Ban (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Daniel Lüthi (ETH Zurich, Switzerland); Michael Sprenger (ETH Zurich, Switzerland); Heini Wernli (ETH Zurich, Switzerland); Christoph Schär (ETH Zurich, Switzerland)
Climate simulations using a horizontal resolution of O(1 km) allow deep convection to be resolved explicitly. Precipitation processes are then represented much closer to first principles and allow for an improved representation of the water cycle. Due to the large computational costs, climate simulations at such scales were restricted to rather small domains in the past. Here we present results from a decade-long convection-resolving climate simulation covering Europe using a computational mesh of 1,536x1,536x60 grid points. We use a COSMO-model prototype enabled for GPUs. The results illustrate how the approach allows the interactions between atmospheric circulations at scales ranging from 1,000 to 10 km to be represented. We discuss the performance of the convection-resolving climate modelling approach and thereby specifically focus on the improved representation of summer convection on the continental scale. Furthermore, we demonstrate the potential of online analyses of these simulations for assembling detailed climatologies of extratropical cyclones, fronts and propagating convective systems.
Thursday, June 9, 2016
Garden 1BC, 12:10-12:30
Contributed Talk
The GridTools Libraries for the Solution of PDEs Using Stencils, Oliver Fuhrer (MeteoSwiss, Switzerland)
Co-Authors: Mauro Bianco (ETH Zurich / CSCS, Switzerland); Paolo Crosetto (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Thomas Schulthess (ETH Zurich, Switzerland)
Numerical weather prediction and climate models like COSMO and ICON explicitly solve a large set of PDEs. The STELLA library was successfully used to port the dynamical core of COSMO, providing performance-portable code across multiple platforms. A significant performance speedup was obtained for NVIDIA GPUs, as reported in doi:10.1145/2807591.2807676. However, its applicability was restricted to Cartesian structured grids and finite difference methods, and it is difficult to use outside the COSMO model. The GridTools project emerges as an effort to provide an ecosystem for developing portable and efficient grid applications for the explicit solution of PDEs. GridTools generalizes STELLA to a wider class of weather and climate models on multiple grids (Cartesian and spherical) and offers facilities for performing communication and setting boundary conditions. Here we present the GridTools API and show performance on NVIDIA GPUs and x86 platforms.
G
-
Gabriel Alice-Agnes Poster
Poster
EAR-01 Coupling Geodynamic Seismic Cycle and Dynamic Rupture Models, Alice-Agnes Gabriel (LMU Munich, Germany)
Co-Authors: Ylona van Dinther (ETH Zurich, Switzerland); Alice-Agnes Gabriel (Ludwig Maximilian University of Munich, Germany)
Diverse modelling techniques that span large spatial and temporal scales are required to study the seismicity in subduction zones. Our seismo-thermo-mechanical (STM) seismic cycle models solve million year scale subduction dynamics and multiple earthquake events self-consistently, but fail to resolve the finer seismic time scale at which dynamic rupture models excel. By using the self-consistent stresses and strengths of our STM model as input for dynamic rupture scenarios conducted with SeisSol, the otherwise hard-to-constrain assumptions on these fields are resolved and advantages of both methods are exploited. The results show that a dynamic rupture can be triggered spontaneously and that the propagating rupture is qualitatively comparable to its quasi-static equivalent. The importance of both self-consistent initial conditions and dynamic feedback on fault strength is illustrated by a quantitative comparison of surface displacements and stresses. -
Gagliardini Patrick MS Presentation
Thursday, June 9, 2016
Garden 2A, 14:30-15:00
MS Presentation
Causality Inference in a Nonstationary and Nonhomogenous Framework, Patrick Gagliardini (Universita della Svizzera italiana, Lugano, Switzerland)
Co-Authors: Illia Horenko (Università della Svizzera italiana, Switzerland); William Sawyer (ETH Zurich / CSCS, Switzerland); Lukas Pospisil (Università della Svizzera italiana, Switzerland)
The project deploys statistical and computational techniques to develop a novel approach to causality inference in multivariate time-series of economic data on equity and credit risks. The methods build on recent research of the project participants. They improve on classical approaches to causality analysis by accommodating general forms of non-stationarity and non-homogeneity resulting from unresolved and latent scale effects. The emerging causality framework is implemented through a clustering based on minimization of the averaged clustering functional, which describes the mean distance between the observation data and its representation in terms of a given number of abstract Bayesian causality models of a certain predefined class. We use a finite element framework to propose a numerical scheme. One of the most challenging components of the emerging HPC implementation is a Quadratic Programming problem with linear equality and bound inequality constraints. We compare different algorithms and demonstrate their efficiency on practical benchmark problems.
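The kind of constrained quadratic subproblem mentioned above can be illustrated with a tiny generic example, solved here with SciPy's SLSQP; the simplex-like constraints and random data are placeholders, and this is not the algorithmic comparison the authors performed.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Tiny QP: minimize 0.5 x'Qx + c'x  subject to  sum(x) = 1 and 0 <= x <= 1
# (simplex-like constraints of the kind arising in cluster-affiliation problems).
n = 20
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)            # symmetric positive definite Hessian
c = rng.standard_normal(n)

res = minimize(
    fun=lambda x: 0.5 * x @ Q @ x + c @ x,
    jac=lambda x: Q @ x + c,
    x0=np.full(n, 1.0 / n),                                  # feasible start
    bounds=[(0.0, 1.0)] * n,                                 # bound constraints
    constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],  # equality constraint
    method="SLSQP",
)
print(res.success, res.fun)
```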
Poster
CSM-13 Towards the HPC-Inference of Causality Networks from Multiscale Economical Data, Patrick Gagliardini (Universita della Svizzera italiana, Lugano, Switzerland)
Co-Authors: Illia Horenko (Università della Svizzera italiana, Switzerland); Patrick Gagliardini (Università della Svizzera italiana, Switzerland); William Sawyer (ETH Zurich / CSCS, Switzerland)
A novel non-stationary approach to causality inference for multivariate time-series was proposed in recent research by the project participants. This methodology uses a clustering based on minimization of the averaged clustering functional, which describes the mean distance between the observation data and its representation in terms of a given number of abstract Bayesian causality models of a certain predefined class. For the analysis of realistic datasets we are developing an HPC library built on top of PETSc that implements MPI, OpenMP, and CUDA parallelization strategies. We present the mathematical aspects of the methodology and preliminary results of solving the non-stationary problem of causality inference for multivariate economic data with our HPC approach. The results are computed on Piz Daint at CSCS. -
Gallaire François MS Summary
MS Summary
MS05 High-Performance Computing in Fluid Mechanics I, François Gallaire (EPFL - Ecole polytechnique fédérale de Lausanne, Switzerland)
Co-Authors: Francois Gallaire (EPFL, Switzerland)
Large-scale computer simulations have become an indispensable tool in fluid dynamics research. Elucidating the fundamental flow physics or designing and optimizing flows for applications ranging all the way from low Reynolds number multiphase flows at small length scales to fully developed turbulence at large scales requires state-of-the-art simulation capabilities. The need for large-scale HPC simulations in fluid dynamics has therefore been driving the development of novel simulation algorithms and of optimized software infrastructure. The envisioned PASC 16 symposium will bring together both developers and users of modern simulation tools. The set of talks will showcase a wide range of fields in which HPC computations are essential, highlight recent advances in computational methods and provide a platform to identify challenges and opportunities for future research.
MS Summary
MS14 High-Performance Computing in Fluid Mechanics II, François Gallaire (EPFL - Ecole polytechnique fédérale de Lausanne, Switzerland)
Co-Authors: Francois Gallaire (EPFL, Switzerland)
Large-scale computer simulations have become an indispensable tool in fluid dynamics research. Elucidating the fundamental flow physics or designing and optimizing flows for applications ranging all the way from low Reynolds number multiphase flows at small length scales to fully developed turbulence at large scales requires state-of-the-art simulation capabilities. The need for large-scale HPC simulations in fluid dynamics has therefore been driving the development of novel simulation algorithms and of optimized software infrastructure. The envisioned PASC 16 symposium will bring together both developers and users of modern simulation tools. The set of talks will showcase a wide range of fields in which HPC computations are essential, highlight recent advances in computational methods and provide a platform to identify challenges and opportunities for future research. -
Gandhi Tejas Poster
Poster
LS-02 Generating Very Large Spectral Libraries for Targeted Proteomics Analysis Using Spectronaut, Tejas Gandhi (Biognosys, Switzerland)
Co-Authors: Roland Bruderer (Biognosys AG, Switzerland); Oliver Martin Bernhardt (Biognosys AG, Switzerland); Lukas Reiter (Biognosys AG, Switzerland)
Mass spectrometer (MS) based data-independent acquisition with targeted analysis offers new possibilities for highly multiplexed peptide and protein quantification experiments. This type of analysis often includes a spectral library as a prerequisite. In layman's terms, a spectral library is a collection of fingerprints that facilitates the identification of signals measured by the MS. Both the size and the quality of a spectral library, acting as a template for these target signals, can make a significant difference in the quality of the data analysis. Recently, the trend has been moving towards generating very large spectral libraries consisting of hundreds of thousands of peptides stemming from tens of thousands of proteins. From a software engineering perspective, the challenge then is to process and manage such large libraries in an efficient manner. Here we present our solution towards generating very large spectral libraries while using a standard gaming workstation.
Wednesday, June 8, 2016
Garden 2BC, 17:00-17:15
MS Presentation
Working with Limited Resources: Large-Scale Proteomic Data-Analysis on Cheap Gaming Desktop PCs, Tejas Gandhi (Biognosys, Switzerland)
Co-Authors: Lukas Reiter (Biognosys AG, Switzerland); Tejas Gandhi (Biognosys AG, Switzerland); Roland Bruderer (Biognosys AG, Switzerland)
One of the major challenges in mass-spec driven proteomics research is data analysis. Many research facilities have the capacity to generate several gigabytes of data per hour. To process such data, however, software solutions for high-throughput data analysis often require a cluster computing infrastructure. Since many research facilities do not have the required IT infrastructure for large-scale data processing, this kind of proteomics research has been restricted to only a few proteomics groups. Here we present a software solution that is capable of processing terabytes of data from large proteomics experiments on a cheap desktop gaming PC setup. We will focus on how to overcome the issue of limited resources while still maintaining high-throughput data analysis and reasonable scalability. -
Gantner Robert N. Paper
Thursday, June 9, 2016
Auditorium C, 11:30-12:00
Paper
A Generic C++ Library for Multilevel Quasi-Monte Carlo, Robert N. Gantner (ETH Zurich, Switzerland)
Co-Authors:
We present a high-performance framework for single- and multilevel quasi-Monte Carlo methods applicable to a large class of problems in uncertainty quantification. A typical application of interest is the approximation of expectations of quantities depending on solutions to e.g. partial differential equations with uncertain inputs, often yielding integrals over high-dimensional spaces. The goal of the software presented in this paper is to allow recently developed quasi-Monte Carlo (QMC) methods with high, dimension-independent orders of convergence to be combined with a multilevel approach and applied to large problems in a generic way, easing distributed memory parallel implementation. The well-known multilevel Monte Carlo method is also supported as a particular case, and some standard choices of distributions are included. For so-called interlaced polynomial lattice rules, a recently developed QMC method, precomputed generating vectors are required; some such vectors are provided for common cases.
After a theoretical introduction, the implementation is briefly explained and a user guide is given to aid in applying the framework to a concrete problem. We give two examples: a simple model problem designed to illustrate the mathematical concepts as well as the advantages of the multilevel method, including a measured decrease in computational work of multiple orders of magnitude for engineering accuracy, and a larger problem that exploits high-performance computing to show excellent parallel scaling. Some concrete use cases and applications are mentioned, including uncertainty quantification of partial differential equation models, and the approximation of solutions of corresponding Bayesian inverse problems. This software framework easily admits modifications; custom features like schedulers and load balancers can be implemented without hassle, and the code documentation includes relevant details. -
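A stripped-down illustration of the multilevel telescoping idea underlying the framework, using plain Monte Carlo sampling of a toy integrand; the actual library combines this with interlaced polynomial lattice QMC points and PDE solvers, and the level/sample schedule below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def Q(level, u):
    """Toy level-l approximation of the quantity of interest
    q(u) = int_0^1 (x + u)^2 dx, via the midpoint rule on 2**level cells."""
    x = (np.arange(2 ** level) + 0.5) / 2 ** level
    return ((x[None, :] + u[:, None]) ** 2).mean(axis=1)   # one value per sample u

# Multilevel estimator: E[Q_L] = E[Q_0] + sum_{l=1}^{L} E[Q_l - Q_{l-1}],
# using many cheap coarse samples and few expensive fine ones.
L = 6
samples_per_level = [4 ** (L - l) * 64 for l in range(L + 1)]   # geometric decrease
estimate = 0.0
for l, n_samp in enumerate(samples_per_level):
    u = rng.random(n_samp)        # same random input for the coupled fine/coarse pair
    correction = Q(l, u) if l == 0 else Q(l, u) - Q(l - 1, u)
    estimate += correction.mean()

exact = 7.0 / 6.0                 # int_0^1 int_0^1 (x + u)^2 dx du
print("MLMC estimate:", estimate, " exact:", exact)
```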
Garten Marco MS Presentation
Wednesday, June 8, 2016
Garden 3C, 14:00-14:15
MS Presentation
Interactive Plasma Simulations on Next Generation Supercomputers for Everybody, Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, TU Dresden, Germany)
Co-Authors: Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Benjamin Worpitz (Citrix Systems GmbH, Germany); Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
We present the open-source ecosystem around the reportedly fastest particle-in-cell code in the world (sustained Flop/s), PIConGPU, which is designed for modern clusters powered by manycore hardware. We argue that HPC plasma simulations should be able to estimate their systematic and random errors (e.g., by varying solvers and initial conditions). Our approach starts with an open-source, anytime fork-able development cycle as the basis for scrutable and reproducible simulations. To promote interoperability, we develop and propagate an open, self-describing, file-format agnostic data markup (openPMD) that is suitable for extreme I/O load and in-situ processing, demonstrated in a live simulation. PIConGPU is built on top of C++ meta-programming libraries, providing single-source kernel acceleration (alpaka) to work asynchronously on distributed data containers (PMacc). Using compile-time optimisation techniques, we show that particle-mesh methods can be implemented for arbitrary high-performance hardware (GPGPUs, CPUs, OpenPOWER, Xeon Phi), featuring solver agility without negative implications for maintenance (rewrites) or runtime performance. -
Gasparotto Piero Poster
Poster
MAT-08 Probing Defects and Correlations in the Hydrogen-Bond Network of Ab Initio Water, Piero Gasparotto (École polytechnique fédérale de Lausanne (EPFL), Switzerland)
Co-Authors:
Many of the unique properties of water are due to its highly structured hydrogen-bond (HB) network, in which a dominant tetrahedral motif coexists with a plethora of different coordination defects. Due to structural constraints, coordination defects do not come alone, but clustered together. Here, we compute defect-resolved distribution functions from ab initio molecular dynamics to probe the radial and angular correlation between defects. We shed light on how fluctuations from the ideal tetrahedral structure contribute to the total radial distribution function of liquid water. We also present a systematic comparison of the concentration of different defects and the structural correlations between them, with a variety of simulation protocols. We show that, regardless of the details of the model, the qualitative predictions for the defect distributions are very similar. The most significant effect can be attributed to dispersion interactions, which have the largest impact on the relative populations of the various defects. -
Gehant Sebastien Poster
Poster
LS-08 The UniProt SPARQL Endpoint: 21 Billion Triples in Production, Sebastien Gehant (Swiss Institute of Bioinformatics, Switzerland)
Co-Authors: Sebastien Gehant (Swiss Institute of Bioinformatics, Switzerland); Thierry Lombardot (Swiss Institute of Bioinformatics, Switzerland); Lydie Bougueleret (Swiss Institute of Bioinformatics, Switzerland); Ioannis Xenarios (Swiss Institute of Bioinformatics, Switzerland); Nicole Redaschi (Swiss Institute of Bioinformatics, Switzerland)
The UniProt knowledgebase is a leading resource of protein sequences and functional information whose centerpiece is the expert-curated Swiss-Prot section. UniProt data is accessible at www.uniprot.org (via a user-friendly interface and a REST API) and at sparql.uniprot.org, a public SPARQL endpoint hosted and maintained by the Vital-IT and Swiss-Prot groups of SIB. With 21 billion RDF triples it is the largest free to use graph database in the sciences. SPARQL allows scientists to perform complex queries within UniProt and across datasets located on remote SPARQL endpoints. It provides a free data integration solution for users who cannot afford to create custom data warehouses, at a cost for the service providers. Here we discuss the challenges in maintaining the UniProt SPARQL endpoint, which is updated monthly in sync with the UniProt data releases. -
Geiger Joachim Contributed Talk
Thursday, June 9, 2016
Garden 3A, 11:10-11:30
Contributed Talk
Self-Consistent Modelling of Plasma Heating and Fast Ion Generation Using Ion-Cyclotron Range of Frequency Waves in 2D and 3D Devices, Joachim Geiger (IPP, Germany)
Co-Authors: Wilfred Cooper (EPFL, Switzerland); Jonathan Graves (EPFL, Switzerland); David Pfefferlé (EPFL, Switzerland); Joachim Geiger (Max Planck Institute of Plasma Physics, Germany)
Ion-Cyclotron Range of Frequency (ICRF) waves are an efficient source of plasma heating in tokamaks and stellarators. In ICRF-heated plasmas, the resonating particles' phase-space distribution function displays significant distortion. A significant consequence is a noticeable modification of the plasma properties that dictate the propagation of the ICRF wave. The self-consistent modelling tool SCENIC was built in order to solve this highly non-linear problem. It is one of the few ICRF modelling tools able to tackle both 2D and 3D plasma configurations. The computational resources, in particular the amount of shared memory required to resolve the plasma equilibrium and the wave propagation, increase significantly for simulations of strongly 3D equilibria such as stellarators compared to 2D tokamak calculations. We present some applications of SCENIC to tokamak and stellarator plasmas. Particular focus is given to simulations of the recently started Wendelstein 7-X stellarator experiment, which will use ICRF waves for fast particle generation. -
Gendron Eric Paper
Thursday, June 9, 2016
Auditorium C, 10:30-11:00
Paper
Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs, Eric Gendron (L'Observatoire de Paris, France)
Co-Authors: Hatem Ltaief (KAUST, Saudi Arabia), Damien Gratadour (L'Observatoire de Paris, France); Eric Gendron (L'Observatoire de Paris, France)
We present a high performance comprehensive implementation of a multi-object adaptive optics (MOAO) simulation on multicore architectures with hardware accelerators in the context of computational astronomy. This implementation will be used as an operational testbed for simulating the design of new instruments for the European Extremely Large Telescope project (E-ELT), the world's biggest eye and one of Europe's highest priorities in ground-based astronomy. The simulation corresponds to a multi-step multi-stage procedure, which is fed, near real-time, by system and turbulence data coming from the telescope environment. Based on the PLASMA library powered by the OmpSs dynamic runtime system, our implementation relies on a task-based programming model to permit an asynchronous out-of-order execution. Using modern multicore architectures associated with the enormous computing power of GPUs, the resulting data-driven compute-intensive simulation of the entire MOAO application, composed of the tomographic reconstructor and the observing sequence, is capable of coping with the aforementioned real-time challenge and stands as a reference implementation for the computational astronomy community. -
Genovese Luigi MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 14:20-14:40
MS Presentation
BigDFT: Flexible DFT Approach to Large Systems Using Adaptive and Localized Basis Functions, Luigi Genovese (CEA/INAC, France)
Co-Authors: Luigi Genovese (CEA/INAC, France); Stefan Mohr (BSC, Spain); Laura Ratcliff (Argonne National Laboratory, United States of America); Stefan Goedecker (University of Basel, Switzerland)
Since 2008, the BigDFT project consortium has developed an ab initio DFT code based on Daubechies wavelets. In recent articles, we presented the linear scaling version of the BigDFT code [1], where a minimal set of localized support functions is optimised in situ for systems in various boundary conditions. We will present how the flexibility of this approach is helpful in providing a basis set that is optimally tuned to the chemical environment surrounding each atom. In addition to providing a basis useful for projecting Kohn-Sham orbital information such as atomic charges and partial densities of states, it can also be reused as-is, without re-optimisation, for charge-constrained DFT calculations within a fragment approach [2]. We will demonstrate the value of this approach for highly precise and efficient calculations of systems in complex environments [3]. [1] JCP 140, 204110 (2014), PCCP 17, 31360 (2015) [2] JCP 142, 23, 234105 (2015) [3] JCTC 11, 2077 (2015)
Poster
MAT-03 Complex Wet-Environments in Electronic-Structure Calculations, Luigi Genovese (CEA/INAC, France)
Co-Authors: Luigi Genovese (CEA/INAC, France); Oliviero Andreussi (Università della Svizzera italiana, Switzerland); Nicola Marzari (EPFL, Switzerland); Stefan Goedecker (University of Basel, Switzerland)
The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions at the ab-initio level in the presence of the proper electrochemical environment. In this work we present a continuum solvation library able to handle both neutral and ionic solutions, solving the Generalized Poisson and the Poisson-Boltzmann problem. Two different recipes have been implemented to build up the continuum dielectric cavity (one using atomic coordinates, the other mapping the solute electronic density). A preconditioned conjugate gradient method has been implemented for the Generalized Poisson equation, whilst a self-consistent procedure enables us to solve the Poisson-Boltzmann problem. Both solvers and continuum dielectric cavities have been integrated into the BigDFT electronic-structure package. We benchmarked the whole library on several atomistic systems including small neutral molecules, large proteins, solvated surfaces and reactions in solution to demonstrate its efficiency and performance. -
Gerbi Antonello MS Presentation
Wednesday, June 8, 2016
Garden 3A, 16:30-16:45
MS Presentation
Coupled Mathematical and Numerical Models for Integrated Simulations of the Left Ventricle, Antonello Gerbi (EPFL, Switzerland)
Co-Authors: Luca Dede' (EPFL, Switzerland); Davide Forti (EPFL, Switzerland); Alfio Quarteroni (EPFL, Switzerland)
In this talk, we focus on the coupling of electrophysiology and mechanical models to realize an integrated model of the left ventricle by considering the active contraction of the muscle and the feedback on the electrophysiology. For the latter, we consider the mono-domain equations with the Bueno-Orovio ionic model. As for the mechanics, we consider the Holzapfel-Ogden model together with an active strain approach with a transmurally variable activation parameter. We spatially approximate the model by means of the Finite Element method and discuss the properties of different coupling strategies and time discretization schemes. Among these, we consider a fully coupled strategy with a semi-implicit scheme for the time discretization. In order to solve the linear system arising from such discretization, we use a preconditioner based on the FaCSI (Factorized Condensed SIMPLE) concept. We present and discuss numerical results obtained in the HPC framework, including patient-specific left ventricle geometries. -
Gerya Taras MS Presentation
Friday, June 10, 2016
Garden 1A, 10:00-10:15
MS Presentation
From Tectonic to Seismic Timescales in 3D Continuum Models, Taras Gerya (Institute of Geophysics, ETH Zürich, Switzerland)
Co-Authors: Ylona van Dinther (ETH Zurich, Switzerland); Laetitia Le Pourhiet (Pierre and Marie Curie University, France); Dave A. May (ETH Zurich, Switzerland); Taras Gerya (ETH Zurich, Switzerland)
Lateral rupture limits substantially regulate the magnitude of great subduction megathrust earthquakes, but in turn, factors controlling it remain largely unknown due to the limited spatio-temporal range of observations. It however involves the long-term, regional tectonic history, including structural, stress and strength heterogeneities. This problem requires a powerful 3D-continuum numerical modelling approach that bridges tectonic and seismic timescales, but a suitable code is lacking. We demonstrate the development of a scalable PETSc-based staggered-grid finite difference code, in which self-consistent long-term deformation and spontaneous rupture are ensured through a solid-mechanics based visco-elasto-plastic rheology with a slip rate-dependent friction formulation, an energy-conservative inertial implementation, artificial damping of seismic waves at the domain boundaries, and an adaptive, implicit-explicit time-stepping scheme. Automated discretization and manufactured solution benchmarks ensure stability, flexibility and accuracy of the code at every stage of development. -
Gewaltig Marc-Oliver MS Presentation
Friday, June 10, 2016
Garden 2BC, 10:20-10:40
MS Presentation
From Data to Models: A Semi-Automatic Workflow for Large-Scale Brain Models, Marc-Oliver Gewaltig (EPFL, Switzerland)
Co-Authors:
We present a semi-automatic process for constructing whole-brain models at the point-neuron level from different sets of image stacks. In the first step, we determine the positions of all cells using high-resolution Nissl-stained microscope images. In the second step, we determine the type of each cell (glia, excitatory neurons and inhibitory neurons) using in situ hybridization (ISH) image data, and compare the resulting cell and neuron numbers to literature data. In the next step, we use two-photon tomography images of rAAV-labeled axonal projections to determine the mesoscale connectivity between the neurons in different brain regions. Finally, we obtain a network model that can be simulated with state-of-the-art simulators like NEST and NEURON. -
Ghasemi Alireza MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 17:00-17:30
MS Presentation
Force Fields Based on a Neural Network Steered Charge Equilibration Scheme, Alireza Ghasemi (Institute for Advanced Studies in Basic Sciences, Iran)
Co-Authors: Alireza Ghasemi (Institute for Advanced Studies in Basic Sciences, Iran)
Similarly to density functional calculations with a small basis set, charge equilibration schemes allow one to determine charge densities that are constrained by the functional form chosen to represent the charge density. The flow of charge is determined by the electrostatic interactions and the local electronegativity of all the atoms. By introducing an atomic-environment-dependent electronegativity, which is predicted by a neural network, we can reach density functional accuracy at a small fraction of the numerical cost of a full density functional calculation for ionic materials. Extension to other materials will also be discussed. -
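A minimal sketch of a point-charge equilibration step of the kind described above. The per-atom electronegativities, which in the scheme above would come from a neural network, are random placeholders here; the bare Coulomb kernel, hardnesses and units are deliberate simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cluster of atoms (positions in arbitrary units).
pos = rng.random((8, 3)) * 5.0
n = len(pos)

# Placeholders: environment-dependent electronegativities chi (would come from
# the neural network in the scheme above) and made-up atomic hardnesses J.
chi = rng.normal(0.0, 0.2, n)
J = np.full(n, 4.0)

# Charge-equilibration matrix: hardness on the diagonal, bare 1/r off-diagonal
# (real implementations use screened, Gaussian-broadened interactions).
r = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
A = 1.0 / np.where(r > 0, r, np.inf)
np.fill_diagonal(A, J)

# Minimise E(q) = chi.q + 0.5 q.A.q subject to a fixed total charge, via a
# Lagrange multiplier -> a single (n+1) x (n+1) linear solve for the charges.
Q_tot = 0.0
M = np.zeros((n + 1, n + 1))
M[:n, :n] = A
M[:n, n] = 1.0
M[n, :n] = 1.0
rhs = np.concatenate([-chi, [Q_tot]])
q = np.linalg.solve(M, rhs)[:n]
print("charges:", q, " sum:", q.sum())
```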
Ghattas Omar MS Presentation
Friday, June 10, 2016
Garden 1A, 09:00-09:30
MS Presentation
HPC Challenges Arising in Forward and Inverse Mantle Flow Simulation, Omar Ghattas (The University of Texas at Austin, United States of America)
Co-Authors: Johann Rudi (University of Texas at Austin, United States of America); Vishagan Ratnaswamy (California Institute of Technology, United States of America); Dunzhu Li (California Institute of Technology, United States of America); Tobin Isaac (University of Chicago, United States of America); Michael Gurnis (California Institute of Technology, United States of America); Omar Ghattas (University of Texas at Austin, United States of America)
We discuss scalable solvers for the forward and inverse simulation of mantle flow problems. Crucial solver components for the arising nonlinear Stokes problems are parallel multigrid methods for preconditioning the linearized Stokes system, and a Schur complement approximation that is able to cope with extreme viscosity variations. To achieve good parallel scalability, we use, among other techniques, matrix-free operations, and we redistribute coarse multigrid levels to subsets of the available processors. We will discuss the inversion of global rheology parameters and distributed fields from surface data and the present-day temperature distribution in instantaneous and time-dependent problems. -
Gheller Claudio Contributed Talk
Wednesday, June 8, 2016
Garden 3C, 14:15-14:30
Contributed Talk
A Portable Platform for Accelerated PIC Codes and its Application to Multi- and Many Integrated Core Architectures Using Hybrid MPI/OpenMP, Claudio Gheller (Swiss National Supercomputing Centre, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Noé Ohana (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Laurent Villard (EPFL, Switzerland)
With the aim of porting Particle-In-Cell (PIC) codes to modern parallel computers equipped with coprocessors, we have designed a testbed called PIC_ENGINE retaining the key elements of the PIC algorithm as applied to plasma physics simulations. A hybrid OpenMP/MPI implementation is used to explore the potential gain in performance on multi-core CPUs and Many Integrated Core (MIC) coprocessors. A bucket sort is added to increase data locality, and a vectorization algorithm is implemented, improving the overall performance. With the PIC_ENGINE, we show that the hybrid OpenMP/MPI approach allows a performance gain of approximately 60% compared to pure MPI. Furthermore, the sorting and vectorization increase the performance of the most time-consuming methods by up to a factor of 3.2. Finally, using the same code, hybrid runs performed on MIC lead to similar conclusions; however, due to inefficient vectorization, the overall performance is poor compared to CPU runs.
Wednesday, June 8, 2016
Garden 3C, 14:30-14:45
Contributed Talk
Towards Optimization of a Gyrokinetic Particle-in-Cell (PIC) Code on Large Scale Hybrid Architectures, Claudio Gheller (Swiss National Supercomputing Centre, Switzerland)
Co-Authors: Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Emmanuel Lanti (EPFL / Swiss Plasma Center, Switzerland); Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Laurent Villard (EPFL, Switzerland)
Refactoring large legacy codes to exploit the power of new multithreaded devices is not an easy task. For this purpose, we designed a platform embedding simplified basic features of PIC codes. It solves the drift-kinetic equations (a first step towards gyrokinetics) in a sheared plasma slab using B-spline finite elements up to fourth order. Multiple levels of parallelism have been implemented using MPI+OpenMP and MPI+OpenACC. It has been shown that sorting particles can lead to performance improvements by increasing data locality and enabling vectorization of the grid memory access. This paper focuses on the GPU implementation, which outperforms the CPU version by up to a factor of 4 while not requiring a major code rewrite. This gain increases with the spline order. Weak and strong scalability tests have been successfully run on the GPU-equipped Cray XC30 Piz Daint (CSCS) on up to 4,096 nodes. This performance shall enable advanced studies of turbulent transport in magnetic fusion devices.
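Both abstracts above credit part of the gain to sorting particles by grid cell so that particles belonging to the same cell sit contiguously in memory. As a minimal serial illustration of that idea only (not the PIC_ENGINE or GPU implementation; all names and the flat cell index are assumptions), a counting/bucket sort by cell index could look like this:

```cpp
#include <cstddef>
#include <vector>

struct Particle {
    double x, y, z;    // position
    double vx, vy, vz; // velocity
    int cell;          // index of the grid cell containing the particle
};

// Reorder particles so that all particles of a cell are contiguous in memory.
// A counting (bucket) sort is O(N + Ncells) and keeps the original order
// within each bucket, which helps locality of the subsequent grid access.
std::vector<Particle> bucket_sort_by_cell(const std::vector<Particle>& p, int ncells)
{
    std::vector<std::size_t> count(ncells + 1, 0);
    for (const auto& part : p) ++count[part.cell + 1];
    for (int c = 0; c < ncells; ++c) count[c + 1] += count[c]; // prefix sum -> bucket offsets

    std::vector<Particle> sorted(p.size());
    std::vector<std::size_t> offset(count.begin(), count.end() - 1);
    for (const auto& part : p) sorted[offset[part.cell]++] = part;
    return sorted;
}
```

Each bucket can then be handed to one thread or vector lane, which is what improves data locality and vectorization of the charge-deposition and field-gather loops.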
Poster
CSM-09 Hash Tables on GPUs Using Lock-Free Linked Lists, Claudio Gheller (Swiss National Supercomputing Centre, Switzerland)
Co-Authors: Andreas Bleuler (University of Zurich, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Romain Teyssier (University of Zurich, Switzerland)
Hash table implementations which resolve collisions by chaining with linked lists are very flexible with respect to the insertion of additional keys into an existing table and the deletion of a subset of keys from it. For our implementation on GPUs, we use non-blocking linked lists based on atomic "compare and swap" operations. List entries are deleted by first marking them as invalid and then removing them; typically, after a number of deletion operations, our local heap is compacted. Using this approach, the initial build of the hash table and hash lookups perform comparably to the CUDPP library implementation. However, small modifications of the table are performed much faster in our implementation than the complete rebuild required by other implementations. We intend to use this novel hash table implementation for astrophysical GPU simulations with adaptive-mesh particle-in-cell schemes, which would benefit greatly from these new features. -
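The key ingredient, insertion into a bucket's chained list via an atomic compare-and-swap, is described above only in prose. Purely as an illustration of that idea (a host-side C++ sketch using std::atomic rather than the poster's CUDA atomics; all names are assumptions), head insertion into a lock-free bucket list can look like this:

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// One chained bucket is a singly linked list grown lock-free at its head.
struct Node {
    std::uint64_t key;
    std::uint64_t value;
    Node* next;
};

struct HashTable {
    std::vector<std::atomic<Node*>> buckets;

    explicit HashTable(std::size_t n) : buckets(n) {
        for (auto& b : buckets) b.store(nullptr);
    }

    // Lock-free insert: link a new node at the head of its bucket's list.
    // compare_exchange plays the role of the GPU's atomic "compare and swap":
    // if another thread won the race, retry with the freshly observed head.
    void insert(Node* node) {
        auto& head = buckets[node->key % buckets.size()];
        Node* old = head.load(std::memory_order_relaxed);
        do {
            node->next = old;
        } while (!head.compare_exchange_weak(old, node,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
    }
};
```

Deletion by marking nodes invalid and compacting the local heap later, as described in the abstract, is deliberately omitted from this sketch.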
Giantomassi Matteo MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 13:40-14:00
MS Presentation
Automating and Optimising ABINIT Calculations on HPC Architectures: Challenges and Possible Solutions, Matteo Giantomassi (Université catholique de Louvain, Belgium)
Co-Authors:
With the HPC advances of recent years, ab initio techniques are becoming the standard tool for studying and predicting many materials properties. To be effective, the software must be particularly well suited to the hardware, and a strong interaction with the ab initio application is needed to optimise the amount of data produced and the execution time. In this talk, I will discuss the recent developments in ABINIT that pave the way towards exascale computation, as well as the Python framework we are developing to automate and optimise large workflows on HPC architectures.
MS Summary
MS04 First-Principles Simulations on Modern and Novel Architectures, Matteo Giantomassi (Université catholique de Louvain, Belgium)
Co-Authors: Matteo Giantomassi (Université catholique de Louvain, Belgium)
The predictive power of the so-called ab initio methods, based on the fundamental quantum-mechanical models of matter at the atomic level, together with the growing computational power of high-end High Performance Computing (HPC) systems, has led to exciting scientific and technological results in Materials Science. The increase in computational power coupled with better numerical techniques opens up the possibility to simulate and predict the behaviour of larger and larger atomic systems with a higher degree of accuracy, shortening the path from theoretical results to technological applications and opening up the possibility to design new materials from scratch. Despite the elegant simplicity of the formulation of the basic quantum mechanical principles, a practical implementation of a many-particle simulation has to use approximations and models to be feasible. As there are several options for these approximations, different ab initio simulation codes have been developed, with different trade-offs between precision and computational effort. Each of these codes has its specific strengths and weaknesses, but all together they have contributed to making computational materials science one of the domains where supercomputers raise the efficiency of producing scientific know-how and technological innovation. Indeed, a large fraction of the available workload on supercomputers around the world is spent performing computational materials science simulations. These codes have mostly kept pace with hardware improvements over the years by relying on proven libraries and paradigms, such as MPI, that could abstract the developers from low-level considerations while the architectures evolved within a nearly homogeneous model. In the past few years, however, the emergence of heterogeneous computing elements associated with the transition from peta- to exascale has started to expose the fragility of this model of development. The aim of the present minisymposium is to gather expert developers of different codes to discuss the challenges of porting, scaling, and optimizing materials science application codes for modern and novel platforms. The presentations will focus on advanced programming paradigms, novel algorithms, domain-specific libraries, in-memory data management, and software/hardware co-design. Exascale-related challenges (such as sustained performance, energy awareness, code fault tolerance, task concurrency and load balancing, numerical noise and stability, and big-data I/O) will also be discussed. -
Gibertini Marco MS Presentation
Thursday, June 9, 2016
Garden 1BC, 15:00-15:20
MS Presentation
High-Throughput Prediction of Novel Two-Dimensional Materials, Marco Gibertini (EPFL, Switzerland)
Co-Authors: Philippe Schwaller (EPFL, Switzerland); Andrea Cepellotti (EPFL, Switzerland); Andrius Merkys (EPFL, Switzerland); Ivano E. Castelli (EPFL, Switzerland); Marco Gibertini (EPFL, Switzerland); Giovanni Pizzi (EPFL, Switzerland); Nicola Marzari (EPFL, Switzerland)
As a crucial step towards the identification of novel and promising 2D materials, we provide here a large-scale first-principles exploration and characterization of such compounds. From a combination of 480,000 structures harvested from the ICSD and COD databases, three-dimensional crystals are screened systematically by checking for the absence of chemical bonds between adjacent layers, identifying more than 6,000 layered systems. Then DFT calculations of the van der Waals interlayer bonding are performed with automatic workflows, while systematically assessing the metallic, insulating or magnetic character of the materials obtained. Following full atomic and cell relaxations, phonon dispersions are computed as a first step towards the assessment of thermodynamic properties. Thanks to the AiiDA materials informatics platform [1], and in particular its automatic workflow engine, database structure, sharing capabilities, and pipelines to/from crystallographic repositories, the systematic and reproducible calculation of these properties becomes straightforward, together with seamless accessibility and sharing. [1] http://aiida.net -
Giesecke Andre MS Presentation
Thursday, June 9, 2016
Garden 1A, 11:00-11:30
MS Presentation
Numerical Simulations of Precession Driven Flows and their Ability to Drive a Dynamo, Andre Giesecke (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
Co-Authors:
In a next-generation dynamo experiment currently under development at Helmholtz-Zentrum Dresden-Rossendorf (HZDR), a fluid flow of liquid sodium, solely driven by precession, will be considered as a possible source of magnetic field generation. In my talk I will present results from hydrodynamic simulations of a precession-driven flow in cylindrical geometry. In a second step, the velocity fields obtained from the hydrodynamic simulations have been applied to a kinematic solver for the magnetic induction equation in order to determine whether a precession-driven flow is capable of driving a dynamo under experimental conditions. It turns out that the excitation of dynamo action in a precessing cylinder at moderate precession rates is difficult, and future dynamo simulations are required in more extreme parameter regimes, where water experiments show a more complex fluid flow that is expected to be beneficial for dynamo action. -
Gimenez Judit Paper
Thursday, June 9, 2016
Auditorium C, 11:00-11:30
Paper
Benefits of SMT and of Parallel Transpose Algorithm for the Large-Scale GYSELA Application, Judit Gimenez (BSC, Spain)
Co-Authors: Julien Bigot (CEA, France); Nicolas Bouzat (INRIA, France); Judit Gimenez (BSC, Spain); Virginie Grandgirard (CEA, France)
This article describes how we increase the performance and extend the features of a large parallel application through the use of simultaneous multithreading (SMT) and by designing a robust parallel transpose algorithm. The semi-Lagrangian code Gysela typically performs large physics simulations using a few thousand cores, between 1k and 16k cores on x86-based clusters. However, simulations with finer resolutions and with kinetic electrons increase those needs by a huge factor, providing a good example of applications requiring exascale machines. To improve Gysela compute times, we take advantage of the efficient SMT implementations available on recent INTEL architectures. We also analyze the cost of a transposition communication scheme that, in our case, involves a large number of cores. Adapting the code to balance the load when using SMT, combined with a good deployment strategy, led to reductions of up to 38% in execution time. -
Glöss Andreas Poster
Poster
MAT-04 CP2K within the PASC Materials Network, Andreas Glöss (Department of Chemistry, University of Zurich, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Hans Pabst (Intel Semiconductor AG, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland)
One of the goals of the PASC project is to strengthen networking in the Swiss materials science community through the active development of collaborative relationships among university researchers and CSCS staff. This includes assisting researchers in tuning, debugging, optimizing, and enhancing codes and applications for HPC resources, from mid-scale to national and international petascale facilities, with a view to the exascale transition. In addition, the application support specialists provide support for development projects on software porting techniques, parallelization and optimization strategies, deployment on diverse computational platforms, and data management. Here we present selected tools and software developed for CP2K [1]. Furthermore, we show by example how a CP2K application can be tuned to make optimal use of all available HPC resources. With a view to next-generation HPC hardware, we present first promising performance results for INTEL's Broadwell-EP and KNL platforms. [1] The CP2K developers group, CP2K is freely available from: https://www.cp2k.org/, 2016
Poster
MAT-09 Sparse Matrix Multiplication Library for Linear Scaling DFT Calculations in Electronic Structure Codes, Andreas Glöss (Department of Chemistry, University of Zurich, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Andreas Glöss (University of Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland)
The key operation for the linear-scaling DFT implemented in the CP2K quantum chemistry program is sparse matrix-matrix multiplication. For this task, the sparse matrix library DBCSR (Distributed Block Compressed Sparse Row) has been developed. DBCSR takes full advantage of the block-structured sparse nature of the matrices for efficient computation and communication. It is MPI- and OpenMP-parallelized and can exploit accelerators. We describe a strategy to improve DBCSR performance. DBCSR is available as a stand-alone library at http://dbcsr.cp2k.org/ to be employed in electronic structure codes. To this end, a streamlined API has been defined, and a suite of tools has been developed to generate the full documentation of the library (API-DOC) by extracting the information provided directly in the source code. We give a flavour of the generated API-DOC by showing snapshots of selected HTML documentation pages, and we sketch the design of these tools. -
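As an orientation only (this is not DBCSR's actual interface, which is documented at http://dbcsr.cp2k.org/; all names here are assumptions), the block-compressed-sparse-row idea underlying the library stores one dense block per non-zero block position and reduces the sparse product to many small dense block multiplications:

```cpp
#include <vector>

// Minimal block-CSR container: every stored entry is a dense b-by-b block.
// (Illustrative only; DBCSR also supports variable block sizes and MPI distribution.)
struct BlockCSR {
    int nblockrows = 0;
    int b = 1;                      // block edge length
    std::vector<int> row_ptr;       // size nblockrows + 1: start of each block row
    std::vector<int> col_idx;       // block-column index of each stored block
    std::vector<double> blocks;     // b*b values per stored block, row-major
};

// Dense kernel applied for every contributing pair of blocks A(i,k) and B(k,j):
// C(i,j) += A(i,k) * B(k,j). In DBCSR these small products are batched and
// dispatched to OpenMP threads or accelerators.
void block_multiply_accumulate(const double* A, const double* B, double* C, int b)
{
    for (int i = 0; i < b; ++i)
        for (int k = 0; k < b; ++k)
            for (int j = 0; j < b; ++j)
                C[i * b + j] += A[i * b + k] * B[k * b + j];
}
```

Exploiting the block structure in this way is what lets the library communicate only the occupied blocks while keeping the per-block arithmetic dense and cache- or accelerator-friendly.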
Goedecker Stefan MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 17:00-17:30
MS Presentation
Force Fields Based on a Neural Network Steered Charge Equilibration Scheme, Stefan Goedecker (Uni Basel, Switzerland)
Co-Authors: Alireza Ghasemi (Institute for Advanced Studies in Basic Sciences, Iran)
Similarly to density functional calculations with a small basis set, charge equilibration schemes make it possible to determine charge densities that are constrained by the functional form chosen to represent them. The flow of charge is determined by the electrostatic interactions and the local electronegativity of all the atoms. By introducing an atomic-environment-dependent electronegativity, predicted by a neural network, we can reach density functional accuracy at a small fraction of the numerical cost of a full density functional calculation for ionic materials. Extension to other materials will also be discussed.
Wednesday, June 8, 2016
Garden 1BC, 14:20-14:40
MS Presentation
BigDFT: Flexible DFT Approach to Large Systems Using Adaptive and Localized Basis Functions, Stefan Goedecker (Uni Basel, Switzerland)
Co-Authors: Luigi Genovese (CEA/INAC, France); Stefan Mohr (BSC, Spain); Laura Ratcliff (Argonne National Laboratory, United States of America); Stefan Goedecker (University of Basel, Switzerland)
Since 2008, the BigDFT project consortium has developed an ab initio DFT code based on Daubechies wavelets. In recent articles, we presented the linear-scaling version of the BigDFT code[1], where a minimal set of localized support functions is optimised in situ for systems in various boundary conditions. We will present how the flexibility of this approach helps provide a basis set that is optimally tuned to the chemical environment surrounding each atom. In addition to providing a basis onto which Kohn-Sham orbitals and derived information such as atomic charges and partial densities of states can be projected, it can also be reused as-is, without re-optimisation, for charge-constrained DFT calculations within a fragment approach[2]. We will demonstrate the value of this approach for highly precise and efficient calculations of systems in complex environments[3]. [1] JCP 140, 204110 (2014), PCCP 17, 31360 (2015) [2] JCP 142, 23, 234105 (2015) [3] JCTC 11, 2077 (2015)
Poster
PHY-06 Soft Norm Conserving Accurate Pseudopotentials, Stefan Goedecker (Uni Basel, Switzerland)
Co-Authors: Stefan Goedecker (University of Basel, Switzerland)
Soft and accurate pseudopotentials are necessary for the prediction of new materials. With the inclusion of the non-linear core correction (NLCC) along with semi-core states in the Goedecker pseudopotentials[1-2], we generated soft and accurate pseudopotentials for the Perdew-Burke-Ernzerhof (PBE)[3] functional for H to Ar and a few transition metals. Through the Delta test[4,5], they are found to perform well for bulk systems, with an average delta value of 0.15 meV/atom. The average error in the atomization energies of the G2-1 test set is 1.32 kcal/mol, obtained using NWChem with the aug-cc-pV5Z basis set. [1] C. Hartwigsen, S. Goedecker, and J. Hutter, Phys. Rev. B 58, 3641 (1998) [2] A. Willand et al., J. Chem. Phys. 138, 104109 (2013) [3] J. P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996) [4] K. Lejaeghere et al., Critical Reviews in Solid State and Materials Sciences 39, 1-24 (2014) [5] K. Lejaeghere et al., Science 351, 6280 (2016)
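For reference (a sketch with assumed notation; see refs. [4,5] for the exact protocol), the Delta value compares the equations of state E(V) of the same crystal computed with two methods as a root-mean-square difference over a volume window of width \Delta V around the equilibrium volume V_0:

\Delta = \sqrt{ \frac{1}{\Delta V} \int_{V_0 - \Delta V/2}^{V_0 + \Delta V/2} \bigl[ E_2(V) - E_1(V) \bigr]^2 \, \mathrm{d}V },

so an average value of 0.15 meV/atom means the pseudopotential and all-electron E(V) curves are nearly indistinguishable over that window.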
Poster
MAT-03 Complex Wet-Environments in Electronic-Structure Calculations, Stefan Goedecker (Uni Basel, Switzerland)
Co-Authors: Luigi Genovese (CEA/INAC, France); Oliviero Andreussi (Università della Svizzera italiana, Switzerland); Nicola Marzari (EPFL, Switzerland); Stefan Goedecker (University of Basel, Switzerland)
The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions at the ab-initio level in the presence of the proper electrochemical environment. In this work we present a continuum solvation library able to handle both neutral and ionic solutions, solving the Generalized Poisson and the Poisson-Boltzmann problems. Two different recipes have been implemented to build the continuum dielectric cavity (one using atomic coordinates, the other mapping the solute electronic density). A preconditioned conjugate gradient method has been implemented for the Generalized Poisson equation, whilst a self-consistent procedure enables us to solve the Poisson-Boltzmann problem. Both solvers and continuum dielectric cavities have been integrated into the BigDFT electronic-structure package. We benchmarked the whole library on several atomistic systems, including small neutral molecules, large proteins, solvated surfaces and reactions in solution, to demonstrate its efficiency and performance. -
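For orientation (schematic form with assumed notation, not necessarily the library's exact conventions), the two continuum electrostatics problems mentioned are the Generalized Poisson equation and, for a symmetric 1:1 electrolyte, the Poisson-Boltzmann equation:

\nabla \cdot \bigl[ \epsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \bigr] = -4\pi \rho_{\mathrm{solute}}(\mathbf{r}),
\qquad
\nabla \cdot \bigl[ \epsilon(\mathbf{r}) \nabla \phi(\mathbf{r}) \bigr] = -4\pi \rho_{\mathrm{solute}}(\mathbf{r}) + 8\pi\, z e\, c_0\, \lambda(\mathbf{r}) \sinh\!\left( \frac{z e\, \phi(\mathbf{r})}{k_B T} \right),

where \epsilon(\mathbf{r}) is the position-dependent permittivity defining the cavity, c_0 the bulk ion concentration and \lambda(\mathbf{r}) an exclusion function keeping ions out of the solute region; the first equation is the one treated with the preconditioned conjugate gradient, the second with the self-consistent procedure.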
Gokhberg Alexey Paper
Wednesday, June 8, 2016
Auditorium C, 13:00-13:30
Paper
Automatic Global Multiscale Seismic Inversion: Insights into Model, Data, and Workflow Management, Alexey Gokhberg (ETH Zurich, Switzerland)
Co-Authors: Christian Boehm (ETH Zurich, Switzerland); Alexey Gokhberg (ETH Zurich, Switzerland); Andreas Fichtner (ETH Zurich, Switzerland)
Modern global seismic waveform tomography is formulated as a PDE-constrained nonlinear optimization problem, where the optimization variables are Earth's visco-elastic parameters. This particular problem has several defining characteristics. First, the solution to the forward problem, which involves the numerical solution of the elastic wave equation over continental to global scales, is computationally expensive. Second, the determinedness of the inverse problem varies dramatically as a function of data coverage. This is chiefly due to the uneven distribution of earthquake sources and seismometers, which in turn results in an uneven sampling of the parameter space. Third, the seismic wavefield depends nonlinearly on the Earth's structure. Sections of a seismogram which are close in time may be sensitive to structure greatly separated in space.
In addition to these theoretical difficulties, the seismic imaging community faces additional issues which are common across HPC applications. These include the storage of massive checkpoint files, the recovery from generic system failures, and the management of complex workflows, among others. While the community has access to solvers which can harness modern heterogeneous computing architectures, the computational bottleneck has fallen to these memory- and manpower-bounded issues.
We present a two-tiered solution to the above problems. To deal with the problems relating to computational expense, data coverage, and the increasing nonlinearity of waveform tomography with scale, we present the Collaborative Seismic Earth Model (CSEM). This model, and its associated framework, takes an open-source approach to global-scale seismic inversion. Instead of attempting to monolithically invert all available seismic data, the CSEM approach focuses on the inversion of specific geographic subregions, and then consistently integrates these subregions via a common computational framework. To deal with the workflow and storage issues, we present a suite of workflow management software, along with a custom designed optimization and data compression library. It is the goal of this paper to synthesize these above concepts, originally developed in isolation, into components of an automatic global-scale seismic inversion. -
Golze Dorothea MS Presentation
Friday, June 10, 2016
Garden 1BC, 10:45-11:00
MS Presentation
Local Density Fitting within a Gaussian and Plane Waves Approach: Accelerating Simulations Based on Density Functional Theory, Dorothea Golze (University of Zurich, Switzerland)
Co-Authors:
A local resolution-of-identity (LRI) is introduced for Kohn-Sham (KS) density functional theory calculations using a mixed Gaussian and plane waves (GPW) approach within the CP2K program package. The locality of the density fitting ensures that the linear scaling of the GPW approach is retained, while the prefactor for calculating the KS matrix is drastically reduced. In LRIGPW, the atomic pair densities are approximated by an expansion in one-center fit functions. Thereby, the computational demands for the grid-based operations become negligible, while they are dominant for GPW. The LRI approach is assessed for a wide range of molecular and periodic systems yielding highly accurate results for reaction energies as well as intra- and intermolecular structure parameters. Employing LRI, the SCF step is sped up by a factor of 2-25 depending on the symmetry of the simulation cell, the grid cutoff and the system size. -
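Schematically (assumed notation, not the talk's exact equations), the local fit replaces each atomic pair density by a one-center expansion,

\rho_{ab}(\mathbf{r}) = \sum_{\mu \in a,\ \nu \in b} P_{\mu\nu}\, \chi_\mu(\mathbf{r})\, \chi_\nu(\mathbf{r})
\;\approx\;
\tilde{\rho}_{ab}(\mathbf{r}) = \sum_{i \in a \cup b} a_i\, f_i(\mathbf{r}),

with the coefficients a_i determined by minimizing \lVert \rho_{ab} - \tilde{\rho}_{ab} \rVert in a suitable metric subject to conservation of the pair charge; only the one-center functions f_i then need to be mapped onto the real-space grid, which removes the pair-density grid work that dominates standard GPW while preserving its linear scaling.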
Gombosi Tamas MS Presentation
Wednesday, June 8, 2016
Garden 3C, 16:00-16:30
MS Presentation
Decoupling and Coupling in iPIC3D, a Particle-in-Cell Code for Exascale, Tamas Gombosi (University of Michigan, United States of America)
Co-Authors: Stefano Markidis (Royal Institute of Technology, Sweden); Erwin Laure (Royal Institute of Technology, Sweden); Yuxi Chen (University of Michigan, United States of America); Gabor Toth (University of Michigan, United States of America); Tamas Gombosi (University of Michigan, United States of America)
iPIC3D is a massively parallel three-dimensional implicit particle-in-cell code used for the study of the interactions between the solar wind and Earth's magnetosphere. Because of the scalability potential and scientific impact demonstrated by iPIC3D, it has been selected in many European HPC projects to prepare for future exascale machines. In this talk, we present new algorithmic changes to iPIC3D in preparation for the coming exascale era. In particular, we will present decoupled computation, communication and I/O operations in iPIC3D to address the challenges of irregular operations on large numbers of processes. Our evaluation results show that the performance benefits of this model increase as the scale of the simulation increases. We also present a two-way coupled kinetic-fluid model with multiple implicit PIC domains (computed by the iPIC3D code) embedded in MHD (computed by the BATS-R-US code) under the Space Weather Modeling Framework (SWMF). -
Gonnet Pedro Paper
Wednesday, June 8, 2016
Auditorium C, 13:30-14:00
Paper
SWIFT: Using Task-Based Parallelism, Fully Asynchronous Communication, and Graph Partition-Based Domain Decomposition for Strong Scaling on more than 100 000 Cores, Pedro Gonnet (Durham University, United Kingdom)
Co-Authors: Pedro Gonnet (Durham University, United Kingdom); Aidan B. G. Chalk (Durham University, United Kingdom); Peter Draper (Durham University, United Kingdom)
We present a new open-source cosmological code, called SWIFT, designed to solve the equations of hydrodynamics using a particle-based approach (Smoothed Particle Hydrodynamics) on hybrid shared/distributed-memory architectures. SWIFT was designed from the bottom up to provide excellent strong scaling on both commodity clusters (Tier-2 systems) and Top100 supercomputers (Tier-0 systems), without relying on architecture-specific features or specialized accelerator hardware. This performance is due to three main computational approaches: (i) task-based parallelism for shared-memory parallelism, which provides fine-grained load balancing and thus strong scaling on large numbers of cores; (ii) graph-based domain decomposition, which uses the task graph to decompose the simulation domain such that the work, rather than just the data (as in most partitioning schemes), is equally distributed across all nodes; (iii) fully dynamic and asynchronous communication, in which communication is modelled as just another task in the task-based scheme, sending data whenever it is ready and deferring tasks that rely on data from other nodes until it arrives.
In order to use these approaches, the code had to be re-written from scratch, and the algorithms therein adapted to the task-based paradigm. As a result, we can show upwards of 60% parallel efficiency for moderate-sized problems when increasing the number of cores 512-fold, on both x86-based and Power8-based architectures. -
Gorlani Paolo Poster
Poster
CSM-03 A Proposal for the Implementation of Discontinuous Galerkin Methods for Elliptic Problems on GPUs, Paolo Gorlani (Politecnico di Milano, Italy)
Co-Authors:
This work introduces a code which solves a diffusion-reaction problem using a Discontinuous Galerkin method. We have designed this code in order to exploit the full potential of GPU architectures. The sparse matrix-vector product, fundamental in iterative solvers, is a critical point for GPUs. The locality of Discontinuous Galerkin methods lets us implement the stiffness matrix-vector product as a stencil code. The "actions" of the local operators are applied directly to the vector of global degrees of freedom. The implementation is matrix-free; only the vectors of the degrees of freedom and the mesh data are stored in GPU memory. Our stiffness matrix-vector product implementation has shown good scalability on CSCS's Piz Daint system. The code has been developed as a master's thesis project by Paolo Gorlani. The thesis supervisors are Prof. P. F. Antonietti and Prof. L. Bonaventura, both from MOX. -
Gourinovitch Anna Poster
Poster
CSM-05 CloudLightning: Self-Organising, Self-Managing Heterogeneous Cloud, Anna Gourinovitch (Dublin City University, Ireland)
Co-Authors: Anna Gourinovitch (Dublin City University, Ireland)
CloudLightning is funded under the European Union's Horizon 2020 research and innovation programme under the call H2020-ICT-2014-1. It comprises eight partners from academia and industry and is coordinated by University College Cork. The objective of the project is to create a new way of provisioning heterogeneous cloud resources to deliver cloud services on the principles of self-management and self-organisation. This new self-organising system will make the cloud more accessible to cloud consumers and provide cloud service providers with power-efficient, scalable management of their cloud infrastructures. The CloudLightning solution will be demonstrated in three application domains: i) Genome Processing; ii) Oil and Gas exploration; and iii) Ray Tracing. Expected impacts for European cloud service providers include increased competitiveness through reduced cost and differentiation; increased energy efficiency and reduced environmental impact; improved service delivery; and greater accessibility to cloud computing for high performance computing workloads. -
Grandgirard Virginie Paper
Thursday, June 9, 2016
Auditorium C, 11:00-11:30
Paper
Benefits of SMT and of Parallel Transpose Algorithm for the Large-Scale GYSELA Application, Virginie Grandgirard (CEA, France)
Co-Authors: Julien Bigot (CEA, France); Nicolas Bouzat (INRIA, France); Judit Gimenez (BSC, Spain); Virginie Grandgirard (CEA, France)
This article describes how we increase the performance and extend the features of a large parallel application through the use of simultaneous multithreading (SMT) and by designing a robust parallel transpose algorithm. The semi-Lagrangian code Gysela typically performs large physics simulations using a few thousand cores, between 1k and 16k cores on x86-based clusters. However, simulations with finer resolutions and with kinetic electrons increase those needs by a huge factor, providing a good example of applications requiring exascale machines. To improve Gysela compute times, we take advantage of the efficient SMT implementations available on recent INTEL architectures. We also analyze the cost of a transposition communication scheme that, in our case, involves a large number of cores. Adapting the code to balance the load when using SMT, combined with a good deployment strategy, led to reductions of up to 38% in execution time. -
Gratton Serge MS Presentation
Wednesday, June 8, 2016
Garden 3B, 17:00-17:30
MS Presentation
Numerical Solution of the Time-Parallelized Weak-Constraint 4DVAR, Serge Gratton (IRIT/CERFACS, France)
Co-Authors: Serge Gratton (Institut de Recherche en Informatique de Toulouse / CERFACS, France); Mike Fisher (ECMWF, United Kingdom)
This study addresses the numerical solution of the saddle point system arising from four-dimensional variational (4D-Var) data assimilation, including a study of preconditioning and its convergence properties. This saddle point formulation of 4D-Var allows parallelization in the time dimension. It therefore represents a crucial step towards higher computational efficiency, since 4D-Var approaches otherwise require many sequential computations. In recent years, there has been increasing interest in saddle point problems, which arise in many other applications such as constrained optimisation, computational fluid dynamics, optimal control and so forth. The key issue in solving saddle point systems with Krylov subspace methods is to find efficient preconditioners. This work focuses on preconditioners obtained by using limited-memory low-rank updates and presents numerical results obtained with the Object Oriented Prediction System (OOPS) developed by ECMWF. -
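In generic form (a sketch, not ECMWF's exact operators), the linear systems in question are symmetric indefinite KKT systems

\begin{pmatrix} A & B^{\mathsf{T}} \\ B & 0 \end{pmatrix}
\begin{pmatrix} x \\ \lambda \end{pmatrix}
=
\begin{pmatrix} f \\ g \end{pmatrix},

where, in the weak-constraint 4D-Var case, A collects the (block-diagonal, and hence time-parallel) background, model-error and observation-error covariance terms and B the linearized model and observation operators coupling the assimilation sub-windows; the preconditioners discussed here approximate the inverse of this matrix with limited-memory low-rank updates so that the Krylov iteration converges in few matrix-vector products.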
Graves Jonathan Contributed Talk
Thursday, June 9, 2016
Garden 3A, 11:10-11:30
Contributed Talk
Self-Consistent Modelling of Plasma Heating and Fast Ion Generation Using Ion-Cyclotron Range of Frequency Waves in 2D and 3D Devices, Jonathan Graves (EPFL, Switzerland)
Co-Authors: Wilfred Cooper (EPFL, Switzerland); Jonathan Graves (EPFL, Switzerland); David Pfefferlé (EPFL, Switzerland); Joachim Geiger (Max Planck Institute of Plasma Physics, Germany)
Ion-Cyclotron Range of Frequency (ICRF) waves are an efficient source of plasma heating in tokamaks and stellarators. In ICRF-heated plasmas, the phase-space distribution function of the resonating particles displays significant distortion. A significant consequence is a noticeable modification of the plasma properties which dictate the propagation of the ICRF wave. The self-consistent modelling tool SCENIC was built in order to solve this highly non-linear problem. It is one of the few ICRF modelling tools able to tackle both 2D and 3D plasma configurations. The computational resources, in particular the amount of shared memory required to resolve the plasma equilibrium and the wave propagation, increase significantly for simulations of strongly 3D equilibria such as stellarators compared to 2D tokamak calculations. We present some applications of SCENIC to tokamak and stellarator plasmas. Particular focus is given to simulations of the recently started Wendelstein 7-X stellarator experiment, which will use ICRF waves for fast particle generation. -
Greer Julia MS Presentation
Thursday, June 9, 2016
Garden 2BC, 14:30-15:00
MS Presentation
Deformation and Failure Behavior of Metallic Nanostructures, Julia Greer (California Institute of Technology, United States of America)
Co-Authors: Zhaoxuan Wu (Institute of High Performance Computing, A*STAR, Singapore); Mehdi Jafary-Zadeh (Institute of High Performance Computing, A*STAR, Singapore); Mark Jhon (Institute of High Performance Computing, A*STAR, Singapore); Wendy Gu (California Institute of Technology, United States of America); Julia Greer (California Institute of Technology, United States of America); David Srolovitz (University of Pennsylvania, United States of America)
The reliability of metallic nanostructures, such as nanowires and thin films, is often dictated by their mechanical failure. An in-depth understanding of the effects of intrinsic factors, such as grain boundaries and surface roughness, and extrinsic factors, such as size, shape and man-made notches, on their plastic deformation mechanisms and failure patterns is of great importance for fabricating these nanostructures with high reliability. In this talk, we will first briefly review various deformation mechanisms and failure patterns in the literature and highlight some of the critical issues that are currently under active research. We will then report our recent progress in the study of the effects of intrinsic factors, such as grain boundaries and dislocations, and extrinsic factors, such as sizes, shapes and man-made notches, on the plasticity and failure of metallic nanostructures, using both mechanical testing and large-scale molecular dynamics simulations. -
Grimm Simon MS Presentation
Thursday, June 9, 2016
Garden 3C, 15:05-15:25
MS Presentation
GENGA: A GPU N-Body Code for Terrestrial Planet Formation, Simon Grimm (University of Zürich, Switzerland)
Co-Authors:
GENGA is a GPU N-body code designed and optimised to simulate the process of terrestrial planet formation and the long-term evolution of (exo-)planetary systems. The use of the parallel computing power of GPUs allows GENGA to achieve a significant speedup compared to other N-body codes. The entire simulation is performed on the GPU to avoid slow memory transfers. GENGA uses a hybrid symplectic integration scheme, which allows a very good level of energy conservation even in the presence of frequent close encounters and collisions between individual bodies. The code can be used in three different computational modes: the main mode can integrate up to 32,768 fully gravitationally interacting planetesimals; the test particle mode can integrate up to 1 million massless bodies in the presence of some massive planets or protoplanets; and the third mode allows the parallel integration of up to 100,000 samples of small exoplanetary systems with different parameters. -
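For orientation only (this is the generic structure of hybrid symplectic integrators, not GENGA's exact operators), the Hamiltonian is split into a Keplerian part and an interaction part and advanced with a second-order kick-drift-kick composition,

H = H_{\mathrm{Kep}} + H_{\mathrm{int}}, \qquad
e^{\tau \hat{L}_H} \approx e^{\frac{\tau}{2} \hat{L}_{\mathrm{int}}}\; e^{\tau \hat{L}_{\mathrm{Kep}}}\; e^{\frac{\tau}{2} \hat{L}_{\mathrm{int}}} + \mathcal{O}(\tau^3),

i.e. a half-step velocity kick from the mutual gravity, a full Keplerian drift about the central star, and a second half-kick; in the hybrid variant the pairwise terms of bodies undergoing a close encounter are temporarily moved into a directly integrated part, which is what preserves good energy conservation through encounters and collisions.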
Grohs Philipp Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 11:50-12:10
Contributed Talk
Tensor-Product Discretization for the Spatially Inhomogeneous and Transient Boltzmann Equation, Philipp Grohs (SAM ETHZ, Switzerland)
Co-Authors: Philipp Grohs (ETH Zurich, Switzerland); Ralf Hiptmair (ETH Zurich, Switzerland)
The Boltzmann equation provides a fundamental mesoscopic model for the dynamics of rarefied gases. The computational challenge arising from its discretization is twofold: we face a moderately high-dimensional problem, and the collision operator is non-linear and non-local in the velocity variable. We aim for a deterministic and asymptotically exact Galerkin discretization. This sets our approach apart from stochastic Monte-Carlo-type and Fourier-based methods. We consider a tensor-product discretization of the distribution function combining Laguerre polynomials times a Maxwellian in velocity with continuous first-order finite elements in space. Unlike Fourier spectral methods, our approach does not require truncation of the velocity domain and does not suffer from aliasing errors. The advection problem is discretized through a Galerkin least-squares technique and yields an implicit formulation in time. Numerical results of benchmark simulations in 2+2 dimensions will be presented. -
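In formulas (assumed notation), the tensor-product ansatz for the distribution function combines a polynomial-times-Maxwellian basis in velocity with nodal finite elements in space,

f(\mathbf{x}, \mathbf{v}, t) \approx \sum_{i,j} c_{ij}(t)\, \varphi_i(\mathbf{x})\, \psi_j(\mathbf{v}),
\qquad
\psi_j(\mathbf{v}) = p_j(\mathbf{v})\, e^{-|\mathbf{v}|^2/2},

where the \varphi_i are continuous first-order finite elements on the spatial mesh and the p_j are Laguerre-type polynomials, so the velocity dependence decays naturally at infinity and no artificial truncation of the velocity domain is needed.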
Grosser Tobias Poster
Poster
CSM-10 Polly-ACC Transparent Compilation to Heterogeneous Hardware, Tobias Grosser (ETH Zurich, Switzerland)
Co-Authors: Torsten Hoefler (ETH Zurich, Switzerland)
Sequential programs compiled for today's heterogeneous hardware often exploit only a small fraction of the available compute resources. To benefit from GPU accelerators, the use of explicit parallel programming languages, pragma annotation systems, or specialized code generators is commonly necessary. We address the problem of automatically generating GPU code by developing a newly integrated heterogeneous compute compiler which, using the latest polyhedral modelling techniques, automatically maps sequential programs to accelerators. For a range of applications we observe almost no performance regressions. On top of this baseline, we report performance improvements for multiple compute kernels as well as two application benchmarks from SPEC CPU 2006. -
Grund Alexander MS Presentation
Wednesday, June 8, 2016
Garden 3C, 14:00-14:15
MS Presentation
Interactive Plasma Simulations on Next Generation Supercomputers for Everybody, Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
Co-Authors: Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Benjamin Worpitz (Citrix Systems GmbH, Germany); Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
We present the open-source ecosystem around the reportedly fastest particle-in-cell code in the world (sustained Flop/s), PIConGPU. The code is designed for modern clusters powered by manycore hardware, and we argue that HPC plasma simulations should be able to estimate their systematic and random errors (e.g., by varying solvers and initial conditions). Our approach starts with an open-source, anytime fork-able development cycle as the basis for scrutable and reproducible simulations. To promote interoperability, we develop and propagate an open, self-describing, file-format-agnostic data markup (openPMD) that is suitable for extreme I/O load and in-situ processing, demonstrated in a live simulation. PIConGPU is built on top of C++ meta-programming libraries, providing single-source kernel acceleration (alpaka) to work asynchronously on distributed data containers (PMacc). Using compile-time optimisation techniques, we show that particle-mesh methods can be implemented for arbitrary high-performance hardware (GPGPUs, CPUs, OpenPOWER, Xeon Phi), featuring solver agility without negative implications for maintenance (rewrites) or runtime performance. -
Grünewald Daniel MS Presentation
Wednesday, June 8, 2016
Garden 2A, 17:00-17:30
MS Presentation
GASPI: Bringing FDTD Simulations to Extreme Scale, Daniel Grünewald (Fraunhofer ITWM, Germany)
Co-Authors:
We present the Asynchronous Constraint Execution framework (ACE), which was originally developed for a fully scalable single-shot computation of Reverse Time Migration (RTM), the method of first choice in seismic imaging. The basic ingredient of ACE is a fine-granular domain decomposition supplemented by efficient data-dependency-driven task scheduling on the underlying, possibly heterogeneous compute resources. As such, it provides combined data and task parallelism. This is complemented on the interprocess level by the one-sided communication primitives of GASPI, equipped with lightweight remote completion checks. This fits perfectly into the concept of data-dependency-driven execution and allows communication to be fully overlapped with computation. A contiguous stream of computational tasks to the underlying processing units is guaranteed. The scalability achieved with GPI2.0, the GASPI reference implementation, is almost perfect over three orders of magnitude, up to 1,536 nodes (43,008 cores), as tested on the LRZ SuperMUC. -
Gu Wendy MS Presentation
Thursday, June 9, 2016
Garden 2BC, 14:30-15:00
MS Presentation
Deformation and Failure Behavior of Metallic Nanostructures, Wendy Gu (California Institute of Technology, United States of America)
Co-Authors: Zhaoxuan Wu (Institute of High Performance Computing, A*STAR, Singapore); Mehdi Jafary-Zadeh (Institute of High Performance Computing, A*STAR, Singapore); Mark Jhon (Institute of High Performance Computing, A*STAR, Singapore); Wendy Gu (California Institute of Technology, United States of America); Julia Greer (California Institute of Technology, United States of America); David Srolovitz (University of Pennsylvania, United States of America)
The reliability of metallic nanostructures, such as nanowires and thin films, is often dictated by their mechanical failure. An in-depth understanding of the effects of intrinsic factors, such as grain boundaries and surface roughness, and extrinsic factors, such as size, shape and man-made notches, on their plastic deformation mechanisms and failure patterns is of great importance for fabricating these nanostructures with high reliability. In this talk, we will first briefly review various deformation mechanisms and failure patterns in the literature and highlight some of the critical issues that are currently under active research. We will then report our recent progress in the study of the effects of intrinsic factors, such as grain boundaries and dislocations, and extrinsic factors, such as sizes, shapes and man-made notches, on the plasticity and failure of metallic nanostructures, using both mechanical testing and large-scale molecular dynamics simulations. -
Guerciotti Bruno MS Presentation
Wednesday, June 8, 2016
Garden 3A, 16:45-17:00
MS Presentation
Computational Study of the Risk of Restenosis in Coronary Bypasses, Bruno Guerciotti (Politecnico di Milano, Italy)
Co-Authors: Christian Vergara (Politecnico di Milano, Italy); Sonia Ippolito (Ospedale Luigi Sacco Milano, Italy); Roberto Scrofani (Ospedale Luigi Sacco Milano, Italy); Alfio Quarteroni (EPFL, Switzerland)
Coronary artery disease, caused by the build-up of atherosclerotic plaques in coronary vessel walls, is one of the leading causes of death in the world. For high-risk patients, coronary artery bypass grafting is the preferred treatment. Despite overall excellent patency rates, bypasses may fail due to restenosis. In this context, we present a computational study of the fluid dynamics in patient-specific geometries with the aim of investigating a possible relationship between coronary stenosis and graft failure. Firstly, we propose a strategy to prescribe realistic boundary conditions in the absence of measured data, based on an extension of Murray's law that provides the flow division at bifurcations in the case of stenotic vessels and non-Newtonian blood rheology. Then, we show some results of numerical simulations in patients treated with grafts, in which the degree of coronary stenosis is virtually varied to compare the fluid dynamics in terms of hemodynamic indices potentially involved in restenosis development. -
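For reference, the classical Murray's law that the proposed outflow-splitting strategy extends states that flow scales with the cube of the vessel radius, so that at a bifurcation of a parent vessel of radius r_0 into daughters of radii r_1 and r_2

Q_i \propto r_i^3, \qquad r_0^3 = r_1^3 + r_2^3, \qquad \frac{Q_1}{Q_2} = \frac{r_1^3}{r_2^3};

the extension mentioned in the abstract modifies this split to account for downstream stenoses and the non-Newtonian rheology of blood.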
Guervilly Celine MS Presentation
Thursday, June 9, 2016
Garden 1A, 11:30-12:00
MS Presentation
Subcritical Convection in a Rotating Sphere Using a Hybrid 2D/3D Model, Celine Guervilly (Newcastle University, United Kingdom)
Co-Authors:
We study thermal convection driven by internal heating in a rotating sphere for low Prandtl numbers. Our model assumes that the velocity is invariant along the axis of rotation due to the rapid rotation of the system, while the temperature is computed in 3D. We find the presence of two dynamical branches: a weak branch that is connected to the linear onset of convection and a strong branch with large values of the convective and zonal velocities. -
Guibert David Poster
Poster
CSM-11 Porting SPH-Flow to GPUs Using OpenACC: Experience and Challenges, David Guibert (Nextflow Software, France)
Co-Authors: Guillaume Jeusel (Nextflow Software, France); Jean-Guillaume Piccinali (ETH Zurich / CSCS, Switzerland); Guillaume Oger (École centrale de Nantes, France)
SPH-flow is one of the most advanced SPH solvers dedicated to highly dynamic multiphase physics simulations. Over the last year, ECNantes has partnered with Nextflow Software to deliver an accelerated version of the code on Piz Daint. This poster presents the results of this development activity. After assessing the overall performance of the code, we focused on the Monaghan solver. We investigated strategies to improve its performance for efficient execution on CPUs as well as GPUs, maintaining the scalability of the MPI version and high programmability. The keys to our incremental successes were being able to run a reduced version of the code, refactoring data types, and working around compiler limitations. This work should be of interest to academic developers because it details our experience using OpenACC directives for scientific computing in an area of cutting-edge research.
Thursday, June 9, 2016
Garden 2A, 12:00-12:15
MS Presentation
HPC Simulations of Complex Free-Surface Flow Problems with SPH-Flow Software, David Guibert (Nextflow Software, France)
Co-Authors: Matthieu De Leffe (Nextflow Software, France); David Guibert (Nextflow Software, France); Pierre Bigay (Nextflow Software, France)
SPH-Flow is a multi-purpose, multi-physics CFD software package based on the SPH method (smoothed particle hydrodynamics). It is developed by Ecole Centrale Nantes and Nextflow Software. The solver was first developed for fluid flow simulations dedicated to complex non-linear free-surface problems and has since been extended to multi-fluid, multi-structure, fluid/structure interaction and viscous flows. It relies on state-of-the-art meshless algorithms. The SPH-Flow solver is parallelized for distributed memory, relying on a domain decomposition. Today, the applicative computations usually performed involve 64 to 4,000 processors, depending on the problem. New industrial problems can now be solved with this method and its efficient HPC implementation. This talk will describe the SPH-Flow solver and its parallelization. Massively parallel, complex, innovative simulations will then be discussed: tire aquaplaning, wave impacts on a ship, fluid flow in a car gearbox, and river-crossing simulations of a car. -
Gunawan Rudiyanto MS Presentation
Thursday, June 9, 2016
Garden 3A, 14:00-14:30
MS Presentation
Coping with Underdetermined Biological Network Inference, Rudiyanto Gunawan (ETH Zurich, Switzerland)
Co-Authors:
Many problems of great importance in systems biology involve inferring causal relationships among cellular components from high-throughput biological data. Such inferences typically face significant challenges due to the curse of dimensionality. This issue translates to solving an underdetermined inverse problem, for which there could exist multiple solutions. In this talk, I will present an ensemble inference strategy for addressing underdetermined causal inference particularly when using gene transcriptional expression data for inferring gene regulatory networks. In particular, I will introduce an inference algorithm called TRaCE (Transitive Reduction and Closure Ensemble) for identifying the ensemble of network graphs from expression data of gene knock-out experiments. Furthermore, I will describe an ensemble-based optimisation of gene knock-outs, called REDUCE (Reduction of Uncertain Edges). Finally, I will demonstrate that by iterating TRaCE and REDUCE, we could resolve the underdetermined inference of gene regulatory networks. These tools are available from our website: http://www.cabsel.ethz.ch/tools. -
Gurdal Yeliz Poster
Poster
MAT-05 Herringbone Reconstruction versus Adsorption Registry: Which One Matters More to Pyrphyrin Adsorption, Yeliz Gurdal (University of Zurich, Switzerland)
Co-Authors: Marcella Iannuzzi (University of Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland)
Understanding molecule-metal interfaces is crucial for current technologies such as molecular electronics, magnetism, and photovoltaic cells. However, due to the complex nature of the Au(111) surface, which possesses a herringbone reconstruction, the interactions between molecules and the reconstructed Au(111) surface are still unclear. To fill this fundamental gap in the literature, we apply Density Functional Theory to investigate the effects of both the herringbone reconstruction and the adsorption registry on the electronic structure of the Co-Pyrphyrin(CoPy)@Au(111) complex. We find that the choice of van der Waals scheme is important for obtaining an accurate herringbone structure of the Au(111) surface. Adsorption of the molecule is stabilized by the interaction of both the Co metal centre and the cyano groups with under-coordinated Au atoms. PDOS analysis reveals that the CoPy@Au(111) interface is influenced more by changing the adsorption registry of the molecule than by changing adsorption domains on the herringbone-reconstructed surface. -
Gurnis Michael MS Presentation
Friday, June 10, 2016
Garden 1A, 09:00-09:30
MS Presentation
HPC Challenges Arising in Forward and Inverse Mantle Flow Simulation, Michael Gurnis (California Institute of Technology, United States of America)
Co-Authors: Johann Rudi (University of Texas at Austin, United States of America); Vishagan Ratnaswamy (California Institute of Technology, United States of America); Dunzhu Li (California Institute of Technology, United States of America); Tobin Isaac (University of Chicago, United States of America); Michael Gurnis (California Institute of Technology, United States of America); Omar Ghattas (University of Texas at Austin, United States of America)
We discuss scalable solvers for the forward and inverse simulation of mantle flow problems. Crucial solver components for the arising nonlinear Stokes problems are parallel multigrid methods for preconditioning the linearized Stokes system, and a Schur complement approximation that is able to cope with extreme viscosity variations. To achieve good parallel scalability, we use, among other techniques, matrix-free operations, and we redistribute coarse multigrid levels to subsets of the available processors. We will discuss the inversion of global rheology parameters and distributed fields from surface data and the present-day temperature distribution in instantaneous and time-dependent problems. -
Gurol Selime MS Presentation
Wednesday, June 8, 2016
Garden 3B, 17:00-17:30
MS Presentation
Numerical Solution of the Time-Parallelized Weak-Constraint 4DVAR, Selime Gurol (CERFACS, France)
Co-Authors: Serge Gratton (Institut de Recherche en Informatique de Toulouse / CERFACS, France); Mike Fisher (ECMWF, United Kingdom)
This study addresses the numerical solution of the saddle point system arising from four-dimensional variational (4D-Var) data assimilation, including a study of preconditioning and its convergence properties. This saddle point formulation of 4D-Var allows parallelization in the time dimension. It therefore represents a crucial step towards higher computational efficiency, since 4D-Var approaches otherwise require many sequential computations. In recent years, there has been increasing interest in saddle point problems, which arise in many other applications such as constrained optimisation, computational fluid dynamics, optimal control and so forth. The key issue in solving saddle point systems with Krylov subspace methods is to find efficient preconditioners. This work focuses on preconditioners obtained by using limited-memory low-rank updates and presents numerical results obtained with the Object Oriented Prediction System (OOPS) developed by ECMWF. -
Gysi Tobias Poster
Poster
CSM-06 dCUDA: Hardware Supported Overlap of Computation and Communication, Tobias Gysi (ETH Zurich, Switzerland)
Co-Authors: Jeremia Bär (ETH Zurich, Switzerland); Torsten Hoefler (ETH Zurich, Switzerland)
In recent years, the CUDA programming model and underlying GPU hardware architecture have gained a lot of popularity in various application domains such as climate modelling, computational chemistry, and machine learning. Today, GPU cluster programming typically requires two different programming models that separately deal with on-node computation and inter-node communication. With dCUDA we present a unified GPU cluster programming model that implements device-side remote memory access operations with target notification. To hide instruction pipeline latencies, CUDA programs over-subscribe the hardware with many more threads than there are execution units. Whenever a thread stalls the hardware proceeds with another thread that is ready for execution. To make best use of the cluster interconnect, dCUDA applies the same latency hiding technique to automatically overlap on-node computation with inter-node communication. Our experiments demonstrate good and perfect overlap for compute-bound and memory-bound tasks respectively.
H
-
Hadjidoukas Panagiotis Paper
Wednesday, June 8, 2016
Auditorium C, 14:30-15:00
Paper
Approximate Bayesian Computation for Granular and Molecular Dynamics Simulations, Panagiotis Hadjidoukas (ETH Zurich, Switzerland)
Co-Authors: Panagiotis Angelikopoulos (ETH Zurich, Switzerland); Panagiotis Hadjidoukas (ETH Zurich, Switzerland); Costas Papadimitriou (University of Thessaly, Greece); Petros Koumoutsakos (ETH Zurich, Switzerland)
The effective integration of models with data through Bayesian uncertainty quantification hinges on the formulation of a suitable likelihood function. In many cases such a likelihood may not be readily available or may be difficult to compute. Approximate Bayesian Computation (ABC) formulates a likelihood function through the comparison of low-dimensional summary statistics of the model predictions with corresponding statistics of the data. In this work we report a computationally efficient approach to the Bayesian updating of Molecular Dynamics (MD) models through ABC using a variant of the Subset Simulation method. We demonstrate that ABC can also be used for Bayesian updating of models with an explicitly defined likelihood function, and compare the implementation and efficiency of ABC-SubSim with the transitional Markov chain Monte Carlo (TMCMC) method. ABC-SubSim is then used for force-field identification in MD simulations. Furthermore, we examine the concept of relative entropy minimization for the calibration of force fields and exploit it within ABC. Using different approximate posterior formulations, we show that assuming Gaussian ensemble fluctuations of molecular-system quantities of interest can potentially lead to erroneous parameter identification.
Wednesday, June 8, 2016
Auditorium C, 17:00-17:30
Paper
An Efficient Compressible Multicomponent Flow Solver for Heterogeneous CPU/GPU Architectures, Panagiotis Hadjidoukas (ETH Zurich, Switzerland)
Co-Authors: Babak Hejazialhosseini (Cascade Technologies Inc., United States of America); Panagiotis Hadjidoukas (ETH Zurich, Switzerland); Diego Rossinelli (ETH Zurich, Switzerland); Petros Koumoutsakos (ETH Zurich, Switzerland)
We present a solver for three-dimensional compressible multicomponent flow based on the compressible Euler equations. The solver is based on a finite volume scheme for structured grids and advances the solution using an explicit Runge-Kutta time stepper. The numerical scheme requires the computation of the flux divergence based on an approximate Riemann problem. The computation of the divergence quantity is the most expensive task in the algorithm. Our implementation takes advantage of the compute capabilities of heterogeneous CPU/GPU architectures. The computational problem is organized in subdomains small enough to be placed into the GPU memory. The compute-intensive stencil scheme is offloaded to the GPU accelerator while advancing the solution in time on the CPU. Our method to implement the stencil scheme on the GPU is not limited to applications in fluid dynamics. The performance of our solver was assessed on Piz Daint, an XC30 supercomputer at CSCS. The GPU code is memory-bound and achieves a per-node performance of 462 Gflop/s, outperforming by 3.2x the multicore-based Gordon Bell Prize-winning CUBISM-MPCF solver for the offloaded computation on the same platform. The focus of this work is on the per-node performance of the heterogeneous solver. In addition, we examine the performance of the solver across 4096 compute nodes. We present simulations for the shock-induced collapse of an aligned row of air bubbles submerged in water using 4 billion cells. Results show a final pressure amplification that is 100x stronger than the strength of the initial shock. -
Haensel David MS Presentation
Thursday, June 9, 2016
Garden 2BC, 11:30-12:00
MS Presentation
How to Do Nothing in Less Time, David Haensel (Juelich Supercomputing Centre, Germany)
Co-Authors: David Haensel (Juelich Supercomputing Centre, Germany); Andreas Beckmann (Juelich Supercomputing Centre, Germany)
The Fast Multipole Method (FMM) is a generic toolbox algorithm for many important scientific applications, e.g. molecular dynamics. It enables us to compute all O(N^2) long-range pairwise interactions for N particles in O(N) time for any given precision. Unfortunately, the runtime of such simulations is already communication bound. To increase performance on modern HPC hardware, a more sophisticated parallelization scheme is required. The reduction of MPI collectives, especially, is a vital issue for increasing strong scaling. In this talk we will focus exclusively on the internode communication via MPI. We will present a latency-avoiding communication scheme and its implementation for our C++11 FMM toolbox. The implementation consists of several layers of abstraction to hide/encapsulate low-level MPI calls and the specifics of the communication algorithm. We will also show examples of the scaling capabilities of the FMM on a BG/Q for small and medium size MD problems. -
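As a point of reference for the O(N^2)-to-O(N) claim, the direct pairwise summation that the FMM replaces can be sketched in a few lines of NumPy; this is unrelated to the authors' C++11 toolbox, and the softening parameter and unit masses are illustrative only.

```python
import numpy as np

def direct_forces(pos, mass, eps=1e-3):
    """Naive O(N^2) gravitational-style pairwise interactions (G = 1).

    The FMM approximates exactly this sum in O(N) by expanding far-field
    contributions in multipoles; this brute-force version is only the
    reference against which the method is measured.
    """
    n = len(mass)
    acc = np.zeros_like(pos)
    for i in range(n):
        d = pos - pos[i]                      # vectors to all other particles
        r2 = (d ** 2).sum(axis=1) + eps ** 2  # softened squared distances
        r2[i] = np.inf                        # skip the self-interaction
        acc[i] = (mass[:, None] * d / r2[:, None] ** 1.5).sum(axis=0)
    return mass[:, None] * acc                # force = mass * acceleration

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.random((1000, 3))
    mass = np.full(1000, 1.0 / 1000)
    print(direct_forces(pos, mass)[:3])
```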
Hager Georg MS Presentation
Thursday, June 9, 2016
Auditorium C, 15:00-15:30
MS Presentation
GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems, Georg Hager (Universität Erlangen-Nürnberg, Germany)
Co-Authors: Georg Hager (University of Erlangen-Nuremberg, Germany); Gerhard Wellein (University of Erlangen-Nuremberg, Germany)
A significant fraction of future exascale-class high-performance computer systems is projected to be of heterogeneous nature, featuring "standard" as well as "accelerated" resources. A software infrastructure that claims applicability for such systems must be able to meet their inherent challenges: multiple levels of parallelism, complex topologies, asynchronicity, and abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is an open-source library of building blocks for sparse linear algebra algorithms on current and future large-scale systems. Being built on the "MPI+X" paradigm, it provides truly heterogeneous data parallelism and a light-weight and affinity-aware tasking mechanism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. Important design decisions are described with respect to the challenges posed by modern heterogeneous supercomputers and recent algorithmic developments. Implementation details which are indispensable for achieving high efficiency are pointed out and their necessity is justified by performance measurements or predictions based on performance models. -
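As an illustration of the kind of building block such a toolkit provides, here is a minimal sparse matrix-vector product over the CSR storage format in NumPy; this is a didactic sketch and does not reflect GHOST's data structures or API.

```python
import numpy as np

def csr_spmv(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix stored in compressed sparse row (CSR) format.

    values  : nonzero entries, stored row by row
    col_idx : column index of each nonzero
    row_ptr : offsets into `values` where each row starts (length nrows + 1)
    """
    nrows = len(row_ptr) - 1
    y = np.zeros(nrows)
    for i in range(nrows):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = np.dot(values[start:end], x[col_idx[start:end]])
    return y

# 3x3 example:  [[4, 0, 1],
#                [0, 2, 0],
#                [3, 0, 5]]
values  = np.array([4.0, 1.0, 2.0, 3.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
print(csr_spmv(values, col_idx, row_ptr, np.array([1.0, 1.0, 1.0])))  # [5. 2. 8.]
```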
Halpern Federico David Poster
Poster
PHY-03 Parallelization on a Hybrid Architecture of GBS, a Simulation Code for Plasma Turbulence at the Edge of Fusion Devices, Federico David Halpern (Ecole Polytechnique Fédérale de Lausanne (EPFL), Swiss Plasma Center (SPC), CH-1, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Patrick Emonts (EPFL / Swiss Plasma Center, Switzerland); Federico David Halpern (EPFL / Swiss Plasma Center, Switzerland); Rogério Jorge (EPFL / Swiss Plasma Center, Switzerland); Jorge Morales (EPFL / Swiss Plasma Center, Switzerland); Paola Paruta (EPFL / Swiss Plasma Center, Switzerland); Paolo Ricci (EPFL / Swiss Plasma Center, Switzerland); Fabio Riva (EPFL / Swiss Plasma Center, Switzerland)
We present recent developments of GBS, a simulation code used to evolve plasma turbulence in the edge of fusion devices. GBS solves a set of 3D fluid equations, the Poisson and the Ampere equation, and a kinetic equation for the neutral atoms. Investigations carried out with GBS have significantly advanced our understanding of the plasma dynamics at the edge of fusion devices. For example, GBS simulations allowed the identification of the turbulent regimes and the saturation mechanisms of the linearly unstable modes. In GBS, a 3D Cartesian MPI communicator is employed, leading to excellent parallel scalability up to 8192 cores. To efficiently exploit many-core and hybrid architectures, new schemes using MPI+OpenMP and MPI+OpenACC have been recently implemented. We show the implementation of the new parallelization schemes, their scalability, and their efficiency. The new parallelization allows the efficient use of advanced hybrid supercomputers, such as Piz Daint at CSCS. -
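For readers unfamiliar with the 3D Cartesian MPI communicator mentioned above, a minimal sketch with mpi4py is shown below; the periodicity choices and neighbour query are illustrative placeholders, not the actual GBS decomposition (GBS itself is a Fortran code).

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
# Factor the total number of ranks into a 3D process grid.
dims = MPI.Compute_dims(comm.Get_size(), 3)
# Illustrative periodicity: periodic in two directions, not in the first.
cart = comm.Create_cart(dims, periods=[False, True, True], reorder=True)

x, y, z = cart.Get_coords(cart.Get_rank())
left, right = cart.Shift(0, 1)  # neighbours along the first axis
print(f"rank {comm.Get_rank()} -> grid position ({x}, {y}, {z}), "
      f"neighbours along x: {left}, {right}")
```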
Ham David A. MS Presentation
Wednesday, June 8, 2016
Garden 3B, 13:00-13:30
MS Presentation
Firedrake: Automating the Finite Element Method by Composing Abstractions, David A. Ham (Imperial College London, United Kingdom)
Co-Authors: David A. Ham (Imperial College London, United Kingdom); Andrew T. T. McRae (University of Bath, United Kingdom); Florian Rathgeber (ECMWF, United Kingdom); Gheorghe-Teodor Bercea (Imperial College London, United Kingdom); Miklós Homolya (Imperial College London, United Kingdom); Fabio Luporini (Imperial College London, United Kingdom); Paul H. J. Kelly (Imperial College London, United Kingdom)
The complexity inherent in the application of advanced numerics on modern hardware to coupled physical systems presents a critical barrier to simulation development. To overcome this, we must create simulation software which embodies the abstraction and composability of the underlying mathematics. In this way, a system is created in which mathematicians, computer scientists, and application specialists can each deploy their own expertise, benefiting from the expertise of the others. Critically, this approach minimises the extent to which individuals must become polymaths to share in these advances. In this talk I will present Firedrake and PyOP2, a composition of new and existing abstractions which creates a particularly complete separation of concerns. This enables the creation of high performance, sophisticated finite element models from a very high level mathematical specification and has enabled advances in computer science and numerics, while also facilitating the creation of simulation systems for a variety of applications. -
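To give a flavour of the "very high level mathematical specification" referred to above, a minimal Poisson solve in Firedrake's UFL-based Python interface looks roughly as follows; this is a sketch based on the public documentation, and exact names and defaults may differ between versions.

```python
from firedrake import *  # Firedrake convention: expose UFL symbols at top level

# Poisson problem -div(grad u) = 1 on the unit square, u = 0 on the boundary.
mesh = UnitSquareMesh(32, 32)
V = FunctionSpace(mesh, "CG", 1)          # piecewise-linear continuous elements

u, v = TrialFunction(V), TestFunction(V)
a = inner(grad(u), grad(v)) * dx          # bilinear form, written as in the maths
L = Constant(1.0) * v * dx                # linear form
bc = DirichletBC(V, 0.0, "on_boundary")

uh = Function(V)
solve(a == L, uh, bcs=bc)                 # Firedrake/PyOP2 generate and run the kernels
```

The point of the abstraction is that the forms above are symbolic mathematics; the mesh iteration, kernel generation, and parallel execution are produced automatically by the layers below.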
Hapla Vaclav Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 10:50-11:10
Contributed Talk
PERMON Libraries for Massively Parallel Solution of Contact Problems of Elasticity, Vaclav Hapla (IT4Innovations, VSB - Technical University of Ostrava, Czech Republic)
Co-Authors: David Horak (Technical University of Ostrava / IT4Innovations, Czech Republic)
PERMON forms a collection of software libraries, uniquely combining quadratic programming (QP) algorithms and domain decomposition methods (DDM), built on top of the well-known PETSc framework for numerical computations. Among the main applications are contact problems of mechanics. Our PermonFLLOP package is focused on non-overlapping DDM of the FETI type, allowing efficient and robust utilization of contemporary parallel computers for problems with billions of unknowns. Any FEM software can be used to generate the mesh and assemble the stiffness matrices and load vectors for each subdomain independently. Additionally, a mapping from the local to the global numbering of degrees of freedom is needed, as well as non-penetration and friction information in the case of contact problems. All these data are passed to PermonFLLOP, which prepares auxiliary data needed in the DDM. PermonQP is then called in the backend to solve the resulting equality-constrained problem, with additional inequality constraints in the case of contact problems. -
Harenberg Daniel MS Presentation
Thursday, June 9, 2016
Garden 2A, 15:30-15:45
MS Presentation
Uncertainty Quantification and Global Sensitivity Analysis for Economic Models, Daniel Harenberg (ETH Zurich, Switzerland)
Co-Authors: Viktor Winschel (ETH Zurich, Switzerland); Stefano Marelli (ETH Zurich, Switzerland); Bruno Sudret (ETH Zurich, Switzerland)
We present a method for global sensitivity analysis of the outcomes of an economic model with respect to its parameters. Traditional sensitivity analyses, such as comparative statics and scenario or robustness analysis, are local and depend on the chosen combination of parameter values. Our global approach specifies a distribution for each parameter and approximates the outcomes as a polynomial of the parameters. In contrast to local analyses, the global sensitivity analysis takes into account non-linearities and interactions. Using the polynomial, we compute the distribution of outcomes and a variance decomposition known as Sobol' indices. We obtain an importance ranking of the parameters and their interactions, which can guide calibration exercises and model development. We compare the local to the global approach for the mean and variance of production in a canonical real business cycle model. We find an interesting separation result: for mean production, only the capital share, leisure substitution rate, and depreciation rate matter. -
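As a toy illustration of what a first-order Sobol' index measures: the authors use polynomial surrogates rather than the brute-force Monte Carlo below, and the toy model here is a made-up stand-in, not their real business cycle model.

```python
import numpy as np

def toy_model(x):
    """Stand-in for an economic model output y = f(parameters)."""
    return x[:, 0] + 2.0 * x[:, 1] + x[:, 0] * x[:, 2]

def first_order_sobol(model, n_params, n_samples=100_000, seed=0):
    """First-order Sobol' indices via the classic pick-freeze estimator."""
    rng = np.random.default_rng(seed)
    A = rng.random((n_samples, n_params))   # two independent sample matrices
    B = rng.random((n_samples, n_params))
    yA, yB = model(A), model(B)
    total_var = np.var(np.concatenate([yA, yB]))
    indices = []
    for i in range(n_params):
        ABi = B.copy()
        ABi[:, i] = A[:, i]                  # freeze parameter i at the A values
        yABi = model(ABi)
        indices.append(np.mean(yA * (yABi - yB)) / total_var)
    return np.array(indices)

print(first_order_sobol(toy_model, n_params=3))
```

Each index estimates the fraction of output variance explained by one parameter alone; the gap between the sum of first-order indices and one signals the interactions the abstract refers to.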
Hariri Farah Contributed Talk
Wednesday, June 8, 2016
Garden 3C, 14:15-14:30
Contributed Talk
A Portable Platform for Accelerated PIC Codes and its Application to Multi- and Many Integrated Core Architectures Using Hybrid MPI/OpenMP, Farah Hariri (European Organization for Nuclear Research, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Noé Ohana (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Laurent Villard (EPFL, Switzerland)
With the aim of porting Particle-In-Cell (PIC) codes to modern parallel computers equipped with coprocessors, we have designed a testbed called PIC_ENGINE retaining the key elements of the PIC algorithm as applied to plasma physics simulations. A hybrid OpenMP/MPI implementation is used to explore the potential gain in performance on multi-core CPUs and Many Integrated Core (MIC) coprocessors. A bucket sort is added to increase data locality, and a vectorization algorithm is implemented, showing an improvement in the overall performance. With the PIC_ENGINE, we show that the hybrid OpenMP/MPI approach allows a performance gain of approximately 60% compared to pure MPI. Furthermore, the sorting and vectorization increase the performance of the most time-consuming methods by up to a factor of 3.2. Finally, using the same code, hybrid runs are performed on MIC and lead to similar conclusions. However, due to inefficient vectorization, the overall performance is poor compared to CPU runs.
Wednesday, June 8, 2016
Garden 3C, 14:30-14:45
Contributed Talk
Towards Optimization of a Gyrokinetic Particle-in-Cell (PIC) Code on Large Scale Hybrid Architectures, Farah Hariri (European Organization for Nuclear Research, Switzerland)
Co-Authors: Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Emmanuel Lanti (EPFL / Swiss Plasma Center, Switzerland); Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Laurent Villard (EPFL, Switzerland)
Refactoring large legacy codes to exploit the power of new multithreaded devices is not an easy task. For this purpose, we designed a platform embedding simplified basic features of PIC codes. It solves the drift-kinetic equations (a first step towards gyrokinetics) in a sheared plasma slab using B-spline finite elements up to fourth order. Multiple levels of parallelism have been implemented using MPI+OpenMP and MPI+OpenACC. It has been shown that sorting particles can lead to a performance improvement by increasing data locality and vectorizing the grid memory access. This paper focuses on the GPU implementation, which outperforms the CPU timing by up to a factor of 4 while not requiring a major code rewriting effort. This gain increases with the spline order. Weak and strong scalability tests have been successfully run on the GPU-equipped Cray XC30 Piz Daint (CSCS) up to 4,096 nodes. These performance levels will enable advanced studies of turbulent transport in magnetic fusion devices. -
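The data-locality idea behind the particle sorting mentioned above can be sketched independently of the Fortran/OpenACC production code; the following NumPy fragment (with a placeholder 2D domain and cell size) simply reorders particles so that members of the same grid cell sit contiguously in memory.

```python
import numpy as np

def sort_particles_by_cell(positions, weights, cell_size, grid_shape):
    """Reorder particles so that those in the same grid cell are contiguous.

    After sorting, charge/current deposition and field gathering walk through
    memory almost sequentially, which improves cache reuse and vectorization.
    """
    ix = np.floor(positions[:, 0] / cell_size).astype(int)
    iy = np.floor(positions[:, 1] / cell_size).astype(int)
    cell_id = ix * grid_shape[1] + iy            # flatten the 2D cell index
    order = np.argsort(cell_id, kind="stable")   # a bucket/counting sort also works
    return positions[order], weights[order], cell_id[order]

rng = np.random.default_rng(1)
pos = rng.random((10_000, 2))
w = rng.random(10_000)
pos_s, w_s, cells = sort_particles_by_cell(pos, w, cell_size=1.0 / 64, grid_shape=(64, 64))
print(cells[:10])  # non-decreasing cell indices after the sort
```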
Hasanuddin Hasanuddin MS Presentation
Thursday, June 9, 2016
Garden 3C, 14:50-15:05
MS Presentation
New Time Step Criterion in Gravitational N-Body Simulations, Hasanuddin Hasanuddin (Physics and Astronomy Department, University of Leicester, United Kingdom)
Co-Authors: Walter Dehnen (University of Leicester, United Kingdom)
We present a new time step criterion for gravitational N-body simulations that is based on the norm of the gradient tensor of the acceleration, in contrast to commonly used criteria that are not gauge invariant in all circumstances, and that is directly related to the orbital time of a particle. We have tested this time step criterion in the simulation of a single orbit with high eccentricity as well as in an N-body problem using direct-summation force calculation and a Plummer model as the initial condition. The new criterion requires fewer force evaluations than other time step criteria. -
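The abstract does not give the explicit formula; a criterion of the kind described, built from the norm of the acceleration gradient (tidal) tensor, generically has the dimensionally consistent form

```latex
\Delta t_i \;=\; \frac{\eta}{\sqrt{\lVert \nabla \mathbf{a}_i \rVert}},
```

where the gradient tensor is evaluated at particle i, the norm is a suitable matrix norm (so the square root has dimensions of inverse time), and the dimensionless parameter controls accuracy; a step of this form is proportional to the local orbital time, which is the property emphasised above.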
Haveraaen Magne MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 13:30-14:00
MS Presentation
Abstractions for PDEs, Magne Haveraaen (University of Bergen, Norway)
Co-Authors:
The dominant abstraction in partial differential equation (PDE) solvers is the array. From a computational viewpoint this makes sense: the representation theorem of PDEs ensures that any data component can be represented as an array. From a software engineering viewpoint this is not so clear-cut. At the higher abstraction level, the array abstraction is not aligned with the concepts of the PDE domain. The effect is a lack of composition and reuse properties, manifest in the need to redevelop code from scratch when equations or assumptions change in unanticipated ways. At the lower level, the array abstraction is tied to a single-core, uniform-access memory model. The newer PGAS (partitioned global address space) model handles symmetric multicore architectures, but falls short of heterogeneous hardware (GPU, FPGA). The presentation will exemplify some of these problems, and sketch some solutions in the form of more appropriate abstractions. -
Hazel Andrew L. MS Presentation
Thursday, June 9, 2016
Garden 2A, 11:30-12:00
MS Presentation
Multiple Solutions in Free-Surface Flows, Andrew L. Hazel (The University of Manchester, United Kingdom)
Co-Authors:
Methods for the (MPI-based) parallel numerical solution of free-surface flow problems using an ALE-based finite-element method will be presented. The deforming fluid domain is treated as a pseudo-elastic solid for small deformations, but can also be completely remeshed to handle extreme changes. Techniques for the continuation of solution branches in the presence of remeshing will be described and used to demonstrate the existence of new solutions for the canonical problems of viscous fluid flow on the outside or inside of rotating cylinders, and to quantify the accuracy of thin-film approximations. The same techniques will also be used to characterise the solution structure that develops for two-phase flow in partially-occluded Hele-Shaw cells. That system becomes more sensitive as the aspect ratio is increased in the sense that multiple solutions are provoked for smaller occlusions, which is conjectured to underlie the experimentally observed sensitivity of such cells at large aspect ratios. -
Hejazialhosseini Babak Paper
Wednesday, June 8, 2016
Auditorium C, 17:00-17:30
Paper
An Efficient Compressible Multicomponent Flow Solver for Heterogeneous CPU/GPU Architectures, Babak Hejazialhosseini (Cascade Technologies, Inc., United States of America)
Co-Authors: Babak Hejazialhosseini (Cascade Technologies Inc., United States of America); Panagiotis Hadjidoukas (ETH Zurich, Switzerland); Diego Rossinelli (ETH Zurich, Switzerland); Petros Koumoutsakos (ETH Zurich, Switzerland)
We present a solver for three-dimensional compressible multicomponent flow based on the compressible Euler equations. The solver is based on a finite volume scheme for structured grids and advances the solution using an explicit Runge-Kutta time stepper. The numerical scheme requires the computation of the flux divergence based on an approximate Riemann problem. The computation of the divergence quantity is the most expensive task in the algorithm. Our implementation takes advantage of the compute capabilities of heterogeneous CPU/GPU architectures. The computational problem is organized in subdomains small enough to be placed into the GPU memory. The compute-intensive stencil scheme is offloaded to the GPU accelerator while advancing the solution in time on the CPU. Our method to implement the stencil scheme on the GPU is not limited to applications in fluid dynamics. The performance of our solver was assessed on Piz Daint, an XC30 supercomputer at CSCS. The GPU code is memory-bound and achieves a per-node performance of 462 Gflop/s, outperforming by 3.2x the multicore-based Gordon Bell Prize-winning CUBISM-MPCF solver for the offloaded computation on the same platform. The focus of this work is on the per-node performance of the heterogeneous solver. In addition, we examine the performance of the solver across 4096 compute nodes. We present simulations for the shock-induced collapse of an aligned row of air bubbles submerged in water using 4 billion cells. Results show a final pressure amplification that is 100x stronger than the strength of the initial shock. -
Hesthaven Jan Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 10:30-10:50
Contributed Talk
Space-Time Parallelism for Hyperbolic PDEs, Jan Hesthaven (EPFL, Switzerland)
Co-Authors: Gilles Brunner (EPFL, Switzerland); Jan Hesthaven (EPFL, Switzerland)
Parallel-in-time integration techniques have been hailed as a potential path to exascale for the solution of evolution-type problems. Methods of time-parallel integration are intended to extend parallel scaling on compute clusters beyond what is possible using conventional domain decomposition techniques alone. In this talk we give a short introduction to space-time parallelism with emphasis on the parareal method. We then proceed to present recent advances in the construction of the coarse operator needed in the iterative correction scheme. The modifications allow for parallel-in-time acceleration of purely hyperbolic systems of partial differential equations, something previously widely considered impractical. The talk is concluded with a presentation of preliminary results on parallel-in-time integration of a two-dimensional shallow-water-wave equation that governs the underlying dynamics in a tsunami simulation application.
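For orientation, the parareal iteration referred to above combines a cheap coarse propagator G with an accurate fine propagator F through the correction U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k). A minimal serial emulation for a scalar ODE is sketched below, with explicit Euler standing in for both propagators; the coarse-operator construction for hyperbolic PDEs discussed in the talk is not reproduced here.

```python
import numpy as np

def propagate(f, u0, t0, t1, n_steps):
    """Explicit Euler over [t0, t1]; stands in for the fine or coarse solver."""
    u, dt = u0, (t1 - t0) / n_steps
    for k in range(n_steps):
        u = u + dt * f(t0 + k * dt, u)
    return u

def parareal(f, u0, T, n_slices, n_coarse=1, n_fine=100, n_iter=5):
    """Parareal: U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k).

    The F evaluations on each time slice are independent and would be
    distributed across ranks in a real parallel-in-time implementation.
    """
    t = np.linspace(0.0, T, n_slices + 1)
    U = np.empty(n_slices + 1)
    U[0] = u0
    for n in range(n_slices):                      # initial coarse sweep
        U[n + 1] = propagate(f, U[n], t[n], t[n + 1], n_coarse)
    for _ in range(n_iter):
        F = [propagate(f, U[n], t[n], t[n + 1], n_fine) for n in range(n_slices)]
        G_old = [propagate(f, U[n], t[n], t[n + 1], n_coarse) for n in range(n_slices)]
        for n in range(n_slices):                  # sequential correction sweep
            G_new = propagate(f, U[n], t[n], t[n + 1], n_coarse)
            U[n + 1] = G_new + F[n] - G_old[n]
    return t, U

t, U = parareal(lambda t, u: -u, u0=1.0, T=2.0, n_slices=10)
print(np.max(np.abs(U - np.exp(-t))))  # error w.r.t. the exact solution of u' = -u
```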
MS Summary
MS27 CADMOS: HPC Simulations, Modeling and Large Data, Jan Hesthaven (EPFL, Switzerland)
Co-Authors: Nicolas Salamin (University of Lausanne, Switzerland), Jan Hesthaven (EPFL, Switzerland)
CADMOS (Center for ADvanced MOdelling Science) is a partnership between UNIGE, UNIL and EPFL whose goal is to promote HPC, modelling and simulation techniques, and data science for a broad range of relevant applications. New scientific results for well-established HPC problems, or new methodological approaches to problems usually not solved by computer modelling or HPC resources, are especially considered. In this minisymposium we will have presentations from each of the three partners, highlighting the above goals. We will also invite two external keynote speakers. Contributions reporting on the link between HPC and data science, or opening the door to new interdisciplinary applications within the scope of CADMOS, are welcome. -
Hinojosa Alfredo Parra MS Presentation
Thursday, June 9, 2016
Garden 2BC, 12:00-12:30
MS Presentation
Fault Tolerance and Silent Fault Detection for Higher-Dimensional Discretizations, Alfredo Parra Hinojosa (Technische Universität München, Germany)
Co-Authors: Alfredo Parra Hinojosa (Technische Universität München, Germany)
Future exascale systems are expected to have a mean time between failures in the range of minutes. Classical approaches such as checkpointing and then recomputing the missing solution will therefore be out of scope. Algorithm-based fault tolerance, in contrast, aims to continue without recomputation and with only minor extra computational effort; numerical schemes therefore have to be adapted. We present algorithm-based fault tolerance for the solution of high-dimensional PDEs. Our approach exploits a hierarchical extrapolation scheme, the sparse grid combination technique. Using the hierarchical ansatz, we show how hard faults can be mitigated without checkpoint-restart. Furthermore we explain how even soft faults (for example due to silent data corruption) can often be detected and handled. -
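For reference, the classical sparse grid combination technique mentioned above assembles the sparse grid solution in d dimensions from solutions computed on coarse anisotropic grids of multi-index level l:

```latex
u^{c}_{n} \;=\; \sum_{q=0}^{d-1} (-1)^{q} \binom{d-1}{q}
\sum_{\lvert \boldsymbol{\ell} \rvert_{1} \,=\, n - q} u_{\boldsymbol{\ell}} .
```

Because each component grid is cheap and partially redundant, the loss of a single component solution to a hard fault can be compensated by recombining the remaining grids with adjusted coefficients, which is the mechanism behind the checkpoint-free fault mitigation described above.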
Hiptmair Ralf Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 11:50-12:10
Contributed Talk
Tensor-Product Discretization for the Spatially Inhomogeneous and Transient Boltzmann Equation, Ralf Hiptmair (SAM ETHZ, Switzerland)
Co-Authors: Philipp Grohs (ETH Zurich, Switzerland); Ralf Hiptmair (ETH Zurich, Switzerland)
The Boltzmann equation provides a fundamental mesoscopic model for the dynamics of rarefied gases. The computational challenge arising from its discretization is twofold: we face a moderately high-dimensional problem, and the collision operator is non-linear and non-local in the velocity variable. We aim for a deterministic and asymptotically exact Galerkin discretization. This sets our approach apart from stochastic Monte-Carlo-type and Fourier-based methods. We consider a tensor product discretization of the distribution function combining Laguerre polynomials times a Maxwellian in velocity with continuous first-order finite elements in space. Unlike the Fourier spectral methods, our approach does not require truncation of the velocity domain and it does not suffer from aliasing errors. The advection problem is discretized through a Galerkin least-squares technique and yields an implicit formulation in time. Numerical results of benchmark simulations in 2+2 dimensions will be presented. -
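In standard notation, the equation being discretized is

```latex
\partial_t f + v \cdot \nabla_x f \;=\; Q(f, f),
```

where f(x, v, t) is the phase-space distribution function; the transport part on the left is handled by the Galerkin least-squares formulation in space and time described above, while the non-linear, non-local collision operator Q acts only in the velocity variable and is expanded in the Laguerre-times-Maxwellian tensor-product basis.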
Hirstoaga Sever MS Presentation
Wednesday, June 8, 2016
Garden 3C, 17:15-17:30
MS Presentation
Particle-in-Cell Simulations for Vlasov-Poisson Models, Sever Hirstoaga (Inria, France)
Co-Authors:
We study the dynamics of charged particles under the influence of a strong magnetic field by numerically solving the Vlasov-Poisson and guiding center models. We implement an efficient (from the memory access point of view) particle-in-cell method which enables simulations with a large number of particles. We present numerical results for classical Landau damping and Kelvin-Helmholtz test cases. The implementation also relies on a standard hybrid MPI/OpenMP parallelization. Code performance is assessed by the observed speedup and attained memory bandwidth. -
Hodel Florian Poster
Poster
MAT-10 What Influences the Water Oxidation Activity of a Bio-Inspired Molecular CoII4O4 Cubane?, Florian Hodel (University of Zurich, Switzerland)
Co-Authors: Sandra Luber (University of Zurich, Switzerland)
We investigated the reaction mechanism of the recently presented first Co(II)-based WOC, [CoII4(hmp)4(μ-OAc)2(μ2-OAc)2(H2O)2] (hmp=2-(hydroxymethyl)pyridine), which is one of the rare stable homogeneous cubane-type water oxidation catalysts (WOCs) and the design of which has been inspired by nature's oxygen evolving complex of photosystem II (PSII). Two possible different catalytic cycles have been envisioned: A single-site pathway involving only one cobalt center and a water attack on an oxo ligand or, alternatively, an oxo-oxo coupling pathway where two terminal oxo ligands of the cubane couple and are released as O2. Using density functional theory and an explicit first solvation shell, we compare relative free energies of all catalytic states and analyze their stability and reactivity. Furthermore, we compute barriers and reaction paths for the water attack and O2 release steps. With this knowledge at hand, we propose possibilities to tune catalytic activity paving the way to informed design of high-performance PSII mimics. -
Hoefler Torsten MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:15-15:30
MS Presentation
Refactoring and Virtualizing a Mesoscale Model for GPUs, Torsten Hoefler (ETH Zurich, Switzerland)
Co-Authors: Andrea Arteaga (MeteoSwiss, Switzerland); Christophe Charpilloz (MeteoSwiss, Switzerland); Salvatore Di Girolamo (ETH Zurich, Switzerland); Torsten Hoefler (ETH Zurich, Switzerland)
Our aim is to adapt the COSMO limited-area model to enable kilometer-scale resolution in climate simulation mode. As the resolution of climate simulations increases, storing the large amount of generated data becomes infeasible. To enable high-resolution models, we seek a good compromise between the disk I/O costs and the need to access the output data for post-processing and analysis. We propose a data-virtualization layer that re-runs simulations on demand and transparently manages the data for the analytics applications. To achieve this goal, we developed a bit-reproducible version of the dynamical core of the COSMO model that runs on different architectures (e.g., CPUs and GPUs). An ongoing project is working on the reproducibility of the full COSMO code. We will discuss the strategies adopted to develop the data virtualization layer, the challenges associated with the reproducibility of simulations performed on different hardware architectures, and the first promising results of our project.
Poster
CSM-06 dCUDA: Hardware Supported Overlap of Computation and Communication, Torsten Hoefler (ETH Zurich, Switzerland)
Co-Authors: Jeremia Bär (ETH Zurich, Switzerland); Torsten Hoefler (ETH Zurich, Switzerland)
In recent years, the CUDA programming model and underlying GPU hardware architecture have gained a lot of popularity in various application domains such as climate modelling, computational chemistry, and machine learning. Today, GPU cluster programming typically requires two different programming models that separately deal with on-node computation and inter-node communication. With dCUDA we present a unified GPU cluster programming model that implements device-side remote memory access operations with target notification. To hide instruction pipeline latencies, CUDA programs over-subscribe the hardware with many more threads than there are execution units. Whenever a thread stalls the hardware proceeds with another thread that is ready for execution. To make best use of the cluster interconnect, dCUDA applies the same latency hiding technique to automatically overlap on-node computation with inter-node communication. Our experiments demonstrate good and perfect overlap for compute-bound and memory-bound tasks respectively.
Poster
CSM-10 Polly-ACC: Transparent Compilation to Heterogeneous Hardware, Torsten Hoefler (ETH Zurich, Switzerland)
Co-Authors: Torsten Hoefler (ETH Zurich, Switzerland)
Sequential programs compiled for today's heterogeneous hardware often exploit only a small fraction of the available compute resources. To benefit from GPU accelerators, the use of explicit parallel programming languages, pragma annotation systems, or specialized code generators is commonly necessary. We address the problem of automatically generating GPU code by developing a newly integrated heterogeneous compute compiler which, using the latest polyhedral modeling techniques, automatically maps sequential programs to accelerators. For a range of applications we observe almost no performance regressions. On top of this baseline, we report performance improvements for multiple compute kernels as well as two application benchmarks from SPEC CPU 2006.
MS Summary
MS20 Kilometer-Scale Weather and Climate Modeling on Future Supercomputing Platforms, Torsten Hoefler (ETH Zurich, Switzerland)
Co-Authors: Torsten Hoefler (ETH Zurich, Switzerland)
The development of weather and climate models has made rapid progress in recent years. With the advent of high-performance computing (HPC), the computational resolution will continue to be refined in the next decades. This development offers exciting prospects. From a climate science perspective, a further increase in resolution will make it possible to explicitly represent the dynamics of deep convective and thunderstorm clouds without the help of semi-empirical parameterizations. From a computer science perspective, this strategy poses major challenges. First, emerging hardware architectures increasingly involve the use of heterogeneous many-core architectures consisting of both CPUs and accelerators (e.g., GPUs). The efficient exploitation of such architectures requires a paradigm shift and has only just started. Second, with increasing computational resolution, the models' output becomes unbearably voluminous, the delay from I/O unacceptably large, and long-term storage prohibitively expensive. Ultimately, there is no way around conducting the analyses online rather than storing the model output. This approach implies conducting model reruns (i.e., repeat simulations for refined analysis). These developments pose new challenging computer science questions, which need to be addressed before an efficient exploitation of new hardware systems becomes feasible. The proposed minisymposium is designed as an interdisciplinary workshop between climate and computer scientists, and its overall scope is the further development of high-resolution climate models. Specific aspects to be addressed include the numerical and computational formulation of non-hydrostatic dynamical models on heterogeneous next-generation many-core hardware architectures, the use of novel online analysis methods and model reruns in extended simulations, the virtualization of climate model simulations, as well as the development of bit-reproducible codes across different hardware architectures. Two of the listed presentations (those of Oliver Fuhrer and David Leutwyler) are centered around a recently developed GPU-enabled version of the COSMO model. This limited-area model is probably the first full atmospheric model that runs entirely on GPUs. It is currently evaluated in a pre-operational test suite for numerical weather prediction, and has already been used for European-scale decade-long climate simulations. Further development and exploitation of the model in a climate setting is currently being undertaken within the project crCLIM (http://www.c2sm.ethz.ch/research/crCLIM). The two organizers of this minisymposium are involved in this project. -
Hoermann Julia M. MS Presentation
Wednesday, June 8, 2016
Garden 3A, 15:30-16:00
MS Presentation
Hybridizable Discontinuous Galerkin Approximation of Cardiac Electrophysiology, Julia M. Hoermann (Technical University of Munich, Germany)
Co-Authors: Cristóbal Bertoglio (Center for Mathematical Modeling, University of Chile, Chile); Martin Kronbichler (Technical University of Munich, Germany); Wolfgang A. Wall (Technical University of Munich, Germany)
Cardiac electrophysiology simulations are numerically extremely challenging, due to the propagation of the very steep electrochemical wave front during depolarization. Hence, in classical continuous Galerkin (CG) approaches, very small temporal and spatial discretisations are necessary to obtain physiological propagation. Until now, spatial discretisations based on discontinuous methods have received little attention for cardiac electrophysiology simulations. In particular, local discontinuous Galerkin (LDG) or hybridizable discontinuous Galerkin (HDG) methods have not been explored yet. Application of such methods, when taking advantage of their parallelism, would allow a speed-up of the computations. In this work we provide a detailed comparison among CG, LDG and HDG methods for electrophysiology equations based on the mono-domain model. We also study the effect of the numerical integration of the non-linear ionic current term. Furthermore, we plan to show the difference between classic CG methods and HDG methods on large three-dimensional simulations with patient-specific cardiac geometries. -
Hommes Gregg MS Presentation
Thursday, June 9, 2016
Garden 2BC, 14:00-14:30
MS Presentation
The Development of ParaDiS for HCP Crystals, Gregg Hommes (Lawrence Livermore National Laboratory, United States of America)
Co-Authors: Sylvie Aubry (Lawrence Livermore National Laboratory, United States of America); Moono Rhee (Lawrence Livermore National Laboratory, United States of America); Brett Wayne (Lawrence Livermore National Laboratory, United States of America); Gregg Hommes (Lawrence Berkeley National Laboratory, United States of America)
The ParaDiS project at LLNL was created to build a scalable, massively parallel code for predicting the evolution of strength and strain hardening in crystalline materials under dynamic loading conditions by directly integrating the elements of dislocation physics. The code has been used by researchers at LLNL and around the world to simulate the behaviour of dislocation networks in a wide variety of applications, from high temperature structural materials, to nuclear materials, to armor materials, to photovoltaic systems. ParaDiS has recently been extended to include a fast analytical algorithm for the computation of forces in anisotropic elastic media, and an augmented set of topological operations to treat the complex core physics of dislocations that routinely appear in HCP metals. The importance and implications of these developments for the engineering properties of HCP metals will be demonstrated in large-scale simulations of strain hardening. -
Homolya Miklós MS Presentation
Wednesday, June 8, 2016
Garden 3B, 13:00-13:30
MS Presentation
Firedrake: Automating the Finite Element Method by Composing Abstractions, Miklós Homolya (Imperial College London, United Kingdom)
Co-Authors: David A. Ham (Imperial College London, United Kingdom); Andrew T. T. McRae (University of Bath, United Kingdom); Florian Rathgeber (ECMWF, United Kingdom); Gheorghe-Teodor Bercea (Imperial College London, United Kingdom); Miklós Homolya (Imperial College London, United Kingdom); Fabio Luporini (Imperial College London, United Kingdom); Paul H. J. Kelly (Imperial College London, United Kingdom)
The complexity inherent in the application of advanced numerics on modern hardware to coupled physical systems presents a critical barrier to simulation development. To overcome this, we must create simulation software which embodies the abstraction and composability of the underlying mathematics. In this way, a system is created in which mathematicians, computer scientists, and application specialists can each deploy their own expertise, benefiting from the expertise of the others. Critically, this approach minimises the extent to which individuals must become polymaths to share in these advances. In this talk I will present Firedrake and PyOP2, a composition of new and existing abstractions which creates a particularly complete separation of concerns. This enables the creation of high performance, sophisticated finite element models from a very high level mathematical specification and has enabled advances in computer science and numerics, while also facilitating the creation of simulation systems for a variety of applications. -
Horak David Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 11:30-11:50
Contributed Talk
The Energy Consumption Optimization of the FETI Solver, David Horak (VSB-Technical University of Ostrava, IT4Innovations, Czech Republic)
Co-Authors: Lubomir Riha (IT4Innovations National Supercomputing Center, Czech Republic); Radim Sojka (IT4Innovations National Supercomputing Center, Czech Republic); Jakub Kruzik (IT4Innovations National Supercomputing Center, Czech Republic); Martin Beseda (IT4Innovations National Supercomputing Center, Czech Republic)
The presentation deals with the energy consumption evaluation of the FETI method, blending iterative and direct solvers, within the scope of the READEX project. The characteristics measured on a model cube benchmark illustrate the behaviour of the preprocessing and solve phases, related mainly to the CPU frequency, different problem decompositions, the compiler type, and the compiler parameters. In preprocessing it is necessary to factorize the stiffness and coarse problem matrices, which is among the most time- and energy-consuming operations. The solve phase employs the conjugate gradient algorithm and consists of sparse matrix-vector multiplications and vector dot products or AXPY functions. In each iteration we need to apply the direct solver twice, for the pseudo-inverse action and the coarse problem solution. Together these operations cover the basic sparse and dense BLAS Level 1, 2 and 3 routines, so that we can explore their different dynamism; dynamic switching between various configurations can then provide significant energy savings.
Thursday, June 9, 2016
Garden 1BC, 10:50-11:10
Contributed Talk
PERMON Libraries for Massively Parallel Solution of Contact Problems of Elasticity, David Horak (VSB-Technical University of Ostrava, IT4Innovations, Czech Republic)
Co-Authors: David Horak (Technical University of Ostrava / IT4Innovations, Czech Republic)
PERMON forms a collection of software libraries, uniquely combining quadratic programming (QP) algorithms and domain decomposition methods (DDM), built on top of the well-known PETSc framework for numerical computations. Among the main applications are contact problems of mechanics. Our PermonFLLOP package is focused on non-overlapping DDM of the FETI type, allowing efficient and robust utilization of contemporary parallel computers for problems with billions of unknowns. Any FEM software can be used to generate the mesh and assemble the stiffness matrices and load vectors for each subdomain independently. Additionally, a mapping from the local to the global numbering of degrees of freedom is needed, as well as non-penetration and friction information in the case of contact problems. All these data are passed to PermonFLLOP, which prepares auxiliary data needed in the DDM. PermonQP is then called in the backend to solve the resulting equality-constrained problem, with additional inequality constraints in the case of contact problems. -
Horenko Illia MS Presentation
Thursday, June 9, 2016
Garden 2A, 14:30-15:00
MS Presentation
Causality Inference in a Nonstationary and Nonhomogenous Framework, Illia Horenko (Universita della Svizzera italiana, Lugano, Switzerland)
Co-Authors: Illia Horenko (Università della Svizzera italiana, Switzerland); William Sawyer (ETH Zurich / CSCS, Switzerland); Lukas Pospisil (Università della Svizzera italiana, Switzerland)
The project deploys statistical and computational techniques to develop a novel approach to causality inference in multivariate time series of economic data on equity and credit risks. The methods build on recent research by the project participants. They improve on classical approaches to causality analysis by accommodating general forms of non-stationarity and non-homogeneity resulting from unresolved and latent scale effects. The emerging causality framework is implemented through a clustering based on a minimization of the averaged clustering functional, which describes the mean distance between the observation data and its representation in terms of a given number of abstract Bayesian causality models of a certain predefined class. We use a finite element framework to propose a numerical scheme. One of the most challenging components of the emerging HPC implementation is a quadratic programming problem with linear equality and bound inequality constraints. We compare different algorithms and demonstrate their efficiency by solving practical benchmark problems. -
Poster
CSM-13 Towards the HPC-Inference of Causality Networks from Multiscale Economical Data, Illia Horenko (Universita della Svizzera italiana, Lugano, Switzerland)
Co-Authors: Illia Horenko (Università della Svizzera italiana, Switzerland); Patrick Gagliardini (Università della Svizzera italiana, Switzerland); William Sawyer (ETH Zurich / CSCS, Switzerland)
A novel non-stationary approach to causality inference for multivariate time series was proposed in recent research by the project participants. This methodology uses a clustering based on a minimization of the averaged clustering functional, which describes the mean distance between the observation data and its representation in terms of a given number of abstract Bayesian causality models of a certain predefined class. For the analysis of realistic datasets we develop an HPC library that is built on top of PETSc and that implements MPI, OpenMP, and CUDA parallelization strategies. We present the mathematical aspects of the methodology and preliminary results of solving the non-stationary problem of causality inference for multivariate economic data with our HPC approach. The results are computed on Piz Daint at CSCS. -
Horlacher Oliver MS Presentation
Thursday, June 9, 2016
Garden 3A, 15:30-15:45
MS Presentation
Large-Scale Mass Spectrometry Data Analysis, Oliver Horlacher (Swiss Institute of Bioinformatics, Switzerland)
Co-Authors: Frederique Lisacek (Swiss Institute of Bioinformatics, Switzerland); Markus Müller (Swiss Institute of Bioinformatics, Switzerland)
The purpose of this talk is to highlight the design, development and use of a Java library supporting Hadoop MapReduce and Apache Spark cluster calculations for the large-scale analysis of mass spectrometry data. While noisy, redundant and ambiguous, such data are generated by the millions and contain key information for identifying active small and large molecules in complex biological samples. The library favours the fast and flexible implementation of customised analytical pipelines. -
Huebl Axel MS Presentation
Wednesday, June 8, 2016
Garden 3C, 14:00-14:15
MS Presentation
Interactive Plasma Simulations on Next Generation Supercomputers for Everybody, Axel Huebl (Helmholtz-Zentrum Dresden - Rossendorf, Germany)
Co-Authors: Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Benjamin Worpitz (Citrix Systems GmbH, Germany); Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
We present the open-source ecosystem around PIConGPU, reportedly the fastest particle-in-cell code in the world (in sustained Flop/s). The code is designed for modern clusters powered by manycore hardware, and we argue that HPC plasma simulations should be able to estimate their systematic and random errors (e.g., by varying solvers and initial conditions). Our approach starts with an open-source, anytime fork-able development cycle as the basis for scrutable and reproducible simulations. To promote interoperability, we develop and propagate an open, self-describing, file-format-agnostic data markup (openPMD) that is suitable for extreme I/O loads and in-situ processing, demonstrated in a live simulation. PIConGPU is built on top of C++ meta-programming libraries, providing single-source kernel acceleration (alpaka) to work asynchronously on distributed data containers (PMacc). Using compile-time optimisation techniques, we show that particle-mesh methods can be implemented for arbitrary high-performance hardware (GPGPUs, CPUs, OpenPOWER, Xeon Phi), featuring solver agility without negative implications for maintenance (rewrites) or runtime performance. -
Hutter Juerg Poster
Poster
MAT-09 Sparse Matrix Multiplication Library for Linear Scaling DFT Calculations in Electronic Structure Codes, Juerg Hutter (UZH, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Andreas Glöss (University of Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland)
The key operation for linear-scaling DFT implemented in the CP2K quantum chemistry program is sparse matrix-matrix multiplication. For such a task, the sparse matrix library DBCSR (Distributed Block Compressed Sparse Row) has been developed. DBCSR takes full advantage of the block-structured sparse nature of the matrices for efficient computation and communication. It is MPI and OpenMP parallelized, and can exploit accelerators. We describe a strategy to improve DBCSR performance. DBCSR is available as a stand-alone library at http://dbcsr.cp2k.org/ to be employed in electronic structure codes. To this end, a streamlined API has been defined and a suite of tools has been developed to generate the full documentation of the library (API-DOC) by extracting the information provided directly in the source code. We give a flavour of the generated API-DOC by showing snapshots of selected HTML documentation pages and we sketch the design of such tools.
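The block-compressed-sparse-row idea DBCSR is built around can be illustrated with a small dictionary-of-blocks toy in NumPy; this is purely didactic and is not the DBCSR data layout or API.

```python
from collections import defaultdict
import numpy as np

def block_sparse_multiply(A, B):
    """C = A @ B for matrices stored as {(block_row, block_col): dense block}.

    Only pairs of blocks with a matching inner block index contribute, so the
    cost scales with the number of nonzero blocks, not the full matrix size.
    """
    B_by_row = defaultdict(list)
    for (k, j), b in B.items():
        B_by_row[k].append((j, b))
    C = {}
    for (i, k), a in A.items():
        for j, b in B_by_row.get(k, []):
            if (i, j) in C:
                C[(i, j)] += a @ b    # accumulate with a small dense GEMM
            else:
                C[(i, j)] = a @ b
    return C

bs = 2  # block size
A = {(0, 0): np.eye(bs), (0, 1): 2 * np.eye(bs)}
B = {(1, 0): np.ones((bs, bs))}
print(block_sparse_multiply(A, B))  # only the (0, 0) block is produced
```

In the real library the small dense block products are batched and dispatched to optimised CPU or GPU kernels, which is where most of the performance work described above happens.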
Poster
MAT-04 CP2K within the PASC Materials Network, Juerg Hutter (UZH, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Hans Pabst (Intel Semiconductor AG, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland)
One of the goals of the PASC project is to strengthen the networking in the Swiss materials science community through the active development of collaborative relationships among university researchers and CSCS staff. This includes assisting researchers in tuning, debugging, optimizing, and enhancing codes and applications for HPC resources, from mid-scale to national and international petascale facilities, with a view to the exascale transition. In addition, the application support specialists provide support for development projects on software porting techniques, parallelization and optimization strategies, deployment on diverse computational platforms, and data management. Here we present selected tools and software developed for CP2K [1]. Furthermore, we show by example how a CP2K application can be tuned to optimally use all available HPC resources. With a view to next-generation HPC hardware, we present first promising performance results for Intel's Broadwell-EP and KNL platforms. [1] The CP2K developers group, CP2K is freely available from: https://www.cp2k.org/, 2016
Poster
MAT-05 Herringbone Reconstruction versus Adsorption Registry: Which One Matters More to Pyrphyrin Adsorption, Juerg Hutter (UZH, Switzerland)
Co-Authors: Marcella Iannuzzi (University of Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland)
Understanding molecule-metal interfaces is crucial in current technologies such as molecular electronics, magnetism, and photovoltaic cells. However, due to the complex nature of the Au(111) surface, which possesses a herringbone reconstruction, the interactions between molecules and the reconstructed Au(111) surface are still unclear. To fill this fundamental gap in the literature, we apply Density Functional Theory to investigate the effects of both the herringbone reconstruction and the adsorption registry on the electronic structure of the Co-Pyrphyrin(CoPy)@Au(111) complex. We found that the choice of van der Waals scheme is important for obtaining an accurate herringbone structure of the Au(111) surface. Adsorption of the molecule is stabilized by the interaction of both the Co metal center and the cyano groups with under-coordinated Au atoms. PDOS analysis reveals that the CoPy@Au(111) interface is influenced more by changing the adsorption registry of the molecule than by changing adsorption domains on the herringbone-reconstructed surface.
I
-
Iannuzzi Marcella Poster
Poster
MAT-05 Herringbone Reconstruction versus Adsorption Registry: Which One Matters More to Pyrphyrin Adsorption, Marcella Iannuzzi (University of Zurich, Switzerland)
Co-Authors: Marcella Iannuzzi (University of Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland)
Understanding molecule-metal interfaces is crucial in current technologies such as molecular electronics, magnetism, and photovoltaic cells. However, due to the complex nature of the Au(111) surface, which possesses a herringbone reconstruction, the interactions between molecules and the reconstructed Au(111) surface are still unclear. To fill this fundamental gap in the literature, we apply Density Functional Theory to investigate the effects of both the herringbone reconstruction and the adsorption registry on the electronic structure of the Co-Pyrphyrin(CoPy)@Au(111) complex. We found that the choice of van der Waals scheme is important for obtaining an accurate herringbone structure of the Au(111) surface. Adsorption of the molecule is stabilized by the interaction of both the Co metal center and the cyano groups with under-coordinated Au atoms. PDOS analysis reveals that the CoPy@Au(111) interface is influenced more by changing the adsorption registry of the molecule than by changing adsorption domains on the herringbone-reconstructed surface. -
Iga Shin-ichi Paper
Wednesday, June 8, 2016
Auditorium C, 14:00-14:30
Paper
Performance Analysis and Optimization of Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on the K Computer and TSUBAME2.5, Shin-ichi Iga (RIKEN AICS, Japan)
Co-Authors: Masaaki Terai (RIKEN / Advanced Institute for Computational Science, Japan); Ryuji Yoshida (RIKEN / Advanced Institute for Computational Science, Japan); Shin-ichi Iga (RIKEN / Advanced Institute for Computational Science, Japan); Kazuo Minami (RIKEN / Advanced Institute for Computational Science, Japan); Hirofumi Tomita (RIKEN / Advanced Institute for Computational Science, Japan)
We summarize the optimization and performance evaluation of the Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on two different types of supercomputers: the K computer and TSUBAME2.5. First, we evaluated and improved several kernels extracted from the model code on the K computer. We did not significantly change the loop and data ordering for sufficient usage of the features of the K computer, such as the hardware-aided thread barrier mechanism and the relatively high bandwidth of the memory, i.e., a 0.5 Byte/FLOP ratio. Loop optimizations and code cleaning for a reduction in memory transfer contributed to a reduction of the model execution time. The sustained performance of the main loop of NICAM reached 0.87 PFLOPS with 81,920 nodes on the K computer. For GPU-based calculations, we applied OpenACC to the dynamical core of NICAM. The performance and scalability were evaluated using the TSUBAME2.5 supercomputer. We achieved good performance results, which showed efficient use of the memory throughput performance of the GPU as well as good weak scalability. A dry dynamical core experiment was carried out using 2560 GPUs, which achieved 60 TFLOPS of sustained performance. -
Ippolito Sonia MS Presentation
Wednesday, June 8, 2016
Garden 3A, 16:45-17:00
MS Presentation
Computational Study of the Risk of Restenosis in Coronary Bypasses, Sonia Ippolito (Ospedale L. Sacco, Italy)
Co-Authors: Christian Vergara (Politecnico di Milano, Italy); Sonia Ippolito (Ospedale Luigi Sacco Milano, Italy); Roberto Scrofani (Ospedale Luigi Sacco Milano, Italy); Alfio Quarteroni (EPFL, Switzerland)
Coronary artery disease, caused by the build-up of atherosclerotic plaques in coronary vessel walls, is one of the leading causes of death in the world. For high-risk patients, coronary artery bypass grafting is the preferred treatment. Despite overall excellent patency rates, bypasses may fail due to restenosis. In this context, we present a computational study of the fluid dynamics in patient-specific geometries with the aim of investigating a possible relationship between coronary stenosis and graft failure. Firstly, we propose a strategy to prescribe realistic boundary conditions in the absence of measured data, based on an extension of Murray's law to provide the flow division at bifurcations in the case of stenotic vessels and non-Newtonian blood rheology. Then, we show some results regarding numerical simulations in patients treated with grafts, in which the degree of coronary stenosis is virtually varied to compare the fluid dynamics in terms of hemodynamic indices potentially involved in restenosis development. -
Isaac Tobin MS Presentation
Friday, June 10, 2016
Garden 1A, 09:00-09:30
MS Presentation
HPC Challenges Arising in Forward and Inverse Mantle Flow Simulation, Tobin Isaac (University of Chicago, United States of America)
Co-Authors: Johann Rudi (University of Texas at Austin, United States of America); Vishagan Ratnaswamy (California Institute of Technology, United States of America); Dunzhu Li (California Institute of Technology, United States of America); Tobin Isaac (University of Chicago, United States of America); Michael Gurnis (California Institute of Technology, United States of America); Omar Ghattas (University of Texas at Austin, United States of America)
We discuss scalable solvers for the forward and inverse simulation of mantle flow problems. Crucial solver components for the arising nonlinear Stokes problems are parallel multigrid methods for preconditioning the linearized Stokes system, and a Schur complement approximation that is able to cope with extreme viscosity variations. To achieve good parallel scalability, we use, among other techniques, matrix-free operations, and we redistribute coarse multigrid levels to a subset of all available processors. We will discuss the inversion of global rheology parameters and distributed fields from surface data and the present-day temperature distribution in instantaneous and time-dependent problems. -
Iwashita Hidetoshi MS Presentation
Wednesday, June 8, 2016
Garden 3B, 14:00-14:15
MS Presentation
Omni Compiler and XcodeML: An Infrastructure for Source-to-Source Transformation, Hidetoshi Iwashita (AICS, RIKEN, Japan)
Co-Authors: Hitoshi Murai (AICS, RIKEN, Japan); Masahiro Nakao (AICS, RIKEN, Japan); Hidetoshi Iwashita (AICS, RIKEN, Japan); Jinpil Lee (AICS, RIKEN, Japan); Akihiro Tabuchi (University of Tsukuba, Japan)
We have been developing a compiler for the PGAS programming language XcalableMP for post-petascale computing. XcalableMP is a directive-based language extension of Fortran95 and C for scientific programming on high-performance distributed-memory parallel systems. Omni Compiler is an infrastructure for source-to-source transformation used to build source-to-source compilers such as the Omni XcalableMP compiler. It includes C and Fortran95 front-ends which translate source code into an XML-based intermediate representation called XcodeML, a Java-based code-transformation library operating on XcodeML, and de-compilers which translate the XcodeML intermediate code back into transformed source code. Currently, the Omni compiler also supports code transformation for OpenMP and OpenACC. In this talk, we will present the internals of the Omni compiler, taking the Omni XcalableMP compiler as a case study, and our future plans.
J
-
Jafary-Zadeh Mehdi MS Presentation
Thursday, June 9, 2016
Garden 2BC, 14:30-15:00
MS Presentation
Deformation and Failure Behavior of Metallic Nanostructures, Mehdi Jafary-Zadeh (Institute of High Performance Computing, A*STAR, Singapore, Singapore)
Co-Authors: Zhaoxuan Wu (Institute of High Performance Computing, A*STAR, Singapore); Mehdi Jafary-Zadeh (Institute of High Performance Computing, A*STAR, Singapore); Mark Jhon (Institute of High Performance Computing, A*STAR, Singapore); Wendy Gu (California Institute of Technology, United States of America); Julia Greer (California Institute of Technology, United States of America); David Srolovitz (University of Pennsylvania, United States of America)
The reliability of metallic nanostructures, such as nanowires and thin films, is often dictated by their mechanical failure. An in-depth understanding of the effects of intrinsic factors, such as grain boundaries and surface roughness, and extrinsic factors, such as size, shape and man-made notches, on their plastic deformation mechanisms and failure patterns is of great importance for fabricating these nanostructures with high reliability. In this talk, we will first briefly review the various deformation mechanisms and failure patterns reported in the literature and highlight some of the critical issues currently under active research. We will then report our recent progress in studying the effects of intrinsic factors (grain boundaries, dislocations) and extrinsic factors (size, shape, man-made notches) on the plasticity and failure of metallic nanostructures, using both mechanical testing and large-scale molecular dynamics simulations. -
Jahanbakhsh Ebrahim MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:45-15:00
MS Presentation
GPU-Accelerated Hydrodynamic Simulation of Hydraulic Turbines Using the Finite Volume Particle Method, Ebrahim Jahanbakhsh (Università della Svizzera Italiana, Switzerland)
Co-Authors: Ebrahim Jahanbakhsh (Università della Svizzera italiana, Switzerland); Audrey Maertens (EPFL / LMH, Switzerland); Christian Vessaz (EPFL / LMH, Switzerland); Sebastian Leguizamon (EPFL / LMH, Switzerland); François Avellan (EPFL / LMH, Switzerland)
Performance prediction based on numerical simulations can be very helpful in the design process of hydraulic turbines. The Finite Volume Particle Method (FVPM) is a consistent and conservative particle-based method which inherits interesting features of both Smoothed Particle Hydrodynamics and the grid-based Finite Volume Method. This method is particularly well suited for such simulations thanks to its versatility. SPHEROS is a parallel FVPM solver which has been developed at the EPFL Laboratory for Hydraulic Machines for simulating Pelton turbines and silt erosion. In order to allow the simulation of industrial-size setups, a GPU version of SPHEROS (GPU-SPHEROS) is being developed in CUDA; it uses the Thrust library to handle complicated structures such as octrees. In addition, highly optimised kernels are implemented for both compute-bound and memory-bound algorithms. Comparing the performance of the different parts of GPU-SPHEROS and SPHEROS, we achieve a speed-up factor of at least eight.
Friday, June 10, 2016
Garden 1BC, 10:15-10:30
Contributed Talk
Thermomechanical Modeling of Impacting Particles on a Metallic Surface for the Erosion Prediction in Hydraulic Turbines, Ebrahim Jahanbakhsh (Università della Svizzera Italiana, Switzerland)
Co-Authors: Ebrahim Jahanbakhsh (Università della Svizzera italiana, Switzerland); Audrey Maertens (EPFL, Switzerland); Christian Vessaz (EPFL, Switzerland); François Avellan (EPFL, Switzerland)
Erosion damage in hydraulic turbines is a common problem caused by the high-velocity impact of small particles entrained in the fluid. Numerical simulations can be useful to investigate the effect of each governing parameter in this complex phenomenon. The Finite Volume Particle Method is used to simulate the three-dimensional impact of dozens of rigid spherical particles on a metallic surface. The very fine discretization and the overall number of time steps needed to achieve the steady state erosion rate render the problem very expensive, implying the need for high performance computing. In this talk, a comparison of constitutive models is presented, with the aim of assessing the complexity of the thermomechanical modelling required to accurately simulate the impact and subsequent erosion of metals. The importance of strain rate, triaxiality, friction model and thermal effects is discussed. -
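One widely used constitutive law combining the strain-rate and thermal effects mentioned above is the Johnson-Cook flow stress. It is shown here purely to illustrate the kind of model being compared; the abstract does not state which laws are actually used, and all parameter names are placeholders following the usual Johnson-Cook notation.

#include <algorithm>
#include <cmath>

// Johnson-Cook flow stress (one common rate- and temperature-dependent law;
// not necessarily one of the models compared in the talk):
//   sigma = (A + B*eps_p^n) * (1 + C*ln(epsdot/epsdot0)) * (1 - T_hom^m),
// with homologous temperature T_hom = (T - T_room) / (T_melt - T_room).
double johnson_cook_stress(double eps_p, double epsdot, double T,
                           double A, double B, double n, double C,
                           double epsdot0, double m,
                           double T_room, double T_melt) {
    const double hardening = A + B * std::pow(eps_p, n);
    // clamp the rate ratio at 1 so the logarithmic term never softens the material
    const double rate = 1.0 + C * std::log(std::max(epsdot / epsdot0, 1.0));
    const double T_hom = std::clamp((T - T_room) / (T_melt - T_room), 0.0, 1.0);
    const double thermal = 1.0 - std::pow(T_hom, m);
    return hardening * rate * thermal;
}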
Jain Kartik MS Presentation
Thursday, June 9, 2016
Garden 3A, 11:30-11:50
MS Presentation
Assessment of Transitional Hemodynamics in Intracranial Aneurysms at Extreme Scale, Kartik Jain (University of Siegen, Germany)
Co-Authors: Sabine Roller (University of Siegen, Germany); Kent-Andre Mardal (University of Oslo, Norway)
Computational fluid dynamics (CFD) is extensively used for modelling blood flow in intracranial aneurysms, as it can help clinicians in the decision for intervention and may potentially provide information on the pathogenesis of the condition. The flow regime in aneurysms is, due to the low Reynolds number, mostly presumed laminar - an assumption challenged in recent publications that reported high-frequency fluctuations in aneurysms resembling transitional flow. The present work scrutinizes the issue of transition in aneurysmal hemodynamics by performing the first true direct numerical simulations of aneurysms of various morphologies, with resolutions of the order of the Kolmogorov scales, resulting in 1 billion cells. The results show the onset of fluctuations in the flow inside the aneurysm during the deceleration phase of the cardiac cycle, before re-laminarization during acceleration. The fluctuations are confined to the aneurysm dome, suggesting that the aneurysm acts as an initiator of transition to turbulence.
Wednesday, June 8, 2016
Garden 3A, 14:30-14:45
Contributed Talk
Direct Numerical Simulation of Transitional Hydrodynamics of the Cerebrospinal Fluid in Chiari I Malformation, Kartik Jain (University of Siegen, Germany)
Co-Authors: Kent-Andre Mardal (University of Oslo, Norway)
Chiari malformation type I is a disorder characterized by the herniation of cerebellar tonsils into the spinal canal, resulting in obstruction of cerebrospinal fluid (CSF) outflow. The flow of the oscillating CSF is acutely complex due to the anatomy of the subarachnoid space. We report the first direct numerical simulations on patient-specific cases with resolutions that border the Kolmogorov scales, amounting to meshes with 2 billion cells and conducted on 50,000 cores of the Hazel Hen supercomputer in Stuttgart. The results show velocity fluctuations of 10 kHz and a turbulent kinetic energy twice the mean flow energy in Chiari patients, while the flow remains laminar in the control subject. The fluctuations are confined near the craniovertebral junction and are commensurate with the severity of the pathology and the extent of herniation. The results suggest that pathological conditions like Chiari malformation may lead to transitional CSF flow, and that a prudent calibration of the numerics is necessary to capture such phenomena. -
Janett Gioele Poster
Poster
PHY-05 Polarized Radiative Transfer in Discontinuous Media, Gioele Janett (Istituto Ricerche Solari Locarno (IRSOL), Switzerland)
Co-Authors:
Radiation hydrodynamic simulations of stellar atmospheres frequently show shock fronts, contact discontinuities, or steep gradients across which radiative transfer needs to be computed in post-processing steps. In view of applications to magnetohydrodynamic models of the solar atmosphere, we have developed a new method for integrating the equation of radiative transfer for polarized light across a discontinuous atmosphere, using piecewise continuous reconstruction and slope limiters. In this poster, we present a comparison of results obtained with our new method and with more conventional methods. -
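A standard building block for limited piecewise-linear reconstruction is the minmod limiter. The sketch below is a generic illustration of that idea only and is not taken from the poster's actual scheme.

#include <cmath>

// Minmod slope limiter: returns zero slope at extrema and the smaller
// one-sided slope otherwise, keeping a piecewise-linear reconstruction
// non-oscillatory across discontinuities.
double minmod(double a, double b) {
    if (a * b <= 0.0) return 0.0;
    return (std::fabs(a) < std::fabs(b)) ? a : b;
}

// Limited slope in a cell from the neighbouring cell averages u_m, u_c, u_p
// on a uniform grid of spacing h.
double limited_slope(double u_m, double u_c, double u_p, double h) {
    return minmod((u_c - u_m) / h, (u_p - u_c) / h);
}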
Jenny Patrick MS Presentation
Wednesday, June 8, 2016
Garden 3A, 17:00-17:15
MS Presentation
An Overset Grid Method for Oxygen Transport from Red Blood Cells in Capillary Networks, Patrick Jenny (ETH Zurich, Switzerland)
Co-Authors: Bruno Weber (University of Zurich, Switzerland); Patrick Jenny (ETH Zurich, Switzerland)
Most oxygen in the blood circulation is carried bound to hemoglobin in red blood cells (RBCs). In capillaries, the oxygen partial pressure (PO2) is affected by the individual RBCs that flow in a single file. We have developed a novel overset grid method for oxygen transport from capillaries to tissue. This approach uses moving grids for RBCs and a fixed one for the blood vessels and the tissue. This combination enables accurate modelling of the intravascular PO2 field and the unloading of oxygen from RBCs. Additionally, our model can account for fluctuations in hematocrit and hemoglobin saturation. Its parallel implementation in OpenFOAM supports three-dimensional tortuous capillary networks. Simulations of oxygen transport in the rodent cerebral cortex have been performed and are used to study the cerebral energy metabolism. Other applications include the investigation of hemoglobin saturation heterogeneity in capillary networks. -
Jeusel Guillaume Poster
Poster
CSM-11 Porting SPH-Flow to GPUs Using OpenACC: Experience and Challenges, Guillaume Jeusel (Nextflow Software, France)
Co-Authors: Guillaume Jeusel (Nextflow Software, France); Jean-Guillaume Piccinali (ETH Zurich / CSCS, Switzerland); Guillaume Oger (École centrale de Nantes, France)
SPH-flow is one of the most advanced SPH solvers dedicated to highly dynamic multiphase physics simulations. Over the last year, ECNantes has partnered with Nextflow-Software to deliver an accelerated version of the code on Piz Daint. This poster will present the results of this development activity. After assessing the overall performance of the code, we focused on the Monaghan solver. We investigated strategies to improve its performance for efficient execution on CPUs as well as GPUs, maintaining the scalability of the MPI version and high programmability. The keys to our incremental successes were the ability to run a reduced version of the code, refactoring of data types, and workarounds for compiler limitations. This work should be of interest to academic developers because it details our experience using OpenACC directives for scientific computing in an area of cutting-edge research. -
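For readers unfamiliar with the directive-based approach, the sketch below shows the general flavour of OpenACC offloading on a trivial particle update. It is purely illustrative and bears no relation to SPH-flow's actual kernels or data structures.

// Minimal OpenACC-style offload of a generic particle loop. Compiled with an
// OpenACC compiler, the pragma moves the arrays to the device and
// parallelises the loop; otherwise it is ignored and the loop runs serially.
void update_velocities(long n, double dt, const double* force, double* velocity) {
    #pragma acc parallel loop copyin(force[0:n]) copy(velocity[0:n])
    for (long i = 0; i < n; ++i) {
        velocity[i] += dt * force[i];   // explicit Euler update, for illustration only
    }
}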
Jhon Mark MS Presentation
Thursday, June 9, 2016
Garden 2BC, 14:30-15:00
MS Presentation
Deformation and Failure Behavior of Metallic Nanostructures, Mark Jhon (Institute of High Performance Computing, A*STAR, Singapore, Singapore)
Co-Authors: Zhaoxuan Wu (Institute of High Performance Computing, A*STAR, Singapore); Mehdi Jafary-Zadeh (Institute of High Performance Computing, A*STAR, Singapore); Mark Jhon (Institute of High Performance Computing, A*STAR, Singapore); Wendy Gu (California Institute of Technology, United States of America); Julia Greer (California Institute of Technology, United States of America); David Srolovitz (University of Pennsylvania, United States of America)
The reliability of metallic nanostructures, such as nanowires and thin films, is often dictated by their mechanical failure. An in-depth understanding of the effects of intrinsic factors, such as grain boundaries and surface roughness, and extrinsic factors, such as size, shape and man-made notches, on their plastic deformation mechanisms and failure patterns is of great importance for fabricating these nanostructures with high reliability. In this talk, we will first briefly review the various deformation mechanisms and failure patterns reported in the literature and highlight some of the critical issues currently under active research. We will then report our recent progress in studying the effects of intrinsic factors (grain boundaries, dislocations) and extrinsic factors (size, shape, man-made notches) on the plasticity and failure of metallic nanostructures, using both mechanical testing and large-scale molecular dynamics simulations. -
Jiménez Javier MS Presentation
Thursday, June 9, 2016
Garden 2A, 12:15-12:30
MS Presentation
A High Resolution Hybrid CUDA-MPI Turbulent Channel Code, Javier Jiménez (Technical University Madrid, Spain)
Co-Authors: Javier Jiménez (Technical University of Madrid, Spain)
A new high-order, high-resolution hybrid MPI-CUDA code for the simulation of turbulent channel flow on many distributed GPUs is presented. The code benefits from the use of powerful and efficient heterogeneous architectures with GPU accelerators. Optimization strategies involving the joint use of GPU and CPU lead to excellent performance. Asynchronous GPU-CPU execution achieves almost complete overlap of computations, memory transfers between device and host, and MPI communications. A considerable speedup is gained with respect to similar synchronous codes. Test cases and performance results show the code is suitable for the next generation of large direct numerical simulations of turbulence. -
Jocksch Andreas Poster
Poster
CSM-09 Hash Tables on GPUs Using Lock-Free Linked Lists, Andreas Jocksch (Swiss National Supercomputing Centre, Switzerland)
Co-Authors: Andreas Bleuler (University of Zurich, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Romain Teyssier (University of Zurich, Switzerland)
Hash table implementations which resolve collisions by chaining with linked lists are very flexible with respect to the insertion of additional keys into an existing table and to the deletion of a part of the keys from it. For our implementation on GPUs, we use non-blocking linked lists based on atomic "compare and swap" operations. List entries are deleted by marking them as invalid and then removing them. Typically, after a couple of deletion operations, our local heap is compacted. Using this approach, the initial build of the hash table and hash lookups perform comparably to the CUDPP library implementation. However, small modifications of the table are performed much faster in our implementation than the complete rebuild required by other implementations. We intend to use this novel hash table implementation for astrophysical GPU simulations with adaptive-mesh particle-in-cell, which would benefit greatly from these new features.
Wednesday, June 8, 2016
Garden 3C, 14:15-14:30
Contributed Talk
A Portable Platform for Accelerated PIC Codes and its Application to Multi- and Many Integrated Core Architectures Using Hybrid MPI/OpenMP, Andreas Jocksch (Swiss National Supercomputing Centre, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Noé Ohana (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Laurent Villard (EPFL, Switzerland)
With the aim of porting Particle-In-Cell (PIC) codes to modern parallel computers equipped with coprocessors, we have designed a testbed called PIC_ENGINE retaining the key elements of the PIC algorithm as applied to plasma physics simulations. A hybrid OpenMP/MPI implementation is used to explore the potential gain in performance on multi-core CPUs and Many Integrated Core (MIC) coprocessors. A bucket sort is added to increase data locality, and a vectorization algorithm is implemented, showing an improvement in the overall performance. With the PIC_ENGINE, we show that the hybrid OpenMP/MPI approach allows a performance gain of approximately 60% compared to pure MPI. Furthermore, the sorting and vectorization increase the performance of the most time-consuming methods by up to a factor of 3.2. Finally, using the same code, hybrid runs are performed on MIC and lead to similar conclusions. However, due to inefficient vectorization, the overall performance is poor compared to CPU runs.
Wednesday, June 8, 2016
Garden 3C, 14:30-14:45
Contributed Talk
Towards Optimization of a Gyrokinetic Particle-in-Cell (PIC) Code on Large Scale Hybrid Architectures, Andreas Jocksch (Swiss National Supercomputing Centre, Switzerland)
Co-Authors: Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Emmanuel Lanti (EPFL / Swiss Plasma Center, Switzerland); Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Laurent Villard (EPFL, Switzerland)
Refactoring large legacy codes to exploit the power of new multithreaded devices is not an easy task. For this purpose, we designed a platform embedding simplified basic features of PIC codes. It solves the drift-kinetic equations (a first step towards gyrokinetics) in a sheared plasma slab using B-spline finite elements up to fourth order. Multiple levels of parallelism have been implemented using MPI+OpenMP and MPI+OpenACC. It has been shown that sorting particles can lead to performance improvements by increasing data locality and vectorizing the grid memory access. This paper focuses on the GPU implementation, which outperforms the CPU timing by up to a factor of 4 while not requiring a large code-rewriting effort. This gain increases with the spline order. Weak and strong scalability tests have been successfully run on the GPU-equipped Cray XC30 Piz Daint (CSCS) on up to 4,096 nodes. This performance shall enable advanced studies of turbulent transport in magnetic fusion devices. -
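Both PIC abstracts above note that sorting particles by grid cell improves data locality and enables vectorized grid access. Below is a generic counting-sort sketch of that step; the particle layout is hypothetical and this is not the PIC_ENGINE implementation.

#include <cstddef>
#include <vector>

// Counting ("bucket") sort of particles by their grid-cell index, assuming
// cell indices lie in [0, ncells). Sorting makes particles that deposit to
// the same cell contiguous in memory, improving locality.
struct Particle { double x, v, w; int cell; };

std::vector<Particle> sort_by_cell(const std::vector<Particle>& p, int ncells) {
    std::vector<std::size_t> count(ncells + 1, 0);
    for (const auto& pi : p) ++count[pi.cell + 1];                 // histogram
    for (int c = 0; c < ncells; ++c) count[c + 1] += count[c];     // prefix sum -> start offsets
    std::vector<Particle> sorted(p.size());
    std::vector<std::size_t> offset(count.begin(), count.end() - 1);
    for (const auto& pi : p) sorted[offset[pi.cell]++] = pi;       // stable scatter
    return sorted;
}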
Jorge Rogério Poster
Poster
PHY-03 Parallelization on a Hybrid Architecture of GBS, a Simulation Code for Plasma Turbulence at the Edge of Fusion Devices, Rogério Jorge (Ecole Polytechnique Fédérale de Lausanne (EPFL), Swiss Plasma Center (SPC), Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Patrick Emonts (EPFL / Swiss Plasma Center, Switzerland); Federico David Halpern (EPFL / Swiss Plasma Center, Switzerland); Rogério Jorge (EPFL / Swiss Plasma Center, Switzerland); Jorge Morales (EPFL / Swiss Plasma Center, Switzerland); Paola Paruta (EPFL / Swiss Plasma Center, Switzerland); Paolo Ricci (EPFL / Swiss Plasma Center, Switzerland); Fabio Riva (EPFL / Swiss Plasma Center, Switzerland)
We present recent developments of GBS, a simulation code used to evolve plasma turbulence in the edge of fusion devices. GBS solves a set of 3D fluid equations, the Poisson and the Ampere equation, and a kinetic equation for the neutral atoms. Investigations carried out with GBS have significantly advanced our understanding of the plasma dynamics at the edge of fusion devices. For example, GBS simulations allowed the identification of the turbulent regimes and the saturation mechanisms of the linearly unstable modes. In GBS, a 3D Cartesian MPI communicator is employed, leading to excellent parallel scalability up to 8192 cores. To efficiently exploit many-core and hybrid architectures, new schemes using MPI+OpenMP and MPI+OpenACC have been recently implemented. We show the implementation of the new parallelization schemes, their scalability, and their efficiency. The new parallelization allows the efficient use of advanced hybrid supercomputers, such as Piz Daint at CSCS. -
Judd Kenneth MS Presentation
Thursday, June 9, 2016
Garden 2A, 14:00-14:30
MS Presentation
Solving Large Systems of Polynomial Equations from Economics on Supercomputers, Kenneth Judd (Stanford University, United States of America)
Co-Authors: Taylor J. Canann (University of Minnesota, United States of America)
Many problems in economics and game theory can be formulated as systems of polynomial equations. Methods from commutative algebra and numerical algebraic geometry can be used to find all solutions, but the computational cost rises significantly as the size increases. Mathematicians have developed methods that exploit parallelism but have deployed them only on small systems. We demonstrate the effectiveness and scalability of these methods on supercomputers, with a focus on solving research problems in economics. -
Julien Keith MS Presentation
Thursday, June 9, 2016
Garden 1A, 10:30-11:00
MS Presentation
Towards a Better Understanding of Rapidly Rotating Convection by Combining Direct Numerical Simulations and Asymptotic Modeling, Keith Julien (University of Colorado at Boulder, United States of America)
Co-Authors: Meredith Plumley (University of Colorado at Boulder, United States of America); Keith Julien (University of Colorado at Boulder, United States of America)
Realistic simulations of planetary dynamos will remain impossible in the near future. In particular, the enormous range of spatial and temporal scales induced in convective flows by rotation plagues direct numerical simulations (DNS). The same scale disparities that hamper DNS can, however, be used to derive reduced equations that are expected to govern convection in the limit of rapid rotation. Simulations based on such formulations represent an interesting alternative to DNS. In this talk, recent efforts to test asymptotic models against DNS are reviewed. Results in plane-layer geometry reveal convergence of both approaches. Surprisingly, Ekman layers have a profound effect in the rapidly rotating regime and have to be accounted for explicitly in the asymptotic models. Upscale kinetic energy transport leads to the formation of large-scale structures, which may play a prominent role in dynamos. The asymptotic models allow an exploration of parameter regimes far beyond the capabilities of DNS.
Poster
EAR-04 Implicit Treatment of Inertial Waves in Dynamo Simulations, Keith Julien (University of Colorado at Boulder, United States of America)
Co-Authors: Michael A. Calkins (University of Colorado at Boulder, United States of America); Keith Julien (University of Colorado at Boulder, United States of America)
The explicit treatment of inertial waves imposes a very small timestep in dynamo simulations at low Ekman number. We present a fully spectral Chebyshev tau method that allows us to treat the inertial waves implicitly. The large linear systems that need to be solved at each timestep remain affordable thanks to the sparsity of the formulation. The simulations are parallelised using a 2D data decomposition for the nonlinear calculations combined with a parallel linear solver for the timestepping. Despite the increased complexity, significant gains in wall-clock time are achieved thanks to larger timesteps. -
Junge Till MS Presentation
Friday, June 10, 2016
Garden 2A, 09:30-09:55
MS Presentation
Concurrent Coupling of Particles with a Continuum for Dynamical Motion of Solids, Till Junge (KIT, Germany)
Co-Authors: J. F. Molinari (EPFL, Switzerland); Till Junge (Karlsruhe Institute of Technology, Germany); Jaehyun Cho (EPFL, Switzerland)
There are many situations where the discrete nature of matter needs to be accounted for by numerical models. For instance, in crystalline materials, friction and ductile fracture modelling can benefit from the molecular dynamics formalism. However, capturing these processes requires system sizes involving large numbers of particles, often out of reach of modern computers. Thus, concurrent multiscale approaches have emerged to reduce the computational cost by using a coarser continuum model. The difference between particles and continuum leads to several challenging problems. In this presentation, finite temperatures, numerical stability and dislocation passing will be addressed. The software framework LibMultiScale will also be presented, along with its parallel computation design choices.
K
-
Kabadshow Ivo MS Presentation
Thursday, June 9, 2016
Garden 2BC, 11:30-12:00
MS Presentation
How to Do Nothing in Less Time, Ivo Kabadshow (Juelich Supercomputing Centre, Germany)
Co-Authors: David Haensel (Juelich Supercomputing Centre, Germany); Andreas Beckmann (Juelich Supercomputing Centre, Germany)
The Fast Multipole Method is a generic toolbox algorithm for many important scientific applications, e.g. molecular dynamics. It enables us to compute all long-range O(N^2) pairwise interactions for N particles in O(N) for any given precision. Unfortunately, the runtime of such simulations is already communication bound. To increase the performance on modern HPC hardware, a more sophisticated parallelization scheme is required. In particular, reducing MPI collectives is vital for improving strong scaling. In this talk, we will focus exclusively on the internode communication via MPI. We will present a latency-avoiding communication scheme and its implementation in our C++11 FMM toolbox. The implementation consists of several layers of abstraction to hide/encapsulate low-level MPI calls and the specifics of the communication algorithm. We will also show examples of the scaling capabilities of the FMM on a BG/Q for small and medium-size MD problems. -
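A common latency-avoiding ingredient is to post non-blocking sends and receives and overlap them with purely local work. The sketch below shows only this generic pattern, not the layered communication scheme described in the talk; all names are placeholders.

#include <mpi.h>
#include <vector>

// Generic latency-hiding pattern: post non-blocking receive and send for
// remote data, do purely local work while the messages are in flight, then
// wait before touching the received data.
void exchange_and_overlap(MPI_Comm comm, int src, int dst,
                          std::vector<double>& send_buf,
                          std::vector<double>& recv_buf,
                          void (*do_local_work)()) {
    MPI_Request reqs[2];
    MPI_Irecv(recv_buf.data(), static_cast<int>(recv_buf.size()), MPI_DOUBLE,
              src, 0, comm, &reqs[0]);
    MPI_Isend(send_buf.data(), static_cast<int>(send_buf.size()), MPI_DOUBLE,
              dst, 0, comm, &reqs[1]);

    do_local_work();                      // overlap computation with communication

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}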
Kalavsky Peter MS Presentation
Wednesday, June 8, 2016
Garden 3A, 13:30-14:00
MS Presentation
Accurate Estimation of 3D Ventricular Activation in Heart Failure Patients from Electroanatomic Mapping, Peter Kalavsky (Università della Svizzera italiana, Switzerland)
Co-Authors: Peter Kalavsky (Università della Svizzera italiana, Switzerland); Mark Potse (Università della Svizzera italiana, Switzerland); Angelo Auricchio (Fondazione Cardiocentro Ticino, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland)
Accurate characterization of the cardiac activation sequence can support diagnosis and personalized therapy in heart-failure patients. Current invasive mapping techniques provide limited coverage. Our aim is to estimate a complete volumetric activation map from a limited number of measurements. This is achieved by optimising the local conductivity and early activation sites to minimise the mismatch between simulated and measured activation times. We modeled the activation times using an eikonal equation, reducing computational cost by 3 orders of magnitude compared to more physiologically detailed methods. The model provided a sufficiently accurate approximation of the activation time and the ECG in our patients. The solver was implemented on GPUs. Since the fast-marching method is not suitable for this architecture, we used a simple Jacobi iteration of a local variational principle. On a single GPU, each forward simulation took less than 2 seconds, and the inverse problem was solved in a few minutes. -
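The abstract mentions replacing the fast-marching method with a Jacobi iteration of a local variational principle. One common local update of this kind is the Godunov upwind formula for the eikonal equation, swept in Jacobi fashion until convergence. The serial 2D sketch below illustrates the idea only; it is not the authors' 3D GPU code, and boundary/source handling is assumed to happen elsewhere.

#include <algorithm>
#include <cmath>
#include <vector>

// One Jacobi sweep of a Godunov-type local update for |grad T| = 1/c on a
// uniform 2D grid with spacing h. T holds current activation-time estimates
// (large values = not yet reached). Repeating the sweep until the returned
// change is ~0 converges to the viscosity solution.
double jacobi_sweep(std::vector<double>& T, const std::vector<double>& c,
                    int nx, int ny, double h) {
    std::vector<double> Tnew(T);
    double max_change = 0.0;
    auto at = [&](int i, int j) { return T[j * nx + i]; };
    for (int j = 1; j < ny - 1; ++j) {
        for (int i = 1; i < nx - 1; ++i) {
            const double a = std::min(at(i - 1, j), at(i + 1, j));
            const double b = std::min(at(i, j - 1), at(i, j + 1));
            const double f = h / c[j * nx + i];          // grid spacing times local slowness
            double t = (std::fabs(a - b) >= f)
                           ? std::min(a, b) + f
                           : 0.5 * (a + b + std::sqrt(2.0 * f * f - (a - b) * (a - b)));
            t = std::min(t, at(i, j));                    // causality: never increase T
            max_change = std::max(max_change, at(i, j) - t);
            Tnew[j * nx + i] = t;
        }
    }
    T.swap(Tnew);
    return max_change;
}

Each grid point depends only on its neighbours' old values, which is what makes this update well suited to massively parallel (GPU) execution.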
Kall Jochen MS Presentation
Wednesday, June 8, 2016
Garden 3A, 13:00-13:30
MS Presentation
Junction-Generalized Riemann Problem for Stiff Hyperbolic Balance Laws in Networks of Blood Vessels, Jochen Kall (Technische Universitat Kaiserslautern, Germany)
Co-Authors: Eleuterio F. Toro (University of Trento, Italy); Gino I. Montecinos (Universidad de Chile, Chile); Raul Borsche (Technische Universität Kaiserslautern, Germany); Jochen Kall (Technische Universität Kaiserslautern, Germany)
We design a new implicit solver for the Junction-Generalized Riemann Problem (J-GRP), which is based on a recently proposed implicit method for solving the Generalized Riemann Problem (GRP) for systems of hyperbolic balance laws. We use the new J-GRP solver to construct an ADER scheme that is globally explicit, locally implicit and with no theoretical accuracy barrier, in both space and time. The resulting ADER scheme is able to deal with stiff source terms and can be applied to non-linear systems of hyperbolic balance laws in domains consisting of networks of one-dimensional sub-domains. Here we specifically apply the numerical techniques to networks of blood vessels. An application to a physical test problem, consisting of a network of 37 compliant silicon tubes (arteries) and 21 junctions, reveals that it is imperative to use high-order methods at junctions in order to preserve the desired high order of accuracy in the full computational domain. -
Kammer David MS Presentation
Friday, June 10, 2016
Garden 1A, 09:30-10:00
MS Presentation
Insights from Modeling Frictional Slip Fronts and a Comparison with Experimental Observations, David Kammer (Cornell University, United States of America)
Co-Authors:
Laboratory experiments represent a great opportunity to study fundamental aspects of dynamic slip front propagation in a well-controlled system. Nevertheless, numerical simulations are needed to gain access to information that is experimentally unmeasurable but required to develop full understanding of the underlying mechanics. In this presentation, we discuss simulations that bridge two relevant scales for laboratory stick-slip experiments, which are the macroscopic structure and the meso-scale weakening process occurring at the interface. We show that these high-performance simulations reproduce quantitatively well experimental observations if both length scales are accurately modeled. Furthermore, we apply the acquired insights from the computational model to enable new experimental observations that are otherwise inaccessible, and to develop theoretical tools that are useful for the description and prediction of the mechanics of slip front propagation at frictional interfaces. -
Kanewala Thejaka Amila Paper
Thursday, June 9, 2016
Auditorium C, 12:00-12:30
Paper
Context Matters: Distributed Graph Algorithms and Runtime Systems, Thejaka Amila Kanewala (Indiana University, United States of America)
Co-Authors: Jesun Sahariar Firoz (Indiana University, United States of America); Thejaka Amila Kanewala (Indiana University, United States of America); Marcin Zalewski (Indiana University, United States of America); Martina Barnas (Indiana University, United States of America)
The increasing complexity of the software/hardware stack of modern supercomputers makes understanding the performance of the modern massive-scale codes difficult. Distributed graph algorithms (DGAs) are at the forefront of that complexity, pushing the envelope with their massive irregularity and data dependency. We analyse the existing body of research on DGAs to assess how technical contributions are linked to experimental performance results in the field. We distinguish algorithm-level contributions related to graph problems from "runtime-level" concerns related to communication, scheduling, and other low-level features necessary to make distributed algorithms work. We show that the runtime is an integral part of DGAs' experimental results, but it is often ignored by the authors in favor of algorithm-level contributions. We argue that a DGA can only be fully understood as a combination of these two aspects and that detailed reporting of runtime details must become an integral part of scientific standard in the field if results are to be truly understandable and interpretable. Based on our analysis of the field, we provide a template for reporting the runtime details of DGA results, and we further motivate the importance of these details by discussing in detail how seemingly minor runtime changes can make or break a DGA. -
Kardos Juraj MS Presentation
Wednesday, June 8, 2016
Garden 3B, 14:15-14:30
MS Presentation
Using GridTools Library to Implement Preconditioned Conjugate Gradient Krylov Solver, Juraj Kardos (Università della Svizzera italiana, Switzerland)
Co-Authors:
The need for portable, efficient climate applications arises from the demand for precise climate and weather modelling. Model codes are growing in complexity, and it is difficult to deliver both high performance and high programmer productivity. GridTools is a C++ template library for applications on regular grids. The user is required to specify the high-level application model and provide the stencil operators, while the library provides optimised backends for the underlying computational hardware. This helps to decouple the model developer from implementation details. The production process becomes more straightforward, with early deployment to HPC clusters. The solution of tridiagonal linear systems, typical of implicit schemes such as advection, diffusion and radiation, is abundant and performance-critical in climate models. We use the GridTools library to implement a preconditioned conjugate gradient Krylov solver, an iterative method efficient for solving sparse linear systems. We evaluate its performance and compare it to other tools such as PETSc.
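As a reference for the algorithm being ported, a textbook preconditioned conjugate gradient with a Jacobi (diagonal) preconditioner is sketched below in plain C++. It deliberately omits all GridTools-specific stencil and backend machinery, and every name in it is illustrative.

#include <cmath>
#include <functional>
#include <vector>

// Textbook preconditioned conjugate gradient for an SPD system A x = b, with
// a user-supplied matrix-vector product and a diagonal (Jacobi) preconditioner.
using Vec = std::vector<double>;

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

int pcg(const std::function<void(const Vec&, Vec&)>& apply_A,
        const Vec& diag_A, const Vec& b, Vec& x,
        int max_iter = 1000, double tol = 1e-10) {
    const std::size_t n = b.size();
    Vec r(n), z(n), p(n), Ap(n);
    apply_A(x, Ap);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];     // initial residual
    for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / diag_A[i]; // preconditioned residual
    p = z;
    double rz = dot(r, z);
    for (int k = 0; k < max_iter; ++k) {
        if (std::sqrt(dot(r, r)) < tol) return k;                // converged after k iterations
        apply_A(p, Ap);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / diag_A[i];
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        rz = rz_new;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    return max_iter;
}

In a GridTools version, the matrix-vector product and the vector updates would presumably be expressed as stencil stages dispatched to the chosen backend.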
Poster
CSM-01 An Interior-Point Stochastic Approximation Method on Massively Parallel Architectures, Juraj Kardos (Università della Svizzera italiana, Switzerland)
Co-Authors: Olaf Schenk (Università della Svizzera italiana, Switzerland); Drosos Kourounis (Università della Svizzera italiana, Switzerland)
The stochastic approximation method is behind the solution of many actively studied problems in PDE-constrained optimization. Despite its far-reaching applications, there is almost no work on combining stochastic approximation with interior-point optimization, although interior-point methods (IPM) are particularly efficient in large-scale nonlinear optimization due to their attractive worst-case complexity. We present a massively parallel stochastic IPM method and apply it to stochastic PDE problems such as boundary control and the optimization of complex electric power grid systems under uncertainty. -
Katsoulakis Markos MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 15:30-16:00
MS Presentation
Approximate Inference Methods and Scalable Uncertainty Quantification for Molecular Systems, Markos Katsoulakis (UMass Amherst, United States of America)
Co-Authors:
We present path-space variational inference methods suitable for coarse-graining of complex non-equilibrium processes, typically associated with coupled physicochemical mechanisms or driven systems. Furthermore, we discuss new, scalable uncertainty quantification information methods that allow us to quantify the performance of such approximate inference tools in relation to specific quantities of interest, as well as to screen the parametric sensitivity of molecular systems. -
Keller Bettina Poster
Poster
LS-06 Structural and Dynamic Properties of Cyclosporin A: Molecular Dynamics and Markov State Modelling, Bettina Keller (Department of Biology, Chemistry, Pharmacy, Freie Universität Berlin, Germany)
Co-Authors: Bettina Keller (Free University of Berlin, Germany); Sereina Z. Riniker (ETH Zurich, Switzerland)
The membrane permeability of cyclic peptides is likely influenced by the conformational behavior of these compounds in polar and apolar environments. The size and complexity of peptides often limit their bioavailability, but there are known examples of peptide natural products, such as cyclosporin A (CsA), that can cross cell membranes by passive diffusion. The crystal structure of CsA shows a "closed" conformation with four intramolecular hydrogen bonds. When binding to its target cyclophilin, CsA adopts an "open" conformation without intramolecular hydrogen bonds. In this study, we attempted to sample exhaustively the conformational space of CsA in chloroform and in water by molecular dynamics simulations in order to rationalize the good membrane permeability of CsA observed experimentally. From 10 μs molecular dynamics simulations in each solvent, Markov state models were constructed to characterize the metastable conformational states. The conformational landscapes in both solvents show significant overlap, but also clearly distinct features. -
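The core step in building a Markov state model from such trajectories is counting transitions between discretized conformational states at a chosen lag time and row-normalizing the count matrix. Below is a bare-bones, generic sketch of that step; a production MSM workflow typically also involves clustering, detailed-balance enforcement and lag-time validation, none of which is shown, and the names are illustrative.

#include <vector>

// Row-normalised count estimate of an MSM transition matrix from a
// discretised trajectory: dtraj[t] is the index of the conformational state
// visited at frame t, lag is the lag time in frames.
std::vector<std::vector<double>>
estimate_transition_matrix(const std::vector<int>& dtraj, int n_states, int lag) {
    std::vector<std::vector<double>> C(n_states, std::vector<double>(n_states, 0.0));
    for (std::size_t t = 0; t + lag < dtraj.size(); ++t)
        C[dtraj[t]][dtraj[t + lag]] += 1.0;                       // count transitions
    for (int i = 0; i < n_states; ++i) {
        double row_sum = 0.0;
        for (int j = 0; j < n_states; ++j) row_sum += C[i][j];
        if (row_sum > 0.0)
            for (int j = 0; j < n_states; ++j) C[i][j] /= row_sum; // normalise each row
    }
    return C;
}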
Kelly Paul H. J. MS Presentation
Wednesday, June 8, 2016
Garden 3B, 13:00-13:30
MS Presentation
Firedrake: Automating the Finite Element Method by Composing Abstractions, Paul H. J. Kelly (Imperial College London, United Kingdom)
Co-Authors: David A. Ham (Imperial College London, United Kingdom); Andrew T. T. McRae (University of Bath, United Kingdom); Florian Rathgeber (ECMWF, United Kingdom); Gheorghe-Teodor Bercea (Imperial College London, United Kingdom); Miklós Homolya (Imperial College London, United Kingdom); Fabio Luporini (Imperial College London, United Kingdom); Paul H. J. Kelly (Imperial College London, United Kingdom)
The complexity inherent in the application of advanced numerics on modern hardware to coupled physical systems presents a critical barrier to simulation development. To overcome this, we must create simulation software which embodies the abstraction and composability of the underlying mathematics. In this way, a system is created in which mathematicians, computer scientists, and application specialists can each deploy their own expertise, benefiting from the expertise of the others. Critically, this approach minimises the extent to which individuals must become polymaths to share in these advances. In this talk I will present Firedrake and PyOP2, a composition of new and existing abstractions which creates a particularly complete separation of concerns. This enables the creation of high performance, sophisticated finite element models from a very high level mathematical specification and has enabled advances in computer science and numerics, while also facilitating the creation of simulation systems for a variety of applications. -
Kermode James MS Presentation
Friday, June 10, 2016
Garden 2A, 10:40-11:00
MS Presentation
Multiscale Modelling of Materials Failure Processes: Bridging from Atomistic to Continuum Scales, James Kermode (Warwick, United Kingdom)
Co-Authors:
Fracture remains one of the most challenging multi-scale modelling problems, requiring a concurrent description of the chemical processes occurring near a crack tip and the long-range stress field driving it forward. This talk will discuss how these requirements can be met simultaneously by combining a quantum mechanical description of the crack tip with a classical atomistic model that captures the elastic behaviour of the surrounding crystal matrix, using a QM/MM (quantum mechanics/molecular mechanics) approach such as the "Learn on the Fly" (LOTF) scheme. Strategies for the extension of this approach to provide effective constitutive laws at the continuum scale will be discussed, together with a recent information-efficient machine learning reformulation of the LOTF scheme, and finally, if time permits, some new work on coupling QM and MM models using automatically derived site energies.
MS Summary
MS10 From Materials' Data to Materials' Insight by Machine Learning, James Kermode (University of Warwick, United Kingdom)
Co-Authors: Alexandre Tkatchenko (Fritz Haber Institute Berlin, Germany), James Kermode (Warwick, United Kingdom)
MS10 From Materials' Data to Materials' Insight by Machine Learning, James Kermode (University of Warwick, United Kingdom)
Keyes David E. Paper
Wednesday, June 8, 2016
Auditorium C, 16:00-16:30
Paper
On the Robustness and Prospects of Adaptive BDDC Methods for Finite Element Discretizations of Elliptic PDEs with High-Contrast Coefficients, David E. Keyes (KAUST, Saudi Arabia)
Co-Authors: David E. Keyes (King Abdullah University of Science and Technology, Saudi Arabia)
Balancing Domain Decomposition by Constraints (BDDC) methods have proven to be powerful preconditioners for large and sparse linear systems arising from the finite element discretization of elliptic PDEs. Condition number bounds can be theoretically established that are independent of the number of subdomains of the decomposition.
The core of the methods resides in the design of a larger and partially discontinuous finite element space that allows for fast application of the preconditioner, where Cholesky factorizations of the subdomain finite element problems are additively combined with a coarse, global solver. Multilevel and highly-scalable algorithms can be obtained by replacing the coarse Cholesky solver with a coarse BDDC preconditioner.
BDDC methods have the remarkable ability to control the condition number, since the coarse space of the preconditioner can be adaptively enriched at the cost of solving local eigenproblems. The proper identification of these eigenproblems extends the robustness of the methods to any heterogeneity in the distribution of the coefficients of the PDEs, not only when the coefficient jumps align with the sub-domain boundaries or when the high-contrast regions are confined to lie in the interior of the subdomains. The specific adaptive technique considered in this paper does not depend upon any interaction of discretization and partition; it relies purely on algebraic operations.
Coarse space adaptation in BDDC methods has attractive algorithmic properties, since the technique enhances the concurrency and the arithmetic intensity of the preconditioning step of the sparse implicit solver with the aim of controlling the number of iterations of the Krylov method in a black-box fashion, thus reducing the number of global synchronization steps needed by the iterative solver; data movement and memory-bound kernels in the solve phase can thus be limited at the expense of extra local flops during the setup of the preconditioner.
This paper presents an exposition of the BDDC algorithm that identifies the current computational bottlenecks that could prevent it from being competitive with other solvers, and proposes solutions in anticipation of exascale architectures. Furthermore, the discussion aims to give interested practitioners sufficient insights to decide whether or not to pursue BDDC in their applications.
In addition, the work presents novel numerical results using the distributed memory implementation of BDDC in the PETSc library for vector field problems arising in the context of porous media flows and electromagnetic modeling; the results provide evidence of the robustness of these methods for highly heterogeneous problems and non-conforming discretizations. -
Klein Michael L. MS Presentation
Friday, June 10, 2016
Garden 3A, 10:45-11:00
MS Presentation
Simulations of Ion Channel Modulation by Lipids, Michael L. Klein (Temple University, United States of America)
Co-Authors: Michael L. Klein (Temple University, United States of America)
Inward-rectifier K+ (Kir) channels are essential to maintain the resting membrane potential of neurons and to buffer extracellular potassium by glial cells. Indeed, Kir malfunction has been suggested to play a role in some neuropathologies, e.g. white matter disease, epilepsy and Parkinson's disease. Kir activation requires phosphatidylinositol-(4,5)-bisphosphate (PIP2), a highly negatively charged lipid located in the inner membrane leaflet. In addition, the presence of other non-specific, anionic phospholipids, such as phosphatidylserine (PS), is needed for the channel to be responsive at physiological concentrations of PIP2. The dual lipid modulation of Kir channels is not yet fully understood at a molecular level. To give further insight into how the two lipids act cooperatively to open the channel, we used all-atom molecular dynamics (MD) simulations. We found that initial binding of PS helps to pre-assemble the binding site of PIP2, which in turn completes Kir activation. -
Knepley Matthew G. Paper
Wednesday, June 8, 2016
Auditorium C, 15:30-16:00
Paper
Extreme-Scale Multigrid Components within PETSc, Matthew G. Knepley (Rice University, United States of America)
Co-Authors: Dave A. May (ETH Zurich, Switzerland); Karl Rupp (Austria); Matthew G. Knepley (Rice University, United States of America); Barry F. Smith (Argonne National Laboratory, United States of America)
Elliptic partial differential equations (PDEs) frequently arise in continuum descriptions of physical processes relevant to science and engineering. Multilevel preconditioners represent a family of scalable techniques for solving discrete PDEs of this type and thus are the method of choice for high-resolution simulations. The scalability and time-to-solution of massively parallel multilevel preconditioners can be adversely affected by using a coarse-level solver with sub-optimal algorithmic complexity. To maintain scalability, agglomeration techniques applied to the coarse level have been shown to be necessary.
In this work, we present a new software component introduced within the Portable Extensible Toolkit for Scientific computation (PETSc) which permits agglomeration. We provide an overview of the design and implementation of this functionality, together with several use cases highlighting the benefits of agglomeration. Lastly, we demonstrate via numerical experiments employing geometric multigrid with structured meshes, the flexibility and performance gains possible using our MPI-rank agglomeration implementation. -
Kormann Katharina MS Presentation
Wednesday, June 8, 2016
Garden 3C, 17:00-17:15
MS Presentation
Parallelization Strategies for a Semi-Lagrangian Vlasov Code, Katharina Kormann (Max-Planck-Institut für Plasmaphysik, Germany)
Co-Authors: Klaus Reuter (Max Planck Computing and Data Facility, Germany); Eric Sonnendrücker (Max Planck Society, Germany)
Grid-based solvers for the Vlasov equation give accurate results but suffer from the curse of dimensionality. To enable the grid-based solution of the Vlasov equation in 6d phase space, we need efficient parallelization schemes. In this talk, we consider the 6d Vlasov-Poisson problem discretized by a split-step semi-Lagrangian scheme. This method works with successive 1d interpolations on 1d stripes of the 6d domain. We consider two parallelization strategies: a remapping strategy that works with two different layouts, keeping parts of the dimensions sequential, and a classical partitioning into hyper-rectangles. With the remapping scheme, the 1d interpolations can be performed sequentially on each processor; on the other hand, the remapping requires an all-to-all communication pattern. The partitioning only requires localized communication, but each 1d interpolation needs to be performed on distributed data. We compare both parallelization schemes and discuss how to efficiently handle the domain boundaries in the interpolation for the partitioning.
Wednesday, June 8, 2016
Garden 3C, 16:30-16:45
MS Presentation
Particle in Fourier Discretization of Kinetic Equations, Katharina Kormann (Max-Planck-Institut für Plasmaphysik, Germany)
Co-Authors: Katharina Kormann (Max Planck Institute for Plasma Physics, Germany); Eric Sonnendrücker (Max Planck Society, Germany)
Particle methods are very popular for the discretization of kinetic equations, since they are embarrassingly parallel. In plasma physics, the high dimensionality (6D) of the problems raises the cost of grid-based codes, favouring mesh-free transport with particles. A standard Particle-in-Cell (PIC) scheme couples the particle density to a grid-based field solver using finite elements. In this particle-mesh coupling, the stochastic error appears as noise, while the deterministic error leads to, e.g., aliasing, inducing unphysical instabilities. Projecting the particles onto a spectral grid instead yields an energy- and momentum-conserving, almost surely aliasing-free scheme, Particle in Fourier (PIF). For few electrostatic modes, PIF has very little computational overhead, rendering it suitable for a fast implementation. We present 6D Vlasov-Poisson simulations of Landau damping and a bump-on-tail instability and compare the results as well as the computational performance to a grid-based semi-Lagrangian solver.
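The defining step of PIF is depositing each particle directly onto the retained Fourier modes, rho_k = sum_p w_p exp(-i k x_p), rather than scattering onto a spatial grid. Below is a 1D sketch of that deposition step only; the production code works in 6D phase space and is parallelized, and all names are illustrative.

#include <complex>
#include <vector>

// Particle-in-Fourier charge deposition in 1D on a periodic domain of length
// L: a particle at x[p] with weight w[p] contributes w[p] * exp(-i k x[p])
// to each retained mode k = 2*pi*m/L, m = -M..M.
std::vector<std::complex<double>>
deposit_fourier(const std::vector<double>& x, const std::vector<double>& w,
                double L, int M) {
    const double two_pi = 6.283185307179586;
    std::vector<std::complex<double>> rho(2 * M + 1, {0.0, 0.0});
    for (std::size_t p = 0; p < x.size(); ++p) {
        for (int m = -M; m <= M; ++m) {
            const double k = two_pi * m / L;
            rho[m + M] += w[p] * std::exp(std::complex<double>(0.0, -k * x[p]));
        }
    }
    return rho;
}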
MS Summary
MS11 HPC Implementations and Numerics for Kinetic Plasma Models, Katharina Kormann (Max-Planck-Institut für Plasmaphysik, Germany)
Co-Authors: Jakob Ameres (Technische Universität München, Germany)
MS11 HPC Implementations and Numerics for Kinetic Plasma Models, Katharina Kormann (Max-Planck-Institut für Plasmaphysik, Germany)
Koumoutsakos Petros Contributed Talk
Thursday, June 9, 2016
Garden 3A, 10:50-11:10
Contributed Talk
Propulsive Advantage of Swimming in Unsteady Flows, Petros Koumoutsakos (ETH Zurich, Switzerland)
Co-Authors: Siddhartha Verma (ETH Zurich, Switzerland); Petros Koumoutsakos (ETH Zurich, Switzerland)
Individual fish swimming in a school encounter vortices generated by the propulsion of upstream members. Experimental and theoretical studies suggest that these hydrodynamic interactions may increase thrust without additional energy expenditure. However, difficulties associated with experimental studies have prevented a systematic quantification of this phenomenon. Using simulations of self-propelled swimmers, we investigate some of the mechanisms by which fish may exploit each other's wake to reduce energy expenditure. We quantify the relative importance of two mechanisms for increasing swimming efficiency: the decrease in relative velocity induced by proximity to wake vortices, and wall/"channelling" effects. Additionally, we conduct simulations of fish swimming in the Kármán vortex street behind a static cylinder. This configuration helps us clarify the role of the bow pressure wave, entrainment, and "vortex-surfing" in enhancing the propulsive efficiency of trout swimming near obstacles.
Wednesday, June 8, 2016
Auditorium C, 14:30-15:00
Paper
Approximate Bayesian Computation for Granular and Molecular Dynamics Simulations, Petros Koumoutsakos (ETH Zurich, Switzerland)
Co-Authors: Panagiotis Angelikopoulos (ETH Zurich, Switzerland); Panagiotis Hadjidoukas (ETH Zurich, Switzerland); Costas Papadimitriou (University of Thessaly, Greece); Petros Koumoutsakos (ETH Zurich, Switzerland)
The effective integration of models with data through Bayesian uncertainty quantification hinges on the formulation of a suitable likelihood function. In many cases such a likelihood may not be readily available or it may be difficult to compute. Approximate Bayesian Computation (ABC) proposes the formulation of a likelihood function through the comparison between low-dimensional summary statistics of the model predictions and corresponding statistics on the data. In this work we report a computationally efficient approach to the Bayesian updating of Molecular Dynamics (MD) models through ABC using a variant of the Subset Simulation method. We demonstrate that ABC can also be used for Bayesian updating of models with an explicitly defined likelihood function, and compare the ABC-SubSim implementation and efficiency with the transitional Markov chain Monte Carlo (TMCMC) method. ABC-SubSim is then used in force-field identification for MD simulations. Furthermore, we examine the concept of relative entropy minimization for the calibration of force fields and exploit it within ABC. Using different approximate posterior formulations, we show that assuming Gaussian ensemble fluctuations of molecular-system quantities of interest can potentially lead to erroneous parameter identification.
Wednesday, June 8, 2016
Auditorium C, 17:00-17:30
Paper
An Efficient Compressible Multicomponent Flow Solver for Heterogeneous CPU/GPU Architectures, Petros Koumoutsakos (ETH Zurich, Switzerland)
Co-Authors: Babak Hejazialhosseini (Cascade Technologies Inc., United States of America); Panagiotis Hadjidoukas (ETH Zurich, Switzerland); Diego Rossinelli (ETH Zurich, Switzerland); Petros Koumoutsakos (ETH Zurich, Switzerland)
We present a solver for three-dimensional compressible multicomponent flow based on the compressible Euler equations. The solver is based on a finite volume scheme for structured grids and advances the solution using an explicit Runge-Kutta time stepper. The numerical scheme requires the computation of the flux divergence based on an approximate Riemann problem. The computation of the divergence quantity is the most expensive task in the algorithm. Our implementation takes advantage of the compute capabilities of heterogeneous CPU/GPU architectures. The computational problem is organized in subdomains small enough to be placed into the GPU memory. The compute intensive stencil scheme is offloaded to the GPU accelerator while advancing the solution in time on the CPU. Our method to implement the stencil scheme on the GPU is not limited to applications in fluid dynamics. The performance of our solver was assessed on Piz Daint, a XC30 supercomputer at CSCS. The GPU code is memory-bound and achieves a per-node performance of 462 Gflop/s, outperforming by 3.2x the multicore-based Gordon Bell winning CUBISM-MPCF solver for the offloaded computation on the same platform. The focus of this work is on the per-node performance of the heterogeneous solver. In addition, we examine the performance of the solver across 4096 compute nodes. We present simulations for the shock-induced collapse of an aligned row of air bubbles submerged in water using 4 billion cells. Results show a final pressure amplification that is 100x stronger than the strength of the initial shock. -
Kourounis Drosos Poster
Poster
CSM-01 An Interior-Point Stochastic Approximation Method on Massively Parallel Architectures, Drosos Kourounis (USI, Switzerland)
Co-Authors: Olaf Schenk (Università della Svizzera italiana, Switzerland); Drosos Kourounis (Università della Svizzera italiana, Switzerland)
The stochastic approximation method is behind the solution of many actively studied problems in PDE-constrained optimization. Despite its far-reaching applications, there is almost no work on combining stochastic approximation with interior-point methods (IPMs), although IPMs are particularly efficient in large-scale nonlinear optimization due to their attractive worst-case complexity. We present a massively parallel stochastic IPM and apply it to stochastic PDE problems such as boundary control and the optimization of complex electric power grid systems under uncertainty.
Poster
CSM-07 Estimation of Drag and Lift Coefficients for Steady State Incompressible Flow of a Newtonian Fluid on Domains with Periodic Roughness, Drosos Kourounis (USI, Switzerland)
Co-Authors: Drosos Kourounis (Università della Svizzera italiana, Switzerland); Olaf Schenk (Università della Svizzera italiana, Switzerland)
Rough boundaries pose several challenges for fluid simulations. The difficulty stems from the fact that resolving the small-scale rough geometry requires significantly refined meshes in the vicinity of the boundaries. Since all physical rough boundaries have a characteristic length scale, corrections to the standard Navier-Stokes equations can be obtained by considering Taylor expansions around the rough surface, leading to modified boundary conditions. Numerical tests are presented to validate the proposed theory, including the calculation of drag and lift coefficients for laminar flow around a cylinder with a rough boundary. Keywords: steady-state Navier-Stokes equations, periodic rough boundaries, drag and lift coefficients, laminar flow. -
Kozhevnikov Anton MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 13:20-13:40
MS Presentation
Domain Specific Libraries for Material Science Applications, Anton Kozhevnikov (CSCS, Switzerland)
Co-Authors:
In this talk I discuss materials science software co-development from the perspective of a supercomputing centre.
Poster
CSM-12 Scalable Implementation of the FFT Kernel for Plane-Wave Codes, Anton Kozhevnikov (CSCS, Switzerland)
Co-Authors:
The application of the local part of the effective potential to the wave-functions is one of the known bottlenecks of plane-wave pseudopotential codes. The reason is the underlying FFT kernel, which is traditionally parallelized over the maximum available number of MPI ranks. Alternatively, the FFT kernel can be implemented on a two-dimensional MPI grid, where row ranks take care of individual FFTs and column ranks are used for band parallelization. The 2D MPI-grid approach to the FFT kernel and its performance are discussed in this work. -
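To make the 2D MPI-grid layout described above concrete, here is a schematic mpi4py sketch of the communicator setup: row communicators group the ranks that cooperate on one FFT, while column communicators distribute the bands. The number of row ranks, the round-robin band distribution and the purely local stand-in for the distributed 3D FFT are assumptions of this sketch, not the actual kernel.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    nrow = 2                                   # ranks sharing one FFT (assumption)
    row_id, col_id = comm.rank % nrow, comm.rank // nrow

    # Row communicator: ranks that cooperate on a single 3D FFT.
    row_comm = comm.Split(color=col_id, key=row_id)
    # Column communicator: ranks holding different bands (band parallelization).
    col_comm = comm.Split(color=row_id, key=col_id)

    nbands, n = 8, 32                          # toy problem size
    my_bands = range(col_id, nbands, col_comm.size)   # round-robin over columns

    for band in my_bands:
        # Placeholder for the distributed transform: each row group would
        # transform the slab of wave-function `band` that it owns; a local
        # FFT of a random slab stands in for it here.
        psi_slab = np.random.rand(n, n, max(n // row_comm.size, 1))
        psi_hat = np.fft.fftn(psi_slab)

Keeping the per-FFT communicator small limits the all-to-all traffic of each transform, while the second dimension of the process grid keeps all ranks busy with different bands.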
Kozubek Tomáš Paper
Wednesday, June 8, 2016
Auditorium C, 16:30-17:00
Paper
Massively Parallel Hybrid Total FETI (HTFETI) Solver, Tomáš Kozubek (IT4Innovations National Supercomputing Center, Ostrava, Czech Republic)
Co-Authors: Tomáš Brzobohatý (IT4Innovations National Supercomputing Center, Czech Republic); Alexandros Markopoulos (IT4Innovations National Supercomputing Center, Czech Republic); Ondřej Meca (IT4Innovations National Supercomputing Center, Czech Republic); Tomáš Kozubek (IT4Innovations National Supercomputing Center, Czech Republic)
This paper describes the Hybrid Total FETI (HTFETI) method and its parallel implementation in the ESPRESO library. HTFETI is a variant of the FETI family of domain decomposition methods in which a small number of neighbouring subdomains is aggregated into clusters. This can also be viewed as a multilevel decomposition approach, which results in a smaller coarse problem - the main scalability bottleneck of the FETI and FETI-DP methods.
The efficiency of our implementation, which employs hybrid parallelization in the form of MPI and Cilk++, is evaluated using both weak and strong scalability tests. The weak scalability of the solver is shown on a three-dimensional linear elasticity problem with up to 30 billion degrees of freedom (DOF) executed on 4096 compute nodes. The strong scalability is evaluated on a problem of 2.6 billion DOF scaled from 1000 to 4913 compute nodes. The results show super-linear scaling of the time per iteration and linear scalability of the total solver runtime; the latter combines both numerical and parallel scalability and reflects the overall HTFETI solver performance. The large-scale tests use our own parallel synthetic benchmark generator, which is also described in the paper.
The last set of results shows that HTFETI is very efficient for problems of up to 1.7 billion DOF and provides a better time to solution than the TFETI method. -
Krause Rolf MS Presentation
Wednesday, June 8, 2016
Garden 3A, 14:00-14:15
MS Presentation
FD/FEM Coupling with the Immersed Boundary Method for the Simulation of Aortic Heart Valves, Rolf Krause (Università della Svizzera italiana, Switzerland)
Co-Authors: Maria Giuseppina Chiara Nestola (Università della Svizzera italiana, Switzerland); Hadi Zolfaghari (University of Bern, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Dominik Obrist (University of Bern, Switzerland)
The ever-increasing available computational power allows for solving more complex physical problems spanning multiple physical domains. We present a numerical tool for simulating fluid-structure interaction between blood flow and the soft tissue of heart valves. Using the basic concept of the Immersed Boundary Method, the interaction between the two physical domains (flow and structure) does not require mesh manipulation. We solve the governing equations of the fluid and the structure with domain-specific finite difference and finite element discretisations, respectively. We use a massively parallel algorithmic framework for handling the L2-projection transfer between the loosely coupled highly parallel solvers for fluid and solid. Our tool builds on a well-established and proven Navier-Stokes solver and a novel method for solving non-linear continuum solid mechanics.
Wednesday, June 8, 2016
Garden 3A, 13:30-14:00
MS Presentation
Accurate Estimation of 3D Ventricular Activation in Heart Failure Patients from Electroanatomic Mapping, Rolf Krause (Università della Svizzera italiana, Switzerland)
Co-Authors: Peter Kalavsky (Università della Svizzera italiana, Switzerland); Mark Potse (Università della Svizzera italiana, Switzerland); Angelo Auricchio (Fondazione Cardiocentro Ticino, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland)
Accurate characterization of the cardiac activation sequence can support diagnosis and personalized therapy in heart-failure patients. Current invasive mapping techniques provide limited coverage. Our aim is to estimate a complete volumetric activation map from a limited number of measurements. This is achieved by optimising the local conductivity and early activation sites to minimise the mismatch between simulated and measured activation times. We modeled the activation times using an eikonal equation, reducing computational cost by 3 orders of magnitude compared to more physiologically detailed methods. The model provided a sufficiently accurate approximation of the activation time and the ECG in our patients. The solver was implemented on GPUs. Since the fast-marching method is not suitable for this architecture, we used a simple Jacobi iteration of a local variational principle. On a single GPU, each forward simulation took less than 2 seconds, and the inverse problem was solved in a few minutes.
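The Jacobi iteration of a local variational (Godunov-type) update mentioned above can be sketched in a few lines for a 2D grid; every point is updated independently in each sweep, which is what makes the scheme GPU-friendly, in contrast to the inherently sequential fast-marching method. The uniform grid, the scalar conduction velocity and the stopping criterion below are simplifying assumptions, not the patient-specific 3D solver.

    import numpy as np

    BIG = 1e30  # stands in for "not yet reached"

    def eikonal_jacobi(speed, sources, h=1.0, max_iter=10_000):
        """Jacobi sweeps of the local Godunov update for |grad T| = 1/speed.

        speed   : 2D array of conduction velocities
        sources : boolean mask of early-activation sites (T = 0 there)
        """
        T = np.full(speed.shape, BIG)
        T[sources] = 0.0
        f = h / speed
        for _ in range(max_iter):
            Tp = np.pad(T, 1, constant_values=BIG)
            a = np.minimum(Tp[:-2, 1:-1], Tp[2:, 1:-1])   # upwind x-neighbour
            b = np.minimum(Tp[1:-1, :-2], Tp[1:-1, 2:])   # upwind y-neighbour
            lo = np.minimum(a, b)
            disc = np.maximum(2.0 * f**2 - (a - b)**2, 0.0)
            Tnew = np.where(np.abs(a - b) >= f,
                            lo + f,                           # one-sided update
                            0.5 * (a + b + np.sqrt(disc)))    # two-sided update
            Tnew = np.minimum(T, Tnew)
            Tnew[sources] = 0.0
            if np.array_equal(Tnew, T):                       # no change: converged
                break
            T = Tnew
        return T

In the inverse problem, the early-activation sites and local conductivities passed to such a forward solve are the quantities adjusted by the outer optimisation loop.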
Poster
LS-03 GPU-Accelerated Immersed Boundary Method with CUDA for the Efficient Simulation of Biomedical Fluid-Structure Interaction, Rolf Krause (Università della Svizzera italiana, Switzerland)
Co-Authors: Barna Errol Mario Becsek (University of Bern, Switzerland); Maria Giuseppina Chiara Nestola (Università della Svizzera italiana, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Dominik Obrist (University of Bern, Switzerland)
Immersed boundary methods have become one of the most useful tools for the simulation of biomedical fluid-structure interaction, e.g. in the aortic valve of the human heart. In such problems, the complex geometry and motion of the soft tissue impose a significant computational cost on body-fitted-mesh methods. Resorting to a fixed Eulerian grid for the flow simulation, together with the immersed boundary method to model the interaction with the soft tissue, eliminates the expensive mesh generation and updating costs. Nevertheless, the computational cost of the geometry operations, including adaptive search algorithms, is still significant. Here, we implemented the immersed boundary kernels in CUDA so that they can be transferred to and executed on thousands of parallel threads on a general-purpose GPU. Host-device memory optimisation, along with optimal usage of the GPU multiprocessors, results in substantially improved performance of fluid-structure interaction simulations.
Poster
EMD-03 Parallel MCMC for Estimating Exponential Random Graph Models, Rolf Krause (Università della Svizzera italiana, Switzerland)
Co-Authors: Alex Stivala (University of Melbourne, Australia); Antonietta Mira (Università della Svizzera italiana, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Garry Robins (University of Melbourne, Australia); Alessandro Lomi (Università della Svizzera italiana, Switzerland)
As information and communication technologies continue to expand, the need arises to develop analytical strategies capable of accommodating new and larger sets of social network data. Considerable attention has recently been dedicated to the possibility of scaling exponential random graph models (ERGMs) - a well-established family of statistical models - for analyzing large social networks. Efficient computational methods would be highly desirable in order to extend the empirical scope of ERGM for the analysis of large social networks. We report preliminary results of a research project on the development of new sampling methods for ERGMs. We propose a new MCMC sampler and use it with Metropolis coupled Markov chain Monte Carlo, a typical scheme for MCMC parallelization. We show that, using this method, the CPU time for parameter estimation may be considerably reduced. *Generous support from the Swiss National Platform of Advanced Scientific Computing (PASC) is gratefully acknowledged. -
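As a rough illustration of the Metropolis-coupled MCMC scheme mentioned in the abstract above, the sketch below runs one chain per MPI rank at its own inverse temperature and periodically attempts state swaps between neighbouring temperatures. The toy Gaussian log-density, the temperature ladder and the proposal scale are placeholder assumptions; an ERGM sampler would evaluate sufficient statistics of the graph instead.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.rank, comm.size
    rng = np.random.default_rng(rank)

    def log_target(x):
        """Placeholder log-density; an ERGM would use graph statistics here."""
        return -0.5 * np.sum(x**2)

    def inv_temp(r):
        return 1.0 / (1.0 + r)                 # simple temperature ladder (assumption)

    beta = inv_temp(rank)
    x = rng.normal(size=3)

    for sweep in range(1000):
        # Within-chain Metropolis step on the tempered target beta * log pi.
        prop = x + 0.5 * rng.normal(size=x.size)
        if np.log(rng.random()) < beta * (log_target(prop) - log_target(x)):
            x = prop

        # Even/odd pairing of neighbouring temperatures for the swap attempt.
        partner = rank + 1 if (rank + sweep) % 2 == 0 else rank - 1
        if 0 <= partner < size:
            u = rng.random()
            other_x, other_u = comm.sendrecv((x, u), dest=partner, source=partner)
            # Both partners reuse the uniform drawn by the lower rank so that
            # they take the same accept/reject decision.
            u_swap = u if rank < partner else other_u
            log_acc = (beta - inv_temp(partner)) * (log_target(other_x) - log_target(x))
            if np.log(u_swap) < log_acc:
                x = other_x

Because only neighbouring temperatures exchange states, the communication per sweep is a single point-to-point message per chain, which is what makes the scheme attractive for reducing wall-clock estimation time on a parallel machine.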
Kreutzer Moritz MS Presentation
Thursday, June 9, 2016
Auditorium C, 15:00-15:30
MS Presentation
GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems, Moritz Kreutzer (Friedrich-Alexander University of Erlangen-Nuremberg, Germany)
Co-Authors: Georg Hager (University of Erlangen-Nuremberg, Germany); Gerhard Wellein (University of Erlangen-Nuremberg, Germany)
A significant share of future exascale-class high-performance computer systems is projected to be heterogeneous, featuring "standard" as well as "accelerated" resources. A software infrastructure that claims applicability for such systems must be able to meet their inherent challenges: multiple levels of parallelism, complex topologies, asynchronicity, and abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is an open-source library of building blocks for sparse linear algebra algorithms on current and future large-scale systems. Built on the "MPI+X" paradigm, it provides truly heterogeneous data parallelism and a light-weight, affinity-aware tasking mechanism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. Important design decisions are described with respect to the challenges posed by modern heterogeneous supercomputers and recent algorithmic developments. Implementation details which are indispensable for achieving high efficiency are pointed out, and their necessity is justified by performance measurements or predictions based on performance models. -
Krischer Lion MS Presentation
Thursday, June 9, 2016
Garden 1A, 14:45-15:00
MS Presentation
Salvus: A Flexible Open-Source Package for Full-Waveform Modelling and Inversion, Lion Krischer (LMU Munich, Germany)
Co-Authors: Christian Boehm (ETH Zurich, Switzerland); Martin van Driel (ETH Zurich, Switzerland); Lion Krischer (Ludwig Maximilian University of Munich, Germany); Dave A. May (ETH Zurich, Switzerland); Max Rietmann (ETH Zurich, Switzerland); Korbinian Sager (ETH Zurich, Switzerland); Andreas Fichtner (ETH Zurich, Switzerland)
Within all domain-specific software projects, finding the correct balance between flexibility and performance is often difficult. In the seismic imaging community, the trend has been to move towards codes which are heavily optimised, but which often sacrifice usability and flexibility. Here we introduce Salvus: an open-source HPC high-order finite element (FE) package focused on full-waveform modelling and inversion, which is designed to be both flexible and performant. Salvus was constructed by following modern software design practices, testing protocols, and by establishing its foundations upon existing open-source high-level scientific libraries. The FE framework is generalized over spatial dimensions, time-integrators, polynomial order and wave-propagation physics, and provides support for both hexahedral and tetrahedral meshes. Additionally, support is included for various numerical optimisation methods. We discuss our usage of existing open-source scientific libraries, our choice level of abstraction, and quantitatively investigate the performance penalties associated with these abstractions. -
Kronbichler Martin MS Presentation
Wednesday, June 8, 2016
Garden 3A, 15:30-16:00
MS Presentation
Hybridizable Discontinuous Galerkin Approximation of Cardiac Electrophysiology, Martin Kronbichler (Institute for Computational Mechanics, Technical University of Munich, Germany)
Co-Authors: Cristóbal Bertoglio (Center for Mathematical Modeling, University of Chile, Chile); Martin Kronbichler (Technical University of Munich, Germany); Wolfgang A. Wall (Technical University of Munich, Germany)
Cardiac electrophysiology simulations are numerically extremely challenging due to the propagation of the very steep electrochemical wave front during depolarization. Hence, in classical continuous Galerkin (CG) approaches, very small temporal and spatial discretisations are necessary to obtain physiological propagation. Until now, spatial discretisations based on discontinuous methods have received little attention for cardiac electrophysiology simulations. In particular, local discontinuous Galerkin (LDG) and hybridizable discontinuous Galerkin (HDG) methods have not been explored yet. Application of such methods, when taking advantage of their parallelism, would allow a speed-up of the computations. In this work we provide a detailed comparison among CG, LDG and HDG methods for electrophysiology equations based on the mono-domain model. We also study the effect of the numerical integration of the non-linear ionic current term. Furthermore, we plan to show the difference between classic CG methods and HDG methods on large three-dimensional simulations with patient-specific cardiac geometries. -
Krstic Marinkovic Marina Poster
Poster
PHY-04 Platform Independent Profiling of a QCD Code, Marina Krstic Marinkovic (CERN, Switzerland)
Co-Authors: Luka Stanisic (Centre de Recherche Inria Bordeaux - Sud-Ouest, France)
The supercomputing platforms available for high-performance-computing-based research evolve at a great rate. However, this rapid development of novel technologies requires adaptation and optimization of existing codes for each new machine architecture. In this context, minimizing the time needed to port a code efficiently to a new platform is of crucial importance. A possible solution is to use coarse-grained simulations of the application that can assist in detecting performance bottlenecks. We present a procedure for implementing intermediate profiling for the openQCD code [1] that will enable a global reduction of the cost of profiling and optimizing this code, which is commonly used in the lattice QCD community. Our approach is based on the well-known SimGrid simulator [2], which allows fast and accurate performance predictions of codes on HPC architectures. Additionally, accurate estimates of the program behaviour on future machines not yet accessible to us are anticipated. [1] http://luscher.web.cern.ch/luscher/openQCD/ [2] http://simgrid.gforge.inria.fr/ -
Kruzik Jakub Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 11:30-11:50
Contributed Talk
The Energy Consumption Optimization of the FETI Solver, Jakub Kruzik (IT4Innovations National Supercomputing Center, VSB-Technical University of Ostrava, Czech Republic)
Co-Authors: Lubomir Riha (IT4Innovations National Supercomputing Center, Czech Republic); Radim Sojka (IT4Innovations National Supercomputing Center, Czech Republic); Jakub Kruzik (IT4Innovations National Supercomputing Center, Czech Republic); Martin Beseda (IT4Innovations National Supercomputing Center, Czech Republic)
The presentation deals with the energy consumption evaluation of the FETI method, which blends iterative and direct solvers, in the scope of the READEX project. The characteristics measured on a model cube benchmark illustrate the behaviour of the preprocessing and solve phases, mainly as a function of CPU frequency, problem decomposition, compiler type, and compiler parameters. In the preprocessing it is necessary to factorize the stiffness and coarse-problem matrices, which is among the most time- and energy-consuming operations. The solve phase employs the conjugate gradient algorithm and consists of sparse matrix-vector multiplications, vector dot products, and AXPY operations. In each iteration we apply the direct solver twice: for the pseudo-inverse action and for the coarse-problem solution. Together, these operations cover the basic sparse and dense BLAS Level 1, 2 and 3 routines, so we can explore their different dynamism; dynamically switching between various configurations can then provide significant energy savings. -
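The operation mix described above (sparse matrix-vector products, dot products and AXPY updates) is exactly what a plain conjugate gradient loop is made of. The sketch below, with SciPy sparse matrices and a 1D Laplacian as a stand-in system, only illustrates that kernel mix; it omits the FETI-specific projections and the two direct-solver applications per iteration.

    import numpy as np
    import scipy.sparse as sp

    def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
        """Plain CG: each iteration performs one SpMV, two dot products and
        three AXPY-type updates - the routines whose energy behaviour is
        studied in the abstract above."""
        x = np.zeros_like(b)
        r = b - A @ x
        p = r.copy()
        rr = r @ r
        for _ in range(max_iter):
            Ap = A @ p                        # sparse matrix-vector product
            alpha = rr / (p @ Ap)             # dot product
            x += alpha * p                    # AXPY
            r -= alpha * Ap                   # AXPY
            rr_new = r @ r                    # dot product
            if np.sqrt(rr_new) < tol:
                break
            p = r + (rr_new / rr) * p         # AXPY-type update
            rr = rr_new
        return x

    n = 1000
    A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
    x = conjugate_gradient(A, np.ones(n))

Each of these kernels has a different compute-to-memory ratio, which is why running them at different CPU frequencies (as explored in the READEX study) can trade time against energy differently.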
Kubler Felix MS Presentation
Thursday, June 9, 2016
Garden 2A, 15:00-15:15
MS Presentation
Computing Equilibria in Dynamic Stochastic Macro-Models with Heterogeneous Agents, Felix Kubler (University of Zurich, Switzerland)
Co-Authors: Felix Kubler (University of Zurich, Switzerland); Simon Scheidegger (University of Zurich & Stanford University, Switzerland)
We show how sparse grid interpolation methods in conjunction with parallel computing can be used to approximate equilibria in overlapping generations (OLG) models with aggregate uncertainty. In such models, the state of the economy can be characterized by the wealth distribution across generations/cohorts of the population. To approximate the function mapping this state into agents' investment decisions and market prices, we use piecewise multilinear hierarchical basis functions on (adaptive) sparse grids. When solving for the recursive equilibrium function, we combine the adaptive sparse grid with a time iteration procedure, resulting in an algorithm that is massively parallelisable. Our implementation is hybrid-parallel and can solve OLG models with large (depreciation) shocks and with 60 continuous state variables. -
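To give a flavour of the piecewise-linear hierarchical basis mentioned above, the following 1D sketch computes hierarchical surpluses on the dyadic grid and evaluates the resulting interpolant. It assumes a function that vanishes on the boundary of [0, 1]; the extension to adaptive sparse grids in many dimensions, as needed for the OLG model, is only hinted at by the remark in the docstring.

    import numpy as np

    def hat(level, index, x):
        """Piecewise-linear hierarchical hat function on [0, 1]."""
        return np.maximum(0.0, 1.0 - np.abs(x * 2**level - index))

    def hierarchize(f, max_level):
        """Hierarchical surpluses of f on the 1D dyadic grid (f(0) = f(1) = 0).

        Each new point stores only the difference between f and the coarser
        interpolant; large surpluses mark where adaptive refinement pays off.
        """
        surpluses = []                                # (level, index, coefficient)
        for level in range(1, max_level + 1):
            for index in range(1, 2**level, 2):       # odd indices = new points
                x = index / 2**level
                coarse = sum(a * hat(l, i, x) for l, i, a in surpluses)
                surpluses.append((level, index, f(x) - coarse))
        return surpluses

    def interpolate(surpluses, x):
        return sum(a * hat(l, i, x) for l, i, a in surpluses)

    coeffs = hierarchize(lambda x: np.sin(np.pi * x), max_level=6)
    print(interpolate(coeffs, 0.37))                  # close to sin(pi * 0.37)

In the multi-dimensional case the same construction is applied dimension-wise with tensor-product hat functions, and the evaluation of the interpolant at many state points is what parallelizes naturally within the time iteration.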
Kuckuk Sebastian Poster
Poster
CSM-04 Automatic Code Generation for Simulations of Non-Newtonian Fluids, Sebastian Kuckuk (Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany)
Co-Authors: Harald Köstler (University of Erlangen-Nuremberg, Germany)
The simulation of fluids exhibiting non-isothermal and non-Newtonian behaviour is of great relevance to a multitude of industrial applications. However, setting up discretization and solver components, as well as parallelizing and optimizing the involved code, is quite challenging. One possible remedy is using Domain-Specific Languages (DSLs) in conjunction with code generation techniques, as pursued by the ExaStencils project. In this work, we present our advances in fully generating solvers for such fluid flow problems from an abstract representation. In detail, we examine a finite volume discretization of the Navier-Stokes and temperature equations on a non-uniform staggered grid, which is treated using the SIMPLE algorithm and geometric multigrid solvers. This complex application brings new challenges, mainly concerning automatically applied optimizations, but also many possibilities to enhance the employed DSLs and thus greatly facilitate the design and implementation process. We discuss these points in depth and provide convincing performance results for fully generated and automatically parallelized solvers. -
Kulakova Lina Paper
Wednesday, June 8, 2016
Auditorium C, 14:30-15:00
Paper
Approximate Bayesian Computation for Granular and Molecular Dynamics Simulations, Lina Kulakova (ETH Zurich, Switzerland)
Co-Authors: Panagiotis Angelikopoulos (ETH Zurich, Switzerland); Panagiotis Hadjidoukas (ETH Zurich, Switzerland); Costas Papadimitriou (University of Thessaly, Greece); Petros Koumoutsakos (ETH Zurich, Switzerland)
The effective integration of models with data through Bayesian uncertainty quantification hinges on the formulation of a suitable likelihood function. In many cases such a likelihood may not be readily available, or it may be difficult to compute. Approximate Bayesian Computation (ABC) instead formulates a likelihood through the comparison of low-dimensional summary statistics of the model predictions with the corresponding statistics of the data. In this work we report a computationally efficient approach to the Bayesian updating of Molecular Dynamics (MD) models through ABC using a variant of the Subset Simulation method. We demonstrate that ABC can also be used for Bayesian updating of models with an explicitly defined likelihood function, and compare the implementation and efficiency of ABC-SubSim with the transitional Markov chain Monte Carlo (TMCMC) method. ABC-SubSim is then used for force-field identification in MD simulations. Furthermore, we examine the concept of relative entropy minimization for the calibration of force fields and exploit it within ABC. Using different approximate posterior formulations, we show that assuming Gaussian ensemble fluctuations of molecular-system quantities of interest can potentially lead to erroneous parameter identification. -
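To make the ABC idea in the abstract above concrete, the following sketch accepts parameter draws whose summary statistics fall within a tolerance of the observed statistics. The Gaussian toy model, the choice of statistics and the single fixed tolerance are assumptions for illustration; ABC-SubSim replaces the single tolerance by a sequence of adaptively chosen intermediate levels, which is what makes small tolerances affordable.

    import numpy as np

    rng = np.random.default_rng(0)

    def forward_model(theta, n=200):
        """Placeholder forward model; an MD code would return trajectory
        observables here."""
        return rng.normal(loc=theta, scale=1.0, size=n)

    def summary(x):
        """Low-dimensional summary statistics (here: mean and standard deviation)."""
        return np.array([x.mean(), x.std()])

    def abc_rejection(data, prior_sampler, eps, n_draws=10_000):
        s_obs = summary(data)
        accepted = []
        for _ in range(n_draws):
            theta = prior_sampler()
            dist = np.linalg.norm(summary(forward_model(theta)) - s_obs)
            if dist < eps:                    # accept if statistics match the data
                accepted.append(theta)
        return np.array(accepted)

    data = forward_model(1.5)                 # synthetic "observations"
    posterior_samples = abc_rejection(data, lambda: rng.uniform(-5.0, 5.0), eps=0.2)

With a single small tolerance the acceptance rate collapses; choosing a decreasing sequence of thresholds so that a fixed fraction of samples survives each level is the Subset Simulation ingredient exploited by ABC-SubSim.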
Kummer Thomas MS Presentation
Wednesday, June 8, 2016
Garden 3A, 16:00-16:30
MS Presentation
Fluid Structure Interaction Model for Heart Assist Device Optimization Studies, Thomas Kummer (ETH Zurich, Switzerland)
Co-Authors:
Designing and optimising heart assist devices is complicated by a lack of parameter data and limited experimental capabilities. Computational models can therefore step in to provide a better understanding of the fluid and structural mechanics of the heart. We have developed a structural model of the heart muscle, which is coupled to a lumped-parameter model of the blood circulation. In doing so, we focus only on the processes relevant to the design of heart assist devices acting on the heart's outer surface, and thereby reduce the computational effort. In a first step we show the influence of applying force to the pericardium and to what extent the heart's contraction can be supported. In a second step, force patches will be optimised towards objectives such as cardiac output, contraction pattern, or stress inside the heart muscle. Knowledge of how to distribute force on the heart's surface will be fundamental to the design of such assist devices. -
Kunz Martin Contributed Talk
Thursday, June 9, 2016
Garden 3C, 15:25-15:45
Contributed Talk
Gevolution: A Cosmological N-Body Code Based on General Relativity, Martin Kunz (University of Geneva, Switzerland)
Co-Authors: Martin Kunz (University of Geneva, Switzerland)
Cosmological structure formation is a highly non-linear process that can only be studied with the help of numerical simulations. This process is mainly governed by gravity, which is the dominant force on large scales. A century after the formulation of general relativity, numerical codes for structure formation still use Newton's law of gravitation. In my talk I will present results from the first simulations of cosmic structure formation using equations consistently derived from general relativity. Our particle-mesh N-body code gevolution computes all six degrees of freedom of the metric and consistently solves the geodesic equation for particles, taking into account the relativistic potentials and the frame-dragging force. Thanks to this, we were able to study in detail for a standard ΛCDM cosmology the small relativistic effects that cannot be obtained within a purely Newtonian framework. -
Kuppuudaiyar Perumal Poster
Poster
CSM-05 CloudLightning: Self-Organising, Self-Managing Heterogeneous Cloud, Perumal Kuppuudaiyar (Intel, Ireland)
Co-Authors: Anna Gourinovitch (Dublin City University, Ireland)
CloudLightning is funded under the European Union's Horizon 2020 research and innovation programme under the call H2020-ICT-2014-1. It comprises eight partners from academia and industry and is coordinated by University College Cork. The objective of the project is to create a new way of provisioning heterogeneous cloud resources to deliver cloud services on the principles of self-management and self-organisation. This new self-organising system will make the cloud more accessible to cloud consumers and provide cloud service providers with power-efficient, scalable management of their cloud infrastructures. The CloudLightning solution will be demonstrated in three application domains: i) Genome Processing; ii) Oil and Gas exploration; and iii) Ray Tracing. Expected impacts for European cloud service providers include increased competitiveness through reduced cost and differentiation; increased energy efficiency and reduced environmental impact; improved service delivery; and greater accessibility to cloud computing for high performance computing workloads. -
Köstler Harald Poster
Poster
CSM-04 Automatic Code Generation for Simulations of Non-Newtonian Fluids, Harald Köstler (University of Erlangen-Nuremberg, Germany)
Co-Authors: Harald Köstler (University of Erlangen-Nuremberg, Germany)
The simulation of fluids exhibiting non-isothermal and non-Newtonian behaviour is of great relevance to a multitude of industrial applications. However, setting up discretization and solver components, as well as parallelizing and optimizing the involved code, is quite challenging. One possible remedy is using Domain-Specific Languages (DSLs) in conjunction with code generation techniques, as pursued by the ExaStencils project. In this work, we present our advances in fully generating solvers for such fluid flow problems from an abstract representation. In detail, we examine a finite volume discretization of the Navier-Stokes and temperature equations on a non-uniform staggered grid, which is treated using the SIMPLE algorithm and geometric multigrid solvers. This complex application brings new challenges, mainly concerning automatically applied optimizations, but also many possibilities to enhance the employed DSLs and thus greatly facilitate the design and implementation process. We discuss these points in depth and provide convincing performance results for fully generated and automatically parallelized solvers.
MS Summary
MS13 Development, Adaption, and Implementation of Numerical Methods for Exascale, Harald Köstler (University of Erlangen-Nuremberg, Germany)
Co-Authors: Harald Köstler (University of Erlangen-Nuremberg, Germany)
Nowadays, supercomputers with millions of cores, specialized communication networks, and in many cases accelerators pose big challenges for the development of scientific applications. In most cases, the efficiency of these applications is largely dominated by one or a few numerical methods. As a consequence, these underlying methods have to be able to use the available machine efficiently. The Priority Programme 1648 SPPEXA, funded by the German Research Foundation (DFG), aims at tackling the software challenges that arise on the way to exascale. One important part of these challenges concerns extremely scalable methods. The first year of SPPEXA's second three-year funding period started in January. In this minisymposium, a subset of the projects specifically addressing the challenges in numerical methods will present their approaches towards exascale numerics. The topics include fault tolerance, latency avoidance, and automatic code generation, as well as strategies to achieve the extreme scalability that will be needed in the future.
L
-
Lanti Emmanuel Contributed Talk
Wednesday, June 8, 2016
Garden 3C, 14:15-14:30
Contributed Talk
A Portable Platform for Accelerated PIC Codes and its Application to Multi- and Many Integrated Core Architectures Using Hybrid MPI/OpenMP, Emmanuel Lanti (EPFL/SPC, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Noé Ohana (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Laurent Villard (EPFL, Switzerland)
With the aim of porting Particle-In-Cell (PIC) codes to modern parallel computers equipped with coprocessors, we have designed a testbed called PIC_ENGINE retaining the key elements of the PIC algorithm as applied to plasma physics simulations. A hybrid OpenMP/MPI implementation is used to explore the potential gain in performance on multi-core CPUs and Many Integrated Core (MIC) coprocessors. A bucket sort is added to increase data locality, and a vectorization algorithm is implemented, showing an improvement in the overall performance. With the PIC_ENGINE, we show that the hybrid OpenMP/MPI approach allows a performance gain of approximately 60% compared to pure MPI. Furthermore, the sorting and vectorization increase the performance of the most time-consuming methods by up to a factor of 3.2. Finally, using the same code, hybrid runs performed on MIC lead to similar conclusions; however, due to inefficient vectorization, the overall performance is poor compared to CPU runs.
Wednesday, June 8, 2016
Garden 3C, 14:30-14:45
Contributed Talk
Towards Optimization of a Gyrokinetic Particle-in-Cell (PIC) Code on Large Scale Hybrid Architectures, Emmanuel Lanti (EPFL/SPC, Switzerland)
Co-Authors: Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Emmanuel Lanti (EPFL / Swiss Plasma Center, Switzerland); Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Laurent Villard (EPFL, Switzerland)
Refactoring large legacy codes to exploit the power of new multithreaded devices is not an easy task. For this purpose, we designed a platform embedding simplified basic features of PIC codes. It solves the drift-kinetic equations (a first step towards gyrokinetics) in a sheared plasma slab using B-spline finite elements up to fourth order. Multiple levels of parallelism have been implemented using MPI+OpenMP and MPI+OpenACC. It has been shown that sorting particles can lead to performance improvements by increasing data locality and vectorizing the grid memory access. This paper focuses on the GPU implementation, which outperforms the CPU timing by up to a factor of 4 while not requiring a big code-rewriting effort. This gain increases with the spline order. Weak and strong scalability tests have been successfully run on the GPU-equipped Cray XC30 Piz Daint (CSCS) on up to 4,096 nodes. This performance shall enable advanced studies of turbulent transport in magnetic fusion devices. -
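The bucket sort used in both talks to improve data locality amounts to reordering the particle arrays by cell index, so that the deposition and interpolation loops access contiguous memory per grid cell. A NumPy sketch for a 1D grid is shown below; the array names and the uniform grid are assumptions, not the PIC_ENGINE data layout.

    import numpy as np

    def bucket_sort_particles(positions, velocities, x_min, dx, n_cells):
        """Reorder particle arrays so that particles belonging to the same grid
        cell are contiguous in memory; offsets[c] is the index of the first
        particle of cell c in the sorted arrays."""
        cell = np.clip(((positions - x_min) / dx).astype(np.int64), 0, n_cells - 1)
        order = np.argsort(cell, kind="stable")
        counts = np.bincount(cell, minlength=n_cells)
        offsets = np.concatenate(([0], np.cumsum(counts)))
        return positions[order], velocities[order], offsets

    x = np.random.rand(1_000_000)
    v = np.random.randn(1_000_000)
    x_sorted, v_sorted, offsets = bucket_sort_particles(x, v, 0.0, 1.0 / 64, 64)

Once particles are grouped per cell, charge deposition and field interpolation can be vectorized over the particles of one cell without scattered memory accesses, which is the gain both abstracts attribute to sorting.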
Lartigue Ghislain MS Presentation
Wednesday, June 8, 2016
Garden 2A, 13:30-14:00
MS Presentation
High-Performance Computing for Large-Scale Unsteady Simulations of Turbulent Reacting Multi-Phase Flows: Challenges and Perspectives, Ghislain Lartigue (CORIA, CNRS UMR6614, France)
Co-Authors: Ghislain Lartigue (CORIA, CNRS UMR6614, France)
The prediction of conversion efficiency and pollutant emissions in combustion devices is particularly challenging, as they result from very complex interactions of turbulence, chemistry, and heat exchange at very different space and time scales. In recent years, Large-Eddy Simulation (LES) has been proven to bring significant improvements to the prediction of reacting turbulent flows. The CORIA lab leads the development of the YALES2 solver, which is designed to model turbulent reactive two-phase flows on body-fitted unstructured meshes. It has been specifically tailored for dealing with very large meshes of up to tens of billions of cells and for efficiently solving the low-Mach-number Navier-Stokes equations on massively parallel computers. The presentation will focus on high-fidelity combustion LES and the analysis of the huge amount of data generated by these simulations. Numerical methods used to decouple the different time scales and to optimise the mesh resolution will also be emphasized. -
Latt Jonas MS Presentation
Friday, June 10, 2016
Garden 3C, 09:00-09:30
MS Presentation
Numerical Simulation of Falling Non-Spherical Particles with Air Resistance, Jonas Latt (University of Geneva, Switzerland)
Co-Authors:
Numerical simulation of particle-fluid interaction is important in a wide range of applications such as erosion and sedimentation, industrial filtering processes, and the study of volcanic plumes. As far as numerical simulation is concerned, the literature in this field is restricted to cases of low or moderate Reynolds numbers, or to examples in which the motion of the solid body is restricted or completely inhibited (for example, the numerical simulation of particle settling at low Reynolds number, dominated by the particle-ground interaction). This presentation introduces a new numerical method which overcomes these limitations and resolves the coupled fluid-solid motion in a higher-Reynolds regime. It focuses on the free fall, in air, of particles with a diameter of several millimetres. A major difficulty stems from the fact that the particles typically reach a terminal velocity around 20 m/s and a Reynolds number around 50,000, at which fluid turbulence is fully developed. -
Latu Guillaume Paper
Thursday, June 9, 2016
Auditorium C, 11:00-11:30
Paper
Benefits of SMT and of Parallel Transpose Algorithm for the Large-Scale GYSELA Application, Guillaume Latu (CEA, France)
Co-Authors: Julien Bigot (CEA, France); Nicolas Bouzat (INRIA, France); Judit Gimenez (BSC, Spain); Virginie Grandgirard (CEA, France)
This article describes how we manage to increase the performance and extend the features of a large parallel application through the use of simultaneous multithreading (SMT) and by designing a robust parallel transpose algorithm. The semi-Lagrangian code Gysela typically performs large physics simulations using a few thousand cores, between 1k and 16k cores on x86-based clusters. However, simulations with finer resolutions and with kinetic electrons increase those needs by a huge factor, providing a good example of applications requiring exascale machines. To improve Gysela compute times, we take advantage of the efficient SMT implementations available on recent Intel architectures. We also analyze the cost of a transposition communication scheme that, in our case, involves a large number of cores. Adapting the code to balance the load when using SMT, together with a good deployment strategy, led to a significant reduction of up to 38% in execution time. -
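The transposition communication scheme analysed in the paper is, at its core, a redistribution of a block-distributed array from a row layout to a column layout via an all-to-all exchange. The mpi4py sketch below shows only that pattern; the global size, the assumption that it divides evenly by the number of ranks, and the array contents are toy choices, not the Gysela data layout.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    P, rank = comm.size, comm.rank

    n = 8 * P                                  # global size, divisible by P (assumption)
    rows = n // P
    # Row-distributed block: this rank owns `rows` full rows of the global array.
    local = np.arange(rank * rows * n, (rank + 1) * rows * n,
                      dtype=np.float64).reshape(rows, n)

    # Pack the block into P contiguous chunks, one per destination rank.
    send = np.ascontiguousarray(local.reshape(rows, P, n // P).transpose(1, 0, 2))
    recv = np.empty_like(send)
    comm.Alltoall(send, recv)

    # After the exchange this rank owns n // P full columns of the global array.
    column_block = recv.reshape(n, n // P)

Because every rank exchanges data with every other rank, the cost of this step is sensitive to the number of cores involved, which is the aspect whose interaction with SMT and deployment strategy the paper analyses.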
Laure Erwin MS Presentation
Wednesday, June 8, 2016
Garden 3C, 16:00-16:30
MS Presentation
Decoupling and Coupling in iPIC3D, a Particle-in-Cell Code for Exascale, Erwin Laure (KTH, Sweden)
Co-Authors: Stefano Markidis (Royal Institute of Technology, Sweden); Erwin Laure (Royal Institute of Technology, Sweden); Yuxi Chen (University of Michigan, United States of America); Gabor Toth (University of Michigan, United States of America); Tamas Gombosi (University of Michigan, United States of America)
iPIC3D is a massively parallel three-dimensional implicit particle-in-cell code used for the study of the interactions between the solar wind and Earth's magnetosphere. Because of the scalability potential and scientific impact demonstrated by iPIC3D, it has been selected by many European HPC projects to prepare for future exascale machines. In this talk, we present new algorithmic changes to iPIC3D in preparation for the coming exascale era. In particular, we will present decoupled computation, communication and I/O operations in iPIC3D to address the challenges of irregular operations on large numbers of processes. Our evaluation results show that the performance benefit of this model increases as the scale of the simulation increases. We also present a two-way coupled kinetic-fluid model with multiple implicit PIC domains (handled by the iPIC3D code) embedded in MHD (handled by the BATS-R-US code) under the Space Weather Modeling Framework (SWMF). -
Lazzaro Alfio Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 11:10-11:30
Contributed Talk
Performance Improvement by Exploiting Sparsity for MPI Communication in Sparse Matrix-Matrix Multiplication, Alfio Lazzaro (ETH, Switzerland)
Co-Authors: Joost VandeVondele (ETH Zurich, Switzerland); Ole Schuett (ETH Zurich, Switzerland)
DBCSR is the sparse matrix library at the heart of the CP2K linear-scaling electronic structure theory algorithm. It is MPI and OpenMP parallel, and can exploit accelerators. The multiplication algorithm is based on Cannon's algorithm, whose scalability is limited by the MPI communication time, and the current implementation is based on MPI point-to-point communications. We present an improved implementation that takes into account the sparsity of the problem in order to reduce the communication. This implementation makes use of one-sided communications. Performance results for representative CP2K benchmarks will also be presented.
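The saving that comes from exploiting sparsity can be illustrated with a toy block-sparse multiplication in which matrices are stored as dictionaries of non-empty dense blocks: only block pairs that actually exist generate flops, and in a distributed setting only those blocks would have to be fetched from remote ranks. The data structure below is an assumption for illustration, not the DBCSR format or its one-sided communication scheme.

    import numpy as np

    def block_sparse_multiply(A_blocks, B_blocks, n_block_cols):
        """Multiply block-sparse matrices stored as {(i, k): dense block} dicts.

        Empty blocks are simply absent, so they cause neither computation nor
        (in a distributed implementation) any data transfer."""
        C_blocks = {}
        for (i, k), a in A_blocks.items():
            for j in range(n_block_cols):
                b = B_blocks.get((k, j))
                if b is None:
                    continue                  # empty block: no flops, no transfer
                C_blocks[(i, j)] = C_blocks.get((i, j), 0.0) + a @ b
        return C_blocks

    rng = np.random.default_rng(0)
    A = {(0, 0): rng.random((4, 4)), (1, 2): rng.random((4, 4))}
    B = {(0, 1): rng.random((4, 4)), (2, 0): rng.random((4, 4))}
    C = block_sparse_multiply(A, B, n_block_cols=2)   # only two block products occur

In a Cannon-style distributed multiplication the same bookkeeping determines which remote block panels are actually needed, which is the communication volume the improved implementation reduces.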
Poster
MAT-09 Sparse Matrix Multiplication Library for Linear Scaling DFT Calculations in Electronic Structure Codes, Alfio Lazzaro (ETH, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Andreas Glöss (University of Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland)
The key operation for linear-scaling DFT implemented in the CP2K quantum chemistry program is sparse matrix-matrix multiplication. For this task, the sparse matrix library DBCSR (Distributed Block Compressed Sparse Row) has been developed. DBCSR takes full advantage of the block-structured sparse nature of the matrices for efficient computation and communication. It is MPI and OpenMP parallelized, and can exploit accelerators. We describe a strategy to improve DBCSR performance. DBCSR is available as a stand-alone library at http://dbcsr.cp2k.org/ to be employed in electronic structure codes. To this end, a streamlined API has been defined, and a suite of tools has been developed to generate the full documentation of the library (API-DOC) by extracting the information provided directly in the source code. We give a flavour of the generated API-DOC by showing snapshots of selected HTML documentation pages, and we sketch the design of these tools.
Poster
MAT-04 CP2K within the PASC Materials Network, Alfio Lazzaro (ETH, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Hans Pabst (Intel Semiconductor AG, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland)
One of the goals of the PASC project is to strengthen the networking in the Swiss materials science community through the active development of collaborative relationships among university researchers and CSCS staff. This includes assisting researchers in tuning, debugging, optimizing, and enhancing codes and applications for HPC resources, from mid-scale to national and international petascale facilities, with a view to the exascale transition. In addition, the application support specialists provide support for development projects on software porting techniques, parallelization and optimization strategies, deployment on diverse computational platforms, and data management. Here we present selected tools and software developed for CP2K [1]. Furthermore, we show by example how a CP2K application can be tuned to make optimal use of all available HPC resources. With a view to next-generation HPC hardware, we present first promising performance results for Intel's Broadwell-EP and KNL platforms. [1] The CP2K developers group, CP2K is freely available from: https://www.cp2k.org/, 2016 -
Lee Jinpil MS Presentation
Wednesday, June 8, 2016
Garden 3B, 14:00-14:15
MS Presentation
Omni Compiler and XcodeML: An Infrastructure for Source-to-Source Transformation, Jinpil Lee (AICS, RIKEN, Japan)
Co-Authors: Hitoshi Murai (AICS, RIKEN, Japan); Masahiro Nakao (AICS, RIKEN, Japan); Hidetoshi Iwashita (AICS, RIKEN, Japan); Jinpil Lee (AICS, RIKEN, Japan); Akihiro Tabuchi (University of Tsukuba, Japan)
We have been developing a compiler for the PGAS programming language XcalableMP for post-petascale computing. XcalableMP is a directive-based language extension of Fortran 95 and C for scientific programming on high-performance distributed-memory parallel systems. Omni Compiler is an infrastructure for source-to-source transformation used to design source-to-source compilers such as the Omni XcalableMP compiler. It includes C and Fortran 95 front-ends which translate source code into an XML-based intermediate representation called XcodeML, a Java-based code-transformation library on XcodeML, and de-compilers which translate the XcodeML intermediate code back into transformed source code. Currently, the Omni compiler also supports code transformation for OpenMP and OpenACC. In this talk, we will present the internals of the Omni compiler, taking the Omni XcalableMP compiler as a case study, and outline our future plans. -
Leguizamon Sebastian Contributed Talk
Friday, June 10, 2016
Garden 1BC, 10:15-10:30
Contributed Talk
Thermomechanical Modeling of Impacting Particles on a Metallic Surface for the Erosion Prediction in Hydraulic Turbines, Sebastian Leguizamon (EPFL-LMH, Switzerland)
Co-Authors: Ebrahim Jahanbakhsh (Università della Svizzera italiana, Switzerland); Audrey Maertens (EPFL, Switzerland); Christian Vessaz (EPFL, Switzerland); François Avellan (EPFL, Switzerland)
Erosion damage in hydraulic turbines is a common problem caused by the high-velocity impact of small particles entrained in the fluid. Numerical simulations can be useful to investigate the effect of each governing parameter in this complex phenomenon. The Finite Volume Particle Method is used to simulate the three-dimensional impact of dozens of rigid spherical particles on a metallic surface. The very fine discretization and the overall number of time steps needed to achieve the steady state erosion rate render the problem very expensive, implying the need for high performance computing. In this talk, a comparison of constitutive models is presented, with the aim of assessing the complexity of the thermomechanical modelling required to accurately simulate the impact and subsequent erosion of metals. The importance of strain rate, triaxiality, friction model and thermal effects is discussed.
Wednesday, June 8, 2016
Garden 2A, 14:45-15:00
MS Presentation
GPU-Accelerated Hydrodynamic Simulation of Hydraulic Turbines Using the Finite Volume Particle Method, Sebastian Leguizamon (EPFL-LMH, Switzerland)
Co-Authors: Ebrahim Jahanbakhsh (Università della Svizzera italiana, Switzerland); Audrey Maertens (EPFL / LMH, Switzerland); Christian Vessaz (EPFL / LMH, Switzerland); Sebastian Leguizamon (EPFL / LMH, Switzerland); François Avellan (EPFL / LMH, Switzerland)
Performance prediction based on numerical simulations can be very helpful in the design process of hydraulic turbines. The Finite Volume Particle Method (FVPM) is a consistent and conservative particle-based method which inherits interesting features of both Smoothed Particle Hydrodynamics and the grid-based Finite Volume Method, and it is particularly well suited for such simulations thanks to its versatility. SPHEROS is a parallel FVPM solver which has been developed at the EPFL Laboratory for Hydraulic Machines for simulating Pelton turbines and silt erosion. In order to allow the simulation of industrial-size setups, a GPU version of SPHEROS (GPU-SPHEROS) is being developed in CUDA; it uses the Thrust library to handle complex data structures such as octrees, and some highly optimised kernels are implemented for both compute-bound and memory-bound algorithms. Comparing the performance of the corresponding parts of GPU-SPHEROS and SPHEROS, we achieve a speed-up factor of at least eight. -
Leutwyler David MS Presentation
Thursday, June 9, 2016
Garden 3B, 14:45-15:00
MS Presentation
A Decade-Long European-Scale Convection-Resolving Climate Simulation on GPUs: Computation, Validation and Analyses, David Leutwyler (ETH Zürich, Atmospheric and Climate Science, Switzerland)
Co-Authors: Stefan Rüdisühli (ETH Zurich, Switzerland); Nikolina Ban (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Daniel Lüthi (ETH Zurich, Switzerland); Michael Sprenger (ETH Zurich, Switzerland); Heini Wernli (ETH Zurich, Switzerland); Christoph Schär (ETH Zurich, Switzerland)
Climate simulations using a horizontal resolution of O(1 km) allow deep convection to be resolved explicitly. Precipitation processes are then represented much closer to first principles, allowing for an improved representation of the water cycle. Due to the large computational costs, climate simulations at such scales were restricted to rather small domains in the past. Here we present results from a decade-long convection-resolving climate simulation covering Europe using a computational mesh of 1,536x1,536x60 grid points. We use a COSMO-model prototype enabled for GPUs. The results illustrate how the approach allows the interactions between atmospheric circulations at scales ranging from 1,000 km down to 10 km to be represented. We discuss the performance of the convection-resolving climate modelling approach and thereby focus specifically on the improved representation of summer convection on the continental scale. Furthermore, we demonstrate the potential of online analyses of these simulations for assembling detailed climatologies of extratropical cyclones, fronts and propagating convective systems. -
Levitt Antoine Contributed Talk
Friday, June 10, 2016
Garden 1BC, 09:30-09:45
Contributed Talk
Parallel Eigensolvers for Plane-Wave Density Functional Theory, Antoine Levitt (Inria Paris, France)
Co-Authors:
Density functional theory (DFT) approximates the Schrödinger equation by modelling electronic correlation as a function of density. Its relatively modest O(N^3) scaling makes it the standard method in electronic structure computation for condensed phases containing up to thousands of atoms. Computationally, its bottleneck is the partial diagonalisation of a Hamiltonian operator, which is usually not formed explicitly. Using the example of the Abinit code, I will discuss the challenges involved in scaling plane-wave DFT computations to petascale supercomputers, and show how the implementation of a new method based on Chebyshev filtering results in good parallel behaviour up to tens of thousands of processors. I will also discuss some open problems in the numerical analysis of eigensolvers and extrapolation methods used to accelerate the convergence of fixed point iterations. -
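A minimal sketch of the Chebyshev filtering step mentioned in the talk: a Chebyshev polynomial of the shifted and scaled Hamiltonian is applied to a block of trial vectors so that components in the unwanted part of the spectrum are damped while the lowest eigencomponents are amplified. The dense toy Hamiltonian, the filter degree and the way the spectral bounds are obtained are assumptions of this sketch; in a plane-wave code H is only available through matrix-free applications.

    import numpy as np

    def chebyshev_filter(H, X, a, b, degree):
        """Apply p_m((H - c) / e) to the block X, where [a, b] encloses the
        unwanted (upper) part of the spectrum; components inside [a, b] are
        damped, those below a are amplified."""
        e, c = (b - a) / 2.0, (b + a) / 2.0
        Y_prev = X
        Y = (H @ X - c * X) / e
        for _ in range(2, degree + 1):
            Y_next = 2.0 * (H @ Y - c * Y) / e - Y_prev
            Y_prev, Y = Y, Y_next
        return Y

    rng = np.random.default_rng(0)
    n, nev = 400, 10
    H = rng.standard_normal((n, n))
    H = 0.5 * (H + H.T)                       # toy dense "Hamiltonian"
    bounds = np.linalg.eigvalsh(H)            # cheap here, estimated in practice
    X = rng.standard_normal((n, nev))
    for _ in range(30):                       # filtered subspace iteration
        X = chebyshev_filter(H, X, a=bounds[nev], b=bounds[-1], degree=8)
        X, _ = np.linalg.qr(X)                # re-orthonormalize the block

Each filter application only requires matrix-vector products with H, which is what makes the approach attractive when the Hamiltonian is never formed explicitly and those products are distributed over many processors.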
Li Dunzhu MS Presentation
Friday, June 10, 2016
Garden 1A, 09:00-09:30
MS Presentation
HPC Challenges Arising in Forward and Inverse Mantle Flow Simulation, Dunzhu Li (California Institute of Technology, United States of America)
Co-Authors: Johann Rudi (University of Texas at Austin, United States of America); Vishagan Ratnaswamy (California Institute of Technology, United States of America); Dunzhu Li (California Institute of Technology, United States of America); Tobin Isaac (University of Chicago, United States of America); Michael Gurnis (California Institute of Technology, United States of America); Omar Ghattas (University of Texas at Austin, United States of America)
We discuss scalable solvers for the forward and inverse simulation of mantle flow problems. Crucial solver components for the arising nonlinear Stokes problems are parallel multigrid methods for preconditioning the linearized Stokes system, and a Schur complement approximation that is able to cope with extreme viscosity variations. To achieve good parallel scalability, we use, among other techniques, matrix-free operations, and we redistribute coarse multigrid levels to subsets of the available processors. We will discuss the inversion of global rheology parameters and distributed fields from surface data and the present-day temperature distribution in instantaneous and time-dependent problems. -
Limongelli Vittorio Contributed Talk
Thursday, June 9, 2016
Garden 3A, 10:30-10:50
Contributed Talk
A Comprehensive Description of the Homo and Heterodimerization Mechanism of the Chemokine Receptors CCR5 and CXCR4, Vittorio Limongelli (Department of Informatics, Institute of Computational Science, Università della Svizzera italiana, Switzerland)
Co-Authors: Vittorio Limongelli (Università della Svizzera italiana, Switzerland)
Signal transduction across cellular membranes is controlled by G protein coupled receptors (GPCRs). It is widely accepted that members of the GPCR family self-assemble as dimers or higher-order structures that act as functional units in the plasma membrane. The chemokine receptors are GPCRs implicated in a wide range of physiological and non-physiological cell processes. These receptors represent prime targets for therapeutic intervention in a wide spectrum of inflammatory and autoimmune diseases, heart diseases, cancer and HIV. The CXCR4 and CCR5 receptors are two of the most studied, playing crucial roles in different pathologies. In this scenario, the use of computational techniques able to describe complex biological processes such as protein dimerization acquires great importance. Combining coarse-grained (CG) molecular dynamics and well-tempered metadynamics (MetaD), we are able to describe the mechanism of dimer formation, capturing multiple association and dissociation events and computing a detailed free-energy landscape of the process.
Poster
LS-01 A Comprehensive Description of the Homo and Heterodimerization Mechanism of the Chemokine Receptors CCR5 and CXCR4, Vittorio Limongelli (Department of Informatics, Institute of Computational Science, Università della Svizzera italiana, Switzerland)
Co-Authors: Vittorio Limongelli (Università della Svizzera italiana, Switzerland)
Signal transduction across cellular membranes is controlled by G protein coupled receptors (GPCRs). It is widely accepted that members of the GPCR family self-assemble as dimers or higher-order structures that act as functional units in the plasma membrane. The chemokine receptors are GPCRs implicated in a wide range of physiological and non-physiological cell processes. These receptors represent prime targets for therapeutic intervention in a wide spectrum of inflammatory and autoimmune diseases, heart diseases, cancer and HIV. The CXCR4 and CCR5 receptors are two of the most studied, playing crucial roles in different pathologies. In this scenario, the use of computational techniques able to describe complex biological processes such as protein dimerization acquires great importance. Combining coarse-grained (CG) molecular dynamics and well-tempered metadynamics (MetaD), we are able to describe the mechanism of dimer formation, capturing multiple association and dissociation events and computing a detailed free-energy landscape of the process. -
Lisacek Frederique MS Presentation
Thursday, June 9, 2016
Garden 3A, 15:30-15:45
MS Presentation
Large-Scale Mass Spectrometry Data Analysis, Frederique Lisacek (Swiss Institute of Bioinformatics, Switzerland)
Co-Authors: Frederique Lisacek (Swiss Institute of Bioinformatics, Switzerland); Markus Müller (Swiss Institute of Bioinformatics, Switzerland)
The purpose of this talk is to highlight the design, development and use of a Java library supporting Hadoop MapReduce and Apache Spark cluster calculations for the large-scale analysis of mass spectrometry data. While noisy, redundant and ambiguous, such data are generated in the many millions of spectra and contain key information for identifying active small and large molecules in complex biological samples. The library favours the fast and flexible implementation of customised analytical pipelines. -
Lohse Detlef MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:30-14:45
MS Presentation
AFiD-GPU: A Versatile Navier-Stokes Solver for Turbulent Flows, Detlef Lohse (PoF, University of Twente, Netherlands)
Co-Authors: Richard J.A.M. Stevens (University of Twente, Netherlands); Everett Phillips (NVIDIA, United States of America); Vamsi Spandan (University of Twente, Netherlands); John Donners (SURFsara, Netherlands); Rodolfo Ostilla-Monico (Harvard University, United States of America); Massimiliano Fatica (NVIDIA, United States of America); Yantao Yang (University of Twente, Netherlands); Detlef Lohse (University of Twente, Netherlands); Roberto Verzicco (University of Twente, Netherlands & Università degli Studi di Roma "Tor Vergata", Italy)
The AFiD code, an open-source solver for the Navier-Stokes equations (www.afid.eu), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU port has been carried out in CUDA Fortran with extensive use of kernel loop directives (CUF kernels) in order to keep the source code as close as possible to the original CPU version; only a few routines have been written manually. On Piz Daint (CSCS), the current GPU version can solve a 2048x3072x3072 mesh on 640 K20x GPUs in 2.4 s per time step, while with 2048 GPUs we measured 0.89 s per time step. Thanks to the increased performance and efficient use of memory, this tool allows for simulations in a parameter range that is unprecedented in Rayleigh-Benard convection. -
Lombardot Thierry Poster
Poster
LS-08 The UniProt SPARQL Endpoint: 21 Billion Triples in Production, Thierry Lombardot (Swiss Institute of Bioinformatics, Switzerland)
Co-Authors: Sebastien Gehant (Swiss Institute of Bioinformatics, Switzerland); Thierry Lombardot (Swiss Institute of Bioinformatics, Switzerland); Lydie Bougueleret (Swiss Institute of Bioinformatics, Switzerland); Ioannis Xenarios (Swiss Institute of Bioinformatics, Switzerland); Nicole Redaschi (Swiss Institute of Bioinformatics, Switzerland)
The UniProt knowledgebase is a leading resource of protein sequences and functional information whose centerpiece is the expert-curated Swiss-Prot section. UniProt data is accessible at www.uniprot.org (via a user-friendly interface and a REST API) and at sparql.uniprot.org, a public SPARQL endpoint hosted and maintained by the Vital-IT and Swiss-Prot groups of SIB. With 21 billion RDF triples, it is the largest free-to-use graph database in the sciences. SPARQL allows scientists to perform complex queries within UniProt and across datasets located on remote SPARQL endpoints. It provides a free data-integration solution for users who cannot afford to create custom data warehouses, at a cost to the service providers. Here we discuss the challenges of maintaining the UniProt SPARQL endpoint, which is updated monthly in sync with the UniProt data releases. -
Lomi Alessandro Poster
Poster
EMD-03 Parallel MCMC for Estimating Exponential Random Graph Models, Alessandro Lomi (Università della Svizzera italiana, Switzerland)
Co-Authors: Alex Stivala (University of Melbourne, Australia); Antonietta Mira (Università della Svizzera italiana, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Garry Robins (University of Melbourne, Australia); Alessandro Lomi (Università della Svizzera italiana, Switzerland)
As information and communication technologies continue to expand, the need arises to develop analytical strategies capable of accommodating new and larger sets of social network data. Considerable attention has recently been dedicated to the possibility of scaling exponential random graph models (ERGMs) - a well-established family of statistical models - for analyzing large social networks. Efficient computational methods would be highly desirable in order to extend the empirical scope of ERGM for the analysis of large social networks. We report preliminary results of a research project on the development of new sampling methods for ERGMs. We propose a new MCMC sampler and use it with Metropolis coupled Markov chain Monte Carlo, a typical scheme for MCMC parallelization. We show that, using this method, the CPU time for parameter estimation may be considerably reduced. *Generous support from the Swiss National Platform of Advanced Scientific Computing (PASC) is gratefully acknowledged. -
Ltaief Hatem Paper
Thursday, June 9, 2016
Auditorium C, 10:30-11:00
Paper
Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs, Hatem Ltaief (KAUST, Saudi Arabia)
Co-Authors: Hatem Ltaief (KAUST, Saudi Arabia); Damien Gratadour (L'Observatoire de Paris, France); Eric Gendron (L'Observatoire de Paris, France)
We present a high-performance, comprehensive implementation of a multi-object adaptive optics (MOAO) simulation on multicore architectures with hardware accelerators in the context of computational astronomy. This implementation will be used as an operational testbed for simulating the design of new instruments for the European Extremely Large Telescope project (E-ELT), the world's biggest eye and one of Europe's highest priorities in ground-based astronomy. The simulation corresponds to a multi-step, multi-stage procedure, which is fed in near real-time with system and turbulence data coming from the telescope environment. Based on the PLASMA library powered by the OmpSs dynamic runtime system, our implementation relies on a task-based programming model to permit asynchronous out-of-order execution. Using modern multicore architectures combined with the enormous computing power of GPUs, the resulting data-driven, compute-intensive simulation of the entire MOAO application, composed of the tomographic reconstructor and the observing sequence, is capable of coping with the aforementioned real-time challenge and stands as a reference implementation for the computational astronomy community. -
Luber Sandra Poster
Poster
MAT-10 What Influences the Water Oxidation Activity of a Bio-Inspired Molecular CoII4O4 Cubane?, Sandra Luber (University of Zurich, Switzerland)
Co-Authors: Sandra Luber (University of Zurich, Switzerland)
We investigated the reaction mechanism of the recently presented first Co(II)-based water oxidation catalyst (WOC), [CoII4(hmp)4(μ-OAc)2(μ2-OAc)2(H2O)2] (hmp=2-(hydroxymethyl)pyridine), which is one of the rare stable homogeneous cubane-type WOCs and whose design has been inspired by nature's oxygen-evolving complex of photosystem II (PSII). Two different catalytic cycles have been envisioned: a single-site pathway involving only one cobalt center and a water attack on an oxo ligand or, alternatively, an oxo-oxo coupling pathway in which two terminal oxo ligands of the cubane couple and are released as O2. Using density functional theory and an explicit first solvation shell, we compare the relative free energies of all catalytic states and analyze their stability and reactivity. Furthermore, we compute barriers and reaction paths for the water attack and O2 release steps. With this knowledge at hand, we propose possibilities to tune the catalytic activity, paving the way to the informed design of high-performance PSII mimics. -
Luisier Mathieu Contributed Talk
Friday, June 10, 2016
Garden 1BC, 09:15-09:30
Contributed Talk
Ab-Initio Quantum Transport Simulation of Nano-Devices, Mathieu Luisier (ETH Zurich, Switzerland)
Co-Authors: Mauro Calderara (ETH Zurich, Switzerland); Mohammad Hossein Bani-Hashemian (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland); Mathieu Luisier (ETH Zurich, Switzerland)
To simulate advanced electronic devices such as nanoscale transistors or memory cells whose functionality may depend on the position of single atoms only, a quantum transport solver is needed that is not only capable of atomic-scale resolution, but can also deal with systems consisting of thousands to hundreds of thousands of atoms. The device simulator OMEN and the electronic structure code CP2K have been combined to perform ab initio quantum transport calculations at the level of density functional theory. To take full advantage of modern hybrid supercomputer architectures, new algorithms have been developed and implemented. They allow for the simultaneous computation of open boundary conditions in parallel on the available CPUs and the solution of the Schrödinger equation in a scalable way on the GPUs. The main concepts behind the algorithms will be presented and results for realistic nanostructures will be shown.
Poster
MAT-01 A Generalized Poisson Solver for First-Principles Device Simulations, Mathieu Luisier (ETH Zurich, Switzerland)
Co-Authors: Sascha Brück (ETH Zurich, Switzerland); Mathieu Luisier (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland)
We present a Poisson solver whose main applications are in ab-initio simulations of nanoelectronic devices. The solver employs a plane-wave (Fourier) based pseudospectral approach and is capable of solving the generalized Poisson equation with a position-dependent dielectric constant, subject to periodic or homogeneous Neumann conditions on the boundaries of the simulation cell and Dirichlet-type conditions imposed at arbitrary subdomains. Any sufficiently smooth function modelling the dielectric constant, including density-dependent dielectric continuum models, can be utilized. Furthermore, for all the boundary conditions, consistent derivatives are available, allowing for energy-conserving molecular dynamics simulations. -
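As a minimal illustration of the plane-wave building block, the Python sketch below solves the ordinary Poisson equation on a fully periodic box with a constant dielectric via FFT. The position-dependent permittivity, the Neumann/Dirichlet subdomain conditions and the consistent derivatives of the actual solver are deliberately left out.

    # Pseudospectral sketch: solve eps * Laplacian(phi) = -rho on a periodic box
    # with a constant dielectric. This only illustrates the Fourier building
    # block; the generalized solver of the abstract handles much more.
    import numpy as np

    def solve_poisson_periodic(rho, box_length, eps=1.0):
        n = rho.shape[0]                      # cubic n^3 grid assumed
        k = 2.0 * np.pi * np.fft.fftfreq(n, d=box_length / n)
        kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
        k2 = kx**2 + ky**2 + kz**2
        rho_hat = np.fft.fftn(rho)
        phi_hat = np.zeros_like(rho_hat)
        nonzero = k2 > 0.0
        phi_hat[nonzero] = rho_hat[nonzero] / (eps * k2[nonzero])
        # The k = 0 mode stays zero: the potential is defined up to a constant
        # and the net charge must vanish on a periodic cell.
        return np.real(np.fft.ifftn(phi_hat))

    # Self-check against a single analytic mode.
    L, n = 1.0, 64
    x = np.linspace(0.0, L, n, endpoint=False)
    X = np.meshgrid(x, x, x, indexing="ij")[0]
    rho = np.sin(2.0 * np.pi * X / L)
    phi = solve_poisson_periodic(rho, L)
    assert np.allclose(phi, rho * (L / (2.0 * np.pi)) ** 2, atol=1e-10)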
Lumsdaine Andrew Paper
Thursday, June 9, 2016
Auditorium C, 12:00-12:30
Paper
Context Matters: Distributed Graph Algorithms and Runtime Systems, Andrew Lumsdaine (Indiana University, United States of America)
Co-Authors: Jesun Sahariar Firoz (Indiana University, United States of America); Thejaka Amila Kanewala (Indiana University, United States of America); Marcin Zalewski (Indiana University, United States of America); Martina Barnas (Indiana University, United States of America)
The increasing complexity of the software/hardware stack of modern supercomputers makes understanding the performance of modern massive-scale codes difficult. Distributed graph algorithms (DGAs) are at the forefront of that complexity, pushing the envelope with their massive irregularity and data dependency. We analyse the existing body of research on DGAs to assess how technical contributions are linked to experimental performance results in the field. We distinguish algorithm-level contributions related to graph problems from "runtime-level" concerns related to communication, scheduling, and other low-level features necessary to make distributed algorithms work. We show that the runtime is an integral part of DGAs' experimental results, but it is often ignored by the authors in favor of algorithm-level contributions. We argue that a DGA can only be fully understood as a combination of these two aspects and that detailed reporting of runtime details must become an integral part of the scientific standard in the field if results are to be truly understandable and interpretable. Based on our analysis of the field, we provide a template for reporting the runtime details of DGA results, and we further motivate the importance of these details by discussing in detail how seemingly minor runtime changes can make or break a DGA. -
Luporini Fabio MS Presentation
Wednesday, June 8, 2016
Garden 3B, 13:00-13:30
MS Presentation
Firedrake: Automating the Finite Element Method by Composing Abstractions, Fabio Luporini (Imperial College London, United Kingdom)
Co-Authors: David A. Ham (Imperial College London, United Kingdom); Andrew T. T. McRae (University of Bath, United Kingdom); Florian Rathgeber (ECMWF, United Kingdom); Gheorghe-Teodor Bercea (Imperial College London, United Kingdom); Miklós Homolya (Imperial College London, United Kingdom); Fabio Luporini (Imperial College London, United Kingdom); Paul H. J. Kelly (Imperial College London, United Kingdom)
The complexity inherent in the application of advanced numerics on modern hardware to coupled physical systems presents a critical barrier to simulation development. To overcome this, we must create simulation software which embodies the abstraction and composability of the underlying mathematics. In this way, a system is created in which mathematicians, computer scientists, and application specialists can each deploy their own expertise, benefiting from the expertise of the others. Critically, this approach minimises the extent to which individuals must become polymaths to share in these advances. In this talk I will present Firedrake and PyOP2, a composition of new and existing abstractions which creates a particularly complete separation of concerns. This enables the creation of high performance, sophisticated finite element models from a very high level mathematical specification and has enabled advances in computer science and numerics, while also facilitating the creation of simulation systems for a variety of applications. -
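The flavour of this separation of concerns can be conveyed in a few lines of Firedrake: the mathematician writes the weak form in UFL and the PyOP2 layer takes care of kernel generation and parallel execution. The Poisson problem below is a generic sketch following the interface documented at firedrakeproject.org, not an example taken from the talk.

    # Minimal Firedrake sketch: a Poisson problem stated directly as a weak form.
    # Follows the publicly documented Firedrake interface; not from the talk.
    from firedrake import (UnitSquareMesh, FunctionSpace, TrialFunction,
                           TestFunction, Function, SpatialCoordinate,
                           DirichletBC, dot, grad, dx, sin, pi, solve)

    mesh = UnitSquareMesh(32, 32)
    V = FunctionSpace(mesh, "CG", 2)          # continuous Lagrange, degree 2

    u, v = TrialFunction(V), TestFunction(V)
    x, y = SpatialCoordinate(mesh)
    f = Function(V).interpolate(8.0 * pi**2 * sin(2 * pi * x) * sin(2 * pi * y))

    a = dot(grad(u), grad(v)) * dx            # bilinear form
    L = f * v * dx                            # linear form
    bc = DirichletBC(V, 0.0, "on_boundary")   # homogeneous Dirichlet boundary

    uh = Function(V, name="solution")
    solve(a == L, uh, bcs=bc)                 # code generation and solve happen here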
Lücker Adrien MS Presentation
Wednesday, June 8, 2016
Garden 3A, 17:00-17:15
MS Presentation
An Overset Grid Method for Oxygen Transport from Red Blood Cells in Capillary Networks, Adrien Lücker (ETH Zurich, Institute of Fluid Dynamics, Switzerland)
Co-Authors: Bruno Weber (University of Zurich, Switzerland); Patrick Jenny (ETH Zurich, Switzerland)
Most oxygen in the blood circulation is carried bound to hemoglobin in red blood cells (RBCs). In capillaries, the oxygen partial pressure (PO2) is affected by the individual RBCs that flow in a single file. We have developed a novel overset grid method for oxygen transport from capillaries to tissue. This approach uses moving grids for RBCs and a fixed one for the blood vessels and the tissue. This combination enables accurate modelling of the intravascular PO2 field and the unloading of oxygen from RBCs. Additionally, our model can account for fluctuations in hematocrit and hemoglobin saturation. Its parallel implementation in OpenFOAM supports three-dimensional tortuous capillary networks. Simulations of oxygen transport in the rodent cerebral cortex have been performed and are used to study the cerebral energy metabolism. Other applications include the investigation of hemoglobin saturation heterogeneity in capillary networks. -
Lüthi Daniel MS Presentation
Thursday, June 9, 2016
Garden 3B, 14:45-15:00
MS Presentation
A Decade-Long European-Scale Convection-Resolving Climate Simulation on GPUs: Computation, Validation and Analyses, Daniel Lüthi (Atmospheric and Climate Science, ETH Zurich, Switzerland)
Co-Authors: Stefan Rüdisühli (ETH Zurich, Switzerland); Nikolina Ban (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Daniel Lüthi (ETH Zurich, Switzerland); Michael Sprenger (ETH Zurich, Switzerland); Heini Wernli (ETH Zurich, Switzerland); Christoph Schär (ETH Zurich, Switzerland)
Climate simulations using a horizontal resolution of O(1 km) allow deep convection to be resolved explicitly. Precipitation processes are then represented much closer to first principles, which improves the representation of the water cycle. Due to the large computational costs, climate simulations at such scales were restricted to rather small domains in the past. Here we present results from a decade-long convection-resolving climate simulation covering Europe using a computational mesh of 1,536x1,536x60 grid points. We use a COSMO-model prototype enabled for GPUs. The results illustrate how the approach captures the interactions between atmospheric circulations at scales ranging from 1,000 to 10 km. We discuss the performance of the convection-resolving climate modelling approach and thereby focus specifically on the improved representation of summer convection on the continental scale. Furthermore, we demonstrate the potential of online analyses of these simulations for assembling detailed climatologies of extratropical cyclones, fronts and propagating convective systems.
M
-
Maertens Audrey MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:45-15:00
MS Presentation
GPU-Accelerated Hydrodynamic Simulation of Hydraulic Turbines Using the Finite Volume Particle Method, Audrey Maertens (EPFL-LMH, Switzerland)
Co-Authors: Ebrahim Jahanbakhsh (Università della Svizzera italiana, Switzerland); Audrey Maertens (EPFL / LMH, Switzerland); Christian Vessaz (EPFL / LMH, Switzerland); Sebastian Leguizamon (EPFL / LMH, Switzerland); François Avellan (EPFL / LMH, Switzerland)
Performance prediction based on numerical simulations can be very helpful in the design process of hydraulic turbines. The Finite Volume Particle Method (FVPM) is a consistent and conservative particle-based method which inherits interesting features of both Smoothed Particle Hydrodynamics and the grid-based Finite Volume Method. This method is particularly well-suited for such simulations thanks to its versatility. SPHEROS is a parallel FVPM solver which has been developed at the EPFL - Laboratory for Hydraulic Machines for simulating Pelton turbines and silt erosion. In order to allow the simulation of industrial-size setups, a GPU version of SPHEROS (GPU-SPHEROS) is being developed in CUDA; it uses the Thrust library to handle complicated data structures such as octrees. In addition, highly optimised kernels are implemented for both compute-bound and memory-bound algorithms. Comparing the performance of the different parts of GPU-SPHEROS and SPHEROS, we achieve a speed-up factor of at least eight.
Friday, June 10, 2016
Garden 1BC, 10:15-10:30
Contributed Talk
Thermomechanical Modeling of Impacting Particles on a Metallic Surface for the Erosion Prediction in Hydraulic Turbines, Audrey Maertens (EPFL-LMH, Switzerland)
Co-Authors: Ebrahim Jahanbakhsh (Università della Svizzera italiana, Switzerland); Audrey Maertens (EPFL, Switzerland); Christian Vessaz (EPFL, Switzerland); François Avellan (EPFL, Switzerland)
Erosion damage in hydraulic turbines is a common problem caused by the high-velocity impact of small particles entrained in the fluid. Numerical simulations can be useful to investigate the effect of each governing parameter in this complex phenomenon. The Finite Volume Particle Method is used to simulate the three-dimensional impact of dozens of rigid spherical particles on a metallic surface. The very fine discretization and the overall number of time steps needed to achieve the steady state erosion rate render the problem very expensive, implying the need for high performance computing. In this talk, a comparison of constitutive models is presented, with the aim of assessing the complexity of the thermomechanical modelling required to accurately simulate the impact and subsequent erosion of metals. The importance of strain rate, triaxiality, friction model and thermal effects is discussed. -
Mani Ali MS Presentation
Thursday, June 9, 2016
Garden 2A, 11:00-11:30
MS Presentation
Fluid Mechanics of Electrochemical Interfaces: Instability and Chaos Near Ion-Selective Surfaces, Ali Mani (Stanford University, United States of America)
Co-Authors:
Electrochemical interfaces are host to a range of physical phenomena involving ion-transport, electrostatics and fluid flow. In this presentation, we consider voltage-driven ion transport from an aqueous electrolyte to an ion-selective membrane as a canonical setting with broad applications in electrochemistry. We will present results from our numerical simulations demonstrating that, beyond a threshold voltage, such interfaces trigger hydrodynamic chaos with multi-scale vortices despite their low Reynolds number. Namely, structures with scales from sub-millimeter down to tens of nanometers can be formed as a natural result of these hydrodynamic instabilities. These flow structures are shown to impact mixing and enhance system-level transport well beyond nominal diffusion-controlled processes. We will demonstrate the need for the development of specialized algorithms for computation of these systems, similar to the tools that have been traditionally used for the simulations of turbulent flows. Such calculations require massively parallel computational resources beyond what is available today. -
Mardal Kent-Andre MS Presentation
Thursday, June 9, 2016
Garden 3A, 11:30-11:50
MS Presentation
Assessment of Transitional Hemodynamics in Intracranial Aneurysms at Extreme Scale, Kent-Andre Mardal (University of Oslo, Norway)
Co-Authors: Sabine Roller (University of Siegen, Germany); Kent-Andre Mardal (University of Oslo, Norway)
Computational fluid dynamics (CFD) is extensively used for modelling blood flow in intracranial aneurysms, as it can help clinicians decide on intervention and may potentially provide information on the pathogenesis of the condition. The flow regime in aneurysms is, due to the low Reynolds number, mostly presumed laminar - an assumption that was challenged in recent publications showing high-frequency fluctuations in aneurysms resembling transitional flow. The present work aspires to scrutinize the issue of transition in aneurysmal hemodynamics by performing the first true direct numerical simulations on aneurysms of various morphologies, with resolutions of the order of the Kolmogorov scales, resulting in 1 billion cells. The results show the onset of fluctuations in the flow inside the aneurysm during the deceleration phase of the cardiac cycle, followed by re-laminarization during acceleration. The fluctuations are confined to the aneurysm dome, suggesting that the aneurysm acts as an initiator of transition to turbulence.
Wednesday, June 8, 2016
Garden 3A, 14:30-14:45
Contributed Talk
Direct Numerical Simulation of Transitional Hydrodynamics of the Cerebrospinal Fluid in Chiari I Malformation, Kent-Andre Mardal (University of Oslo, Norway)
Co-Authors: Kent-Andre Mardal (University of Oslo, Norway)
Chiari malformation type I is a disorder characterized by the herniation of the cerebellar tonsils into the spinal canal, resulting in obstruction of cerebrospinal fluid (CSF) outflow. The oscillating CSF flow is highly complex due to the anatomy of the subarachnoid space. We report the first-ever direct numerical simulations on patient-specific cases with resolutions that border the Kolmogorov scales, amounting to meshes with 2 billion cells and conducted on 50,000 cores of the Hazel Hen supercomputer in Stuttgart. The results show velocity fluctuations of 10 kHz and a turbulent kinetic energy twice the mean-flow energy in Chiari patients, while the flow remains laminar in the control subject. The fluctuations are confined near the craniovertebral junction and are commensurate with the severity of the pathology and the extent of herniation. The results suggest that pathological conditions like Chiari malformation may lead to transitional CSF flow, and that a prudent calibration of the numerics is necessary to capture such phenomena. -
Marelli Stefano MS Presentation
Thursday, June 9, 2016
Garden 2A, 15:30-15:45
MS Presentation
Uncertainty Quantification and Global Sensitivity Analysis for Economic Models, Stefano Marelli (ETH Zurich, Switzerland)
Co-Authors: Viktor Winschel (ETH Zurich, Switzerland); Stefano Marelli (ETH Zurich, Switzerland); Bruno Sudret (ETH Zurich, Switzerland)
We present a method for global sensitivity analysis of the outcomes of an economic model with respect to its parameters. Traditional sensitivity analyses, like comparative statics, scenario and robustness analyses, are local and depend on the chosen combination of parameter values. Our global approach specifies a distribution for each parameter and approximates the outcomes as a polynomial of the parameters. In contrast to local analyses, the global sensitivity analysis takes non-linearities and interactions into account. Using the polynomial, we compute the distribution of the outcomes and a variance decomposition known as Sobol' indices. We obtain an importance ranking of the parameters and their interactions, which can guide calibration exercises and model development. We compare the local and the global approach for the mean and variance of production in a canonical real business cycle model. We find an interesting separation result: for mean production, only the capital share, the leisure substitution rate, and the depreciation rate matter. -
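The variance decomposition itself does not depend on the polynomial surrogate: the Python sketch below estimates first-order Sobol' indices with a plain pick-and-freeze Monte Carlo estimator on a toy three-parameter model. The economic model and the polynomial-chaos machinery of the abstract are not reproduced.

    # Brute-force Monte Carlo estimator of first-order Sobol' indices
    # (Saltelli-style pick-and-freeze). The toy model stands in for the
    # expensive economic model; real applications would use the surrogate.
    import numpy as np

    rng = np.random.default_rng(1)

    def model(theta):
        a, b, c = theta[:, 0], theta[:, 1], theta[:, 2]
        return a + 2.0 * b**2 + 0.5 * a * c       # nonlinear, with an interaction

    def first_order_sobol(model, n_params, n_samples=100_000):
        A = rng.random((n_samples, n_params))
        B = rng.random((n_samples, n_params))
        yA, yB = model(A), model(B)
        total_var = np.var(np.concatenate([yA, yB]))
        indices = []
        for i in range(n_params):
            ABi = A.copy()
            ABi[:, i] = B[:, i]                   # vary only the i-th parameter
            yABi = model(ABi)
            Vi = np.mean(yB * (yABi - yA))        # estimates Var(E[Y | theta_i])
            indices.append(Vi / total_var)
        return np.array(indices)

    print(first_order_sobol(model, n_params=3))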
Margara Alessandro MS Summary
MS Summary
MS06 Software Engineering Meets Scientific Computing: Generality, Reusability and Performance for Scientific Software Platforms I: Engineering Methodologies and Development Processes, Alessandro Margara (Università della Svizzera italiana, Switzerland)
Co-Authors: Alessandro Margara (Università della Svizzera Italiana, Switzerland)
Software platforms for modelling and simulation of scientific problems are becoming increasingly important in many fields and often drive the scientific discovery process. These platforms present unique requirements in terms of functionalities, performance and scalability, which limit the applicability of consolidated software engineering practices for their design, implementation and validation. For instance, since the effectiveness of a software platform for scientific simulation strictly depends on the level of performance and scalability it can achieve, the design, development and optimization of the platform are usually tailored to the specific hardware architecture the platform is expected to run on. Similarly, when a scientific simulation requires the integration of multiple software platforms, such integration is typically customized for the specific simulation problem at hand. Because of this, developing and integrating scientific computing platforms demands a significant amount of relevant knowledge about the modeled domain and the software and hardware infrastructures used for simulation. This information typically remains hidden in the implementation details of a specific solution and cannot be easily reused to port the simulation to different hardware infrastructures or to implement or integrate different simulation platforms on the same hardware infrastructure. The Software Engineering for Scientific Computing (SESC) minisymposium is concerned with identifying suitable engineering processes to design, develop, integrate and validate software platforms for scientific modelling and simulations. This introduces challenges that require the expertise of researchers working in different areas, including computational scientists to model scientific problems, software engineers to propose engineering methodologies and HPC experts to analyze the platform-dependent performance requirements that characterize simulations. The goal of the SESC minisymposium is to bring together software engineers, computational scientists and HPC experts to discuss and advance the engineering practices used to implement platforms for scientific computing, aiming to reduce development time and increase the reusability, maintainability and testability of the platforms, while offering the level of performance and scalability that is required by the simulation scenarios at hand. Specifically, the Software Engineering for Scientific Computing (SESC) minisymposium aims to address two conflicting requirements in the definition of an effective software development process: 1) promoting generality and reusability of software components, to simplify maintenance, evolution, adaptation and porting of software platforms, and 2) defining solutions that guarantee an adequate level of performance and scalability, which is of paramount importance in scientific simulations. The SESC minisymposium is organized around two sessions: this first session focuses more specifically on design methodologies and development processes for general and reusable code; the second session (Part 2) targets the requirements of performance and scalability in scientific software platforms.
MS Summary
MS12 Software Engineering Meets Scientific Computing: Generality, Reusability and Performance for Scientific Software Platforms II: Performance and Scalability Requirements, Alessandro Margara (Università della Svizzera italiana, Switzerland)
Co-Authors: Alessandro Margara (Università della Svizzera Italiana, Switzerland)
Software platforms for modelling and simulation of scientific problems are becoming increasingly important in many fields and often drive the scientific discovery process. These platforms present unique requirements in terms of functionalities, performance and scalability, which limit the applicability of consolidated software engineering practices for their design, implementation and validation. For instance, since the effectiveness of a software platform for scientific simulation strictly depends on the level of performance and scalability it can achieve, the design, development and optimization of the platform are usually tailored to the specific hardware architecture the platform is expected to run on. Similarly, when a scientific simulation requires the integration of multiple software platforms, such integration is typically customized for the specific simulation problem at hand. Because of this, developing and integrating scientific computing platforms demands a significant amount of relevant knowledge about the modeled domain and the software and hardware infrastructures used for simulation. This information typically remains hidden in the implementation details of a specific solution and cannot be easily reused to port the simulation to different hardware infrastructures or to implement or integrate different simulation platforms on the same hardware infrastructure. The Software Engineering for Scientific Computing (SESC) minisymposium is concerned with identifying suitable engineering processes to design, develop, integrate and validate software platforms for scientific modelling and simulations. This introduces challenges that require the expertise of researchers working in different areas, including computational scientists to model scientific problems, software engineers to propose engineering methodologies and HPC experts to analyze the platform-dependent performance requirements that characterize simulations. The goal of the SESC minisymposium is to bring together software engineers, computational scientists and HPC experts to discuss and advance the engineering practices used to implement platforms for scientific computing, aiming to reduce development time and increase the reusability, maintainability and testability of the platforms, while offering the level of performance and scalability that is required by the simulation scenarios at hand. Specifically, the Software Engineering for Scientific Computing (SESC) minisymposium aims to address two conflicting requirements in the definition of an effective software development process: 1) promoting generality and reusability of software components, to simplify maintenance, evolution, adaptation and porting of software platforms, and 2) defining solutions that guarantee an adequate level of performance and scalability, which is of paramount importance in scientific simulations. The SESC minisymposium is organized around two sessions: this session targets the requirements of performance and scalability in scientific software platforms; the other session (Part 1) focuses more specifically on design methodologies and development processes for general and reusable code. -
Markidis Stefano MS Presentation
Wednesday, June 8, 2016
Garden 3C, 16:00-16:30
MS Presentation
Decoupling and Coupling in iPIC3D, a Particle-in-Cell Code for Exascale, Stefano Markidis (KTH, Sweden)
Co-Authors: Stefano Markidis (Royal Institute of Technology, Sweden); Erwin Laure (Royal Institute of Technology, Sweden); Yuxi Chen (University of Michigan, United States of America); Gabor Toth (University of Michigan, United States of America); Tamas Gombosi (University of Michigan, United States of America)
iPIC3D is a massively parallel three-dimensional implicit particle-in-cell code used for the study of the interactions between the solar wind and Earth's magnetosphere. Because of the scalability potential and scientific impact demonstrated by iPIC3D, it has been selected in many European HPC projects to prepare for future exascale machines. In this talk, we present new algorithmic changes to iPIC3D in preparation for the coming exascale era. In particular, we will present decoupled computation, communication and I/O operations in iPIC3D to address the challenges of irregular operations on large numbers of processes. Our evaluation results show that the performance benefit from this model increases as the scale of the simulation increases. We also present a two-way coupled kinetic-fluid model with multiple implicit PIC domains (by the iPIC3D code) embedded in MHD (by the BATS-R-US code) under the Space Weather Modeling Framework (SWMF). -
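The decoupling of computation from communication can be illustrated generically with non-blocking MPI: post the halo exchange, update the interior while the messages are in flight, then finish the boundary. The mpi4py sketch below is only an illustration of this pattern, not iPIC3D's actual communication layer.

    # Generic overlap of computation and communication with mpi4py.
    # Run with, e.g.:  mpiexec -n 4 python halo_overlap.py
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    left, right = (rank - 1) % size, (rank + 1) % size

    n = 1024
    field = np.full(n + 2, float(rank))          # 1-D field with two ghost cells
    recv_l, recv_r = np.empty(1), np.empty(1)

    requests = [
        comm.Isend(field[1:2], dest=left, tag=0),       # my left cell goes leftward
        comm.Isend(field[n:n + 1], dest=right, tag=1),  # my right cell goes rightward
        comm.Irecv(recv_r, source=right, tag=0),        # right neighbour's left cell
        comm.Irecv(recv_l, source=left, tag=1),         # left neighbour's right cell
    ]

    # Interior work proceeds while the halo messages are in flight.
    interior = 0.5 * (field[1:n - 1] + field[3:n + 1])

    MPI.Request.Waitall(requests)                # communication now complete
    field[0], field[n + 1] = recv_l[0], recv_r[0]
    edges = 0.5 * np.array([field[0] + field[2], field[n - 1] + field[n + 1]])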
Markopoulos Alexandros Paper
Wednesday, June 8, 2016
Auditorium C, 16:30-17:00
Paper
Massively Parallel Hybrid Total FETI (HTFETI) Solver, Alexandros Markopoulos (IT4Innovations National Supercomputing Center, Ostrava, Czech Republic)
Co-Authors: Tomáš Brzobohatý (IT4Innovations National Supercomputing Center, Czech Republic); Alexandros Markopoulos (IT4Innovations National Supercomputing Center, Czech Republic); Ondřej Meca (IT4Innovations National Supercomputing Center, Czech Republic); Tomáš Kozubek (IT4Innovations National Supercomputing Center, Czech Republic)
This paper describes the Hybrid Total FETI (HTFETI) method and its parallel implementation in the ESPRESO library. HTFETI is a variant of the FETI-type domain decomposition method in which a small number of neighboring subdomains is aggregated into clusters. This can also be viewed as a multilevel decomposition approach which results in a smaller coarse problem - the main scalability bottleneck of the FETI and FETI-DP methods.
The efficiency of our implementation, which employs hybrid parallelization in the form of MPI and Cilk++, is evaluated using both weak and strong scalability tests. The weak scalability of the solver is shown on a three-dimensional linear elasticity problem with up to 30 billion degrees of freedom (DOF) executed on 4096 compute nodes. The strong scalability is evaluated on a problem of 2.6 billion DOF scaled from 1000 to 4913 compute nodes. The results show super-linear scaling of the single-iteration time and linear scalability of the solver runtime. The latter combines both numerical and parallel scalability and shows the overall HTFETI solver performance. The large-scale tests use our own parallel synthetic benchmark generator, which is also described in the paper.
The last set of results shows that HTFETI is very efficient for problems of up to 1.7 billion DOF and provides a better time to solution than the TFETI method. -
Marsman Martijn MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 14:00-14:20
MS Presentation
VASP on Future Hardware: MIC and GPU Acceleration, Martijn Marsman (University Vienna, Austria)
Co-Authors:
The Vienna Ab initio Simulation Package (VASP) is a widely used electronic structure code. The release version is parallelized using MPI and runs efficiently on current HPC hardware all over the world. The next generation of HPC hardware, however, will be quite different from the current (Xeon-like) hardware: in all probability it will consist either of GPU accelerated nodes or MIC nodes (e.g. Intel's upcoming Knights Landing processors). In my talk I'll describe the work we have been doing to get VASP ready to run efficiently on these new HPC architectures. -
Marti Philippe Poster
Poster
EAR-04 Implicit Treatment of Inertial Waves in Dynamo Simulations, Philippe Marti (University of Colorado at Boulder, United States of America)
Co-Authors: Michael A. Calkins (University of Colorado at Boulder, United States of America); Keith Julien (University of Colorado at Boulder, United States of America)
The explicit treatment of inertial waves imposes a very small timestep in dynamo simulations at low Ekman number. We present a fully spectral Chebyshev tau method that allows us to treat the inertial waves implicitly. The large linear systems that need to be solved at each timestep remain affordable thanks to the sparsity of the formulation. The simulations are parallelised using a 2D data decomposition for the nonlinear calculations combined with a parallel linear solver for the timestepping. Despite the increased complexity, significant gains in wall-clock time are achieved thanks to larger timesteps.
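The kind of linear solve required at every implicit step can be illustrated with a Chebyshev collocation discretization (dense, rather than the sparse tau formulation used here) of a single backward-Euler diffusion step; the sketch below is a generic Python illustration, not the dynamo formulation itself.

    # One implicit (backward-Euler) step with Chebyshev collocation: the point
    # is the per-timestep linear solve. Uses Trefethen's dense differentiation
    # matrix; the abstract's sparse tau formulation is not reproduced.
    import numpy as np

    def cheb(N):
        """Chebyshev differentiation matrix and Gauss-Lobatto points on [-1, 1]."""
        x = np.cos(np.pi * np.arange(N + 1) / N)
        c = np.hstack([2.0, np.ones(N - 1), 2.0]) * (-1.0) ** np.arange(N + 1)
        X = np.tile(x, (N + 1, 1)).T
        dX = X - X.T
        D = np.outer(c, 1.0 / c) / (dX + np.eye(N + 1))
        D -= np.diag(D.sum(axis=1))
        return D, x

    N, dt, nu = 64, 1e-3, 1.0
    D, x = cheb(N)
    D2 = D @ D                                  # second-derivative operator

    u = np.exp(-20.0 * x**2)                    # initial condition
    A = np.eye(N + 1) - dt * nu * D2            # (I - dt*nu*D2) u_new = u_old
    A[0, :], A[-1, :] = 0.0, 0.0                # Dirichlet rows: u(+-1) = 0
    A[0, 0] = A[-1, -1] = 1.0
    rhs = u.copy()
    rhs[0] = rhs[-1] = 0.0
    u_new = np.linalg.solve(A, rhs)             # the linear solve done each step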
MS Summary
MS16 Understanding the Dynamics of Planetary Dynamos, Philippe Marti (University of Colorado at Boulder, United States of America)
Co-Authors:
Realistic numerical simulations of planetary dynamos are impossible to carry out with current computational resources due to the extreme conditions under which they operate. As a result, simulations are performed with parameter values that are quite distant from those that are typical of planets. How close are the current state-of-the-art dynamo simulations to a realistic planetary system? Are direct numerical simulations the right approach for investigating the extreme parameter regime that characterizes planetary cores, or is there an alternative? This minisymposium aims to present the latest advances in understanding the dynamics of planetary dynamos through well-established direct numerical simulations, asymptotic modelling, as well as novel numerical algorithms. -
Marzari Nicola MS Presentation
Thursday, June 9, 2016
Garden 1BC, 15:00-15:20
MS Presentation
High-Throughput Prediction of Novel Two-Dimensional Materials, Nicola Marzari (EPFL, Switzerland)
Co-Authors: Philippe Schwaller (EPFL, Switzerland); Andrea Cepellotti (EPFL, Switzerland); Andrius Merkys (EPFL, Switzerland); Ivano E. Castelli (EPFL, Switzerland); Marco Gibertini (EPFL, Switzerland); Giovanni Pizzi (EPFL, Switzerland); Nicola Marzari (EPFL, Switzerland)
As a crucial step towards the identification of novel and promising 2D materials, we provide here a large scale first-principles exploration and characterization of such compounds. From a combination of 480,000 structures harvested from the ICSD and COD databases, three-dimensional crystals are screened systematically by checking the absence of chemical bonds between adjacent layers, identifying more than 6,000 layered systems. Then DFT calculations of the van der Waals interlayer bonding are performed with automatic workflows, while systematically assessing the metallic, insulating or magnetic character of the materials obtained. Following full atomic and cell relaxations, phonon dispersions are computed as a first step towards the assessment of thermodynamic properties. Thanks to the AiiDA materials' informatics platform [1], and in particular its automatic workflow engine, database structure, sharing capabilities, and pipelines to/from crystallographic repositories, the systematic and reproducible calculation of these properties becomes straightforward, together with seamless accessibility and sharing. [1] http://aiida.net
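A crude flavour of the geometric screening step can be given in a few lines of Python: flag a bulk structure as potentially layered if its atoms can be separated along the stacking direction by a gap wider than any plausible covalent bond. The names structures and covalent_radius below are placeholders, and the actual AiiDA workflows and bond analysis are far more careful than this heuristic.

    # Crude layered-structure heuristic: a z-gap wider than the largest possible
    # covalent bond suggests the absence of interlayer bonds. Placeholder data
    # structures only; not the screening workflow of the abstract.
    import numpy as np

    covalent_radius = {"C": 0.76, "Mo": 1.54, "S": 1.05}   # Angstrom, illustrative

    def looks_layered(symbols, positions, tolerance=0.45):
        """True if a gap along z exceeds the largest plausible bond length."""
        max_bond = 2.0 * max(covalent_radius[s] for s in symbols) + tolerance
        z = np.sort(np.asarray(positions)[:, 2])
        return bool(np.any(np.diff(z) > max_bond))

    def screen(structures):
        """`structures` is assumed to be a list of (name, symbols, Nx3 positions)."""
        return [name for name, symbols, pos in structures
                if looks_layered(symbols, pos)]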
Poster
MAT-03 Complex Wet-Environments in Electronic-Structure Calculations, Nicola Marzari (EPFL, Switzerland)
Co-Authors: Luigi Genovese (CEA/INAC, France); Oliviero Andreussi (Università della Svizzera italiana, Switzerland); Nicola Marzari (EPFL, Switzerland); Stefan Goedecker (University of Basel, Switzerland)
The computational study of chemical reactions in complex, wet environments is critical for applications in many fields. It is often essential to study chemical reactions at the ab-initio level in the presence of the proper electrochemical environment. In this work, we present a continuum solvation library able to handle both neutral and ionic solutions, solving the Generalized Poisson and the Poisson-Boltzmann problem. Two different recipes have been implemented to build up the continuum dielectric cavity (one using atomic coordinates, the other mapping the solute electronic density). A preconditioned conjugate gradient method has been implemented for the Generalized Poisson equation, whilst a self-consistent procedure enables us to solve the Poisson-Boltzmann problem. Both solvers and continuum dielectric cavities have been integrated into the BigDFT electronic-structure package. We benchmarked the whole library on several atomistic systems including small neutral molecules, large proteins, solvated surfaces and reactions in solution to demonstrate its efficiency and performance.
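A one-dimensional finite-difference caricature of the generalized Poisson problem, solved with a preconditioned conjugate gradient from SciPy, is sketched below. The real library operates on BigDFT's three-dimensional grids with far more elaborate cavities and preconditioners; only the variable-coefficient operator and the PCG solve are illustrated.

    # 1-D sketch of  -d/dx( eps(x) dphi/dx ) = rho(x),  phi(0) = phi(L) = 0,
    # discretized in flux form and solved with Jacobi-preconditioned CG.
    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import cg, LinearOperator

    n, L = 400, 1.0
    h = L / (n + 1)
    x = np.linspace(h, L - h, n)                           # interior nodes
    eps = 1.0 + 39.0 * (1.0 + np.tanh((x - 0.5) / 0.05))   # vacuum -> solvent-like
    rho = np.exp(-((x - 0.3) / 0.02) ** 2)

    # Dielectric at the half-points i -/+ 1/2 (simple averaging at the ends).
    eps_half = 0.5 * (np.concatenate([[eps[0]], eps]) + np.concatenate([eps, [eps[-1]]]))

    main = (eps_half[:-1] + eps_half[1:]) / h**2           # SPD tridiagonal operator
    off = -eps_half[1:-1] / h**2
    K = diags([off, main, off], offsets=[-1, 0, 1]).tocsr()

    M = LinearOperator((n, n), matvec=lambda v: v / main)  # Jacobi preconditioner

    phi, info = cg(K, rho, M=M, atol=1e-10)
    assert info == 0                                       # 0 means CG converged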
MS Summary
MS21 Materials Design by High-Throughput Ab Initio Computing, Nicola Marzari (EPFL, Switzerland)
Co-Authors: Gian-Marco Rignanese (Université catholique de Louvain, Belgium)
Materials advances often drive technological innovation (faster computers, more efficient solar cells, more compact energy storage). Experimental discovery of new materials suitable for specific applications is, however, a complex task, relying on costly and time-consuming synthesis procedures. Computational materials science is now powerful enough to predict many materials properties even before the materials are synthesized in the lab, and provides a cheap basis for orienting experimental searches efficiently. Recent advances in computer speed and first-principles algorithms have led to the development of fast and robust codes, making it possible to do large numbers of calculations automatically. This is the burgeoning area of high-throughput first-principles computation. The concept, though simple, is very powerful. High-throughput calculations are used to create large databases containing the calculated properties of existing and hypothetical materials. These databases can then be intelligently interrogated, searching for materials with desired properties and so removing the guesswork from materials design. Various open-domain on-line repositories have appeared to make these databases available to everyone. Areas of application include solar materials, topological insulators, thermoelectrics, piezoelectrics, materials for catalysis, battery materials, etc. While it has reached a good level of maturity, the high-throughput first-principles approach still requires many improvements. Several important properties and classes of materials have not been dealt with yet, and further algorithm implementations, repositories and data-mining interfaces are necessary. This minisymposium will be devoted to the presentation of the most recent developments in this field. It will also provide an agora for some of the leading researchers to put forward their latest achievements, to address the current challenges, and to discuss the most promising directions. The speakers will be selected to illustrate the many disciplines that are contributing to this effort, covering practical applications (e.g., magnetic materials, thermoelectrics, transparent conductors, 2D materials), theoretical developments (e.g., novel functionals within Density Functional Theory, local basis representations for effective ab-initio tight-binding schemes), and technical aspects (e.g., high-throughput frameworks). -
May Dave MS Presentation
Thursday, June 9, 2016
Garden 1A, 14:45-15:00
MS Presentation
Salvus: A Flexible Open-Source Package for Full-Waveform Modelling and Inversion, Dave May (ETH Zurich, Switzerland)
Co-Authors: Christian Boehm (ETH Zurich, Switzerland); Martin van Driel (ETH Zurich, Switzerland); Lion Krischer (Ludwig Maximilian University of Munich, Germany); Dave A. May (ETH Zurich, Switzerland); Max Rietmann (ETH Zurich, Switzerland); Korbinian Sager (ETH Zurich, Switzerland); Andreas Fichtner (ETH Zurich, Switzerland)
Within all domain-specific software projects, finding the correct balance between flexibility and performance is often difficult. In the seismic imaging community, the trend has been to move towards codes which are heavily optimised, but which often sacrifice usability and flexibility. Here we introduce Salvus: an open-source HPC high-order finite element (FE) package focused on full-waveform modelling and inversion, which is designed to be both flexible and performant. Salvus was constructed by following modern software design practices, testing protocols, and by establishing its foundations upon existing open-source high-level scientific libraries. The FE framework is generalized over spatial dimensions, time-integrators, polynomial order and wave-propagation physics, and provides support for both hexahedral and tetrahedral meshes. Additionally, support is included for various numerical optimisation methods. We discuss our usage of existing open-source scientific libraries, our choice level of abstraction, and quantitatively investigate the performance penalties associated with these abstractions.
Friday, June 10, 2016
Garden 1A, 10:00-10:15
MS Presentation
From Tectonic to Seismic Timescales in 3D Continuum Models, Dave May (ETH Zurich, Switzerland)
Co-Authors: Ylona van Dinther (ETH Zurich, Switzerland); Laetitia Le Pourhiet (Pierre and Marie Curie University, France); Dave A. May (ETH Zurich, Switzerland); Taras Gerya (ETH Zurich, Switzerland)
Lateral rupture limits substantially regulate the magnitude of great subduction megathrust earthquakes, but the factors controlling them remain largely unknown due to the limited spatio-temporal range of observations. These factors, however, involve the long-term, regional tectonic history, including structural, stress and strength heterogeneities. This problem requires a powerful 3D-continuum numerical modelling approach that bridges tectonic and seismic timescales, but a suitable code is lacking. We demonstrate the development of a scalable PETSc-based staggered-grid finite difference code, in which self-consistent long-term deformation and spontaneous rupture are ensured through a solid-mechanics-based visco-elasto-plastic rheology with a slip-rate-dependent friction formulation, an energy-conservative inertial implementation, artificial damping of seismic waves at the domain boundaries, and an adaptive, implicit-explicit time-stepping scheme. Automated discretization and manufactured-solution benchmarks ensure stability, flexibility and accuracy of the code at every stage of development.
Wednesday, June 8, 2016
Auditorium C, 15:30-16:00
Paper
Extreme-Scale Multigrid Components within PETSc, Dave May (ETH Zurich, Switzerland)
Co-Authors: Dave A. May (ETH Zurich, Switzerland); Karl Rupp (Austria); Matthew G. Knepley (Rice University, United States of America); Barry F. Smith (Argonne National Laboratory, United States of America)
Elliptic partial differential equations (PDEs) frequently arise in continuum descriptions of physical processes relevant to science and engineering. Multilevel preconditioners represent a family of scalable techniques for solving discrete PDEs of this type and thus are the method of choice for high-resolution simulations. The scalability and time-to-solution of massively parallel multilevel preconditioners can be adversely affected by using a coarse-level solver with sub-optimal algorithmic complexity. To maintain scalability, agglomeration techniques applied to the coarse level have been shown to be necessary.
In this work, we present a new software component introduced within the Portable Extensible Toolkit for Scientific computation (PETSc) which permits agglomeration. We provide an overview of the design and implementation of this functionality, together with several use cases highlighting the benefits of agglomeration. Lastly, we demonstrate via numerical experiments employing geometric multigrid with structured meshes, the flexibility and performance gains possible using our MPI-rank agglomeration implementation.
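Because the agglomeration is exposed through PETSc's options database, it can be enabled without touching solver code. The petsc4py sketch below selects geometric multigrid with a telescoping (rank-agglomerating) coarse-level solver; the option names follow the PETSc manual pages for PCMG and PCTELESCOPE and should be checked against the installed release, and the DMDA set-up is only a placeholder problem.

    # Sketch: requesting multigrid with an agglomerated coarse-level solve from
    # petsc4py via the options database. Option names follow the PETSc manual
    # (PCMG / PCTELESCOPE) and should be verified against the release in use.
    from petsc4py import PETSc

    opts = PETSc.Options()
    opts["pc_type"] = "mg"
    opts["pc_mg_levels"] = 4
    opts["mg_coarse_pc_type"] = "telescope"              # agglomerate the coarse level
    opts["mg_coarse_pc_telescope_reduction_factor"] = 64
    opts["mg_coarse_telescope_pc_type"] = "lu"           # solver on the reduced ranks

    # Placeholder structured-grid problem; operator assembly and the call to
    # ksp.solve() are omitted here.
    da = PETSc.DMDA().create(dim=2, sizes=(257, 257), stencil_width=1)
    ksp = PETSc.KSP().create()
    ksp.setDM(da)
    ksp.setFromOptions()                                 # picks up the options above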
MS Summary
MS26 Bridging Scales in Geosciences, Dave May (ETH Zurich, Switzerland)
Co-Authors: Dave A. May (ETH Zurich, Switzerland); Michael Bader (Leibniz Supercomputing Centre, Technische Universitaet Muenchen, Germany)
Complex but relevant processes within the Solid Earth domain cover a wide range of space and time scales, up to 17 and 26 orders of magnitude, respectively. Earthquake propagation, for instance, depends on dynamic processes at the rupture tip over 10^-9 seconds, while the plate tectonic faults on which earthquakes occur evolve over time scales of up to hundreds of millions of years. Meanwhile, problems in imaging and modelling mantle processes on the Earth's scale of tens of thousands of kilometers can be affected by physico-chemical compositions that vary on a meter scale and are determined at the molecular level. Beyond these examples, ample other physical processes in geophysics cross the largest imaginable scales. At each of the characteristic scales different physical processes are relevant, which requires us to couple the relevant physics across the scales. Simulating the physics at each of these scales is a tremendous task, which hence often requires High Performance Computing. Computational challenges include, but are not limited to, the large number of degrees of freedom and crossing the two-scale problem on which most computational tools are founded. To discuss and start to tackle these challenges, we aim to bring together computer scientists and geoscientists with different perspectives. Applications within the geosciences include, but are not limited to, geodynamics, seismology, fluid dynamics, tectonics, geomagnetism, and exploration geophysics. -
McRae Andrew T. T. MS Presentation
Wednesday, June 8, 2016
Garden 3B, 13:00-13:30
MS Presentation
Firedrake: Automating the Finite Element Method by Composing Abstractions, Andrew T. T. McRae (University of Bath, United Kingdom)
Co-Authors: David A. Ham (Imperial College London, United Kingdom); Andrew T. T. McRae (University of Bath, United Kingdom); Florian Rathgeber (ECMWF, United Kingdom); Gheorghe-Teodor Bercea (Imperial College London, United Kingdom); Miklós Homolya (Imperial College London, United Kingdom); Fabio Luporini (Imperial College London, United Kingdom); Paul H. J. Kelly (Imperial College London, United Kingdom)
The complexity inherent in the application of advanced numerics on modern hardware to coupled physical systems presents a critical barrier to simulation development. To overcome this, we must create simulation software which embodies the abstraction and composability of the underlying mathematics. In this way, a system is created in which mathematicians, computer scientists, and application specialists can each deploy their own expertise, benefiting from the expertise of the others. Critically, this approach minimises the extent to which individuals must become polymaths to share in these advances. In this talk I will present Firedrake and PyOP2, a composition of new and existing abstractions which creates a particularly complete separation of concerns. This enables the creation of high performance, sophisticated finite element models from a very high level mathematical specification and has enabled advances in computer science and numerics, while also facilitating the creation of simulation systems for a variety of applications. -
Meca Ondrej Paper
Wednesday, June 8, 2016
Auditorium C, 16:30-17:00
Paper
Massively Parallel Hybrid Total FETI (HTFETI) Solver, Ondrej Meca (IT4Innovations National Supercomputing Center, Ostrava, Czech Republic)
Co-Authors: Tomáš Brzobohatý (IT4Innovations National Supercomputing Center, Czech Republic); Alexandros Markopoulos (IT4Innovations National Supercomputing Center, Czech Republic); Ondřej Meca (IT4Innovations National Supercomputing Center, Czech Republic); Tomáš Kozubek (IT4Innovations National Supercomputing Center, Czech Republic)
This paper describes the Hybrid Total FETI (HTFETI) method and its parallel implementation in the ESPRESO library. HTFETI is a variant of the FETI-type domain decomposition method in which a small number of neighboring subdomains is aggregated into clusters. This can also be viewed as a multilevel decomposition approach which results in a smaller coarse problem - the main scalability bottleneck of the FETI and FETI-DP methods.
The efficiency of our implementation, which employs hybrid parallelization in the form of MPI and Cilk++, is evaluated using both weak and strong scalability tests. The weak scalability of the solver is shown on a three-dimensional linear elasticity problem with up to 30 billion degrees of freedom (DOF) executed on 4096 compute nodes. The strong scalability is evaluated on a problem of 2.6 billion DOF scaled from 1000 to 4913 compute nodes. The results show super-linear scaling of the single-iteration time and linear scalability of the solver runtime. The latter combines both numerical and parallel scalability and shows the overall HTFETI solver performance. The large-scale tests use our own parallel synthetic benchmark generator, which is also described in the paper.
The last set of results shows that HTFETI is very efficient for problems of up to 1.7 billion DOF and provides a better time to solution than the TFETI method. -
Mengaldo Gianmarco MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:00-15:15
MS Presentation
Exploring Novel Numerical Methods and Algorithms on Emerging Hardware, Gianmarco Mengaldo (ECMWF, United Kingdom)
Co-Authors:
The importance of adapting numerical methods and the underlying algorithms to the hardware on which they will be used is becoming crucial in many areas, including weather and climate modelling, engineering, finance, life sciences, etc. This aspect is even more important when targeting large HPC systems and will become key to successfully producing large-scale simulations on next-generation HPC platforms. The current work explores a strategy, developed as part of the Horizon 2020 ESCAPE project, that addresses this issue within the context of weather and climate modelling. In particular, we identified key building blocks (also referred to as dwarfs) of weather and climate models, isolated them, and created related self-contained mini-applications. These mini-applications can be tested on different hardware and re-adapted (possibly changing the underlying algorithms or the overall strategy) to achieve the best performance on different platforms. -
Merkys Andrius MS Presentation
Thursday, June 9, 2016
Garden 1BC, 15:00-15:20
MS Presentation
High-Throughput Prediction of Novel Two-Dimensional Materials, Andrius Merkys (EPFL, Switzerland)
Co-Authors: Philippe Schwaller (EPFL, Switzerland); Andrea Cepellotti (EPFL, Switzerland); Andrius Merkys (EPFL, Switzerland); Ivano E. Castelli (EPFL, Switzerland); Marco Gibertini (EPFL, Switzerland); Giovanni Pizzi (EPFL, Switzerland); Nicola Marzari (EPFL, Switzerland)
As a crucial step towards the identification of novel and promising 2D materials, we provide here a large scale first-principles exploration and characterization of such compounds. From a combination of 480,000 structures harvested from the ICSD and COD databases, three-dimensional crystals are screened systematically by checking the absence of chemical bonds between adjacent layers, identifying more than 6,000 layered systems. Then DFT calculations of the van der Waals interlayer bonding are performed with automatic workflows, while systematically assessing the metallic, insulating or magnetic character of the materials obtained. Following full atomic and cell relaxations, phonon dispersions are computed as a first step towards the assessment of thermodynamic properties. Thanks to the AiiDA materials' informatics platform [1], and in particular its automatic workflow engine, database structure, sharing capabilities, and pipelines to/from crystallographic repositories, the systematic and reproducible calculation of these properties becomes straightforward, together with seamless accessibility and sharing. [1] http://aiida.net -
Messmer Peter MS Presentation
Thursday, June 9, 2016
Garden 3B, 11:45-12:00
MS Presentation
Compute, Analyze and Visualize: Novel Workflows on GPU Accelerated Supercomputers, Peter Messmer (NVIDIA, United States of America)
Co-Authors:
Supercomputers are typically used in a batch-oriented fashion: simulations are launched via a queuing system, intermediate results are stored on a global file system, and the final results are obtained via post-processing on a separate visualization system. The amount of data generated by large-scale simulations and the increasing complexity of analysis and visualization algorithms turn this last step into an HPC problem in itself and render the traditional workflow inadequate. Their tremendous compute power and advanced graphics capabilities make GPUs the ideal accelerator for modern HPC systems, supporting not only floating-point-intensive workloads but also big-data analysis and visualization tasks on the same massively parallel resource, and enabling novel workflows such as exploratory analysis or application monitoring and steering. In this talk, I will introduce some hardware and software features of modern GPUs that support such workflows and present some use-case examples. -
Michel Yann MS Presentation
Wednesday, June 8, 2016
Garden 3B, 16:00-16:30
MS Presentation
Ensemble Data Assimilation at Météo-France, Yann Michel (Meteo-France, France)
Co-Authors:
Météo-France has been running an operational NWP system at kilometric scale over France for the last 7 years. One particular aim is to forecast heavy precipitation events over the Mediterranean. This is the "AROME" NWP system, which is based on a 3DVar assimilating a comprehensive set of observations, including Doppler radar data and reflectivity. Yet, progress can be expected from an increased use of ensembles in data assimilation to better describe the background error statistics. A new data assimilation scheme, named EnVar, is being developed for AROME, in which background error covariances are estimated directly from an ensemble and localized. The variational framework is kept in order to allow a wide range of observations to be assimilated efficiently. We will show preliminary results and discuss current achievements in the numerical efficiency of the scheme, with particular attention to the localization. -
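The core ingredient - an ensemble-derived background-error covariance tapered by localization - can be sketched in a few lines of numpy. The example below uses a synthetic one-dimensional ensemble and a simple compactly supported taper standing in for the Gaspari-Cohn function; it is an illustration of the idea, not the AROME EnVar implementation.

    # Toy localized ensemble covariance: Schur product of the raw sample
    # covariance with a compactly supported correlation taper.
    import numpy as np

    rng = np.random.default_rng(2)

    n_grid, n_ens, loc_radius = 200, 20, 15.0
    x = np.arange(n_grid, dtype=float)

    # Synthetic ensemble of smooth background perturbations.
    modes = np.sin(np.outer(x, np.arange(1, 9)) * np.pi / n_grid)
    ensemble = modes @ rng.standard_normal((8, n_ens))
    perturbations = ensemble - ensemble.mean(axis=1, keepdims=True)
    B_raw = perturbations @ perturbations.T / (n_ens - 1)   # rank-deficient, noisy

    # Compactly supported taper (a cut-off Gaussian stands in for Gaspari-Cohn).
    dist = np.abs(x[:, None] - x[None, :])
    taper = np.exp(-0.5 * (dist / loc_radius) ** 2) * (dist < 2.0 * loc_radius)

    B_localized = B_raw * taper     # removes spurious long-range covariances
    print(np.linalg.matrix_rank(B_raw), np.linalg.matrix_rank(B_localized))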
Michele Ceriotti Contributed Talk
Friday, June 10, 2016
Garden 1BC, 09:00-09:15
Contributed Talk
A Distance to Map and Understand Materials and Molecules, and to Predict their Properties, Michele Ceriotti (EPFL, Switzerland)
Co-Authors:
Atomistic computer simulations give access to increasingly accurate and predictive modelling of materials, chemical and biochemical compounds. As more complex systems become amenable to computation, the sheer amount of data produced by a simulation, as well as its intrinsic structural complexity, make it harder to extract physical insight from modelling. I will discuss how to build a robust, effective metric to compare molecules and condensed-phase structures, and how this can be used to represent large databases of compounds, building intuitive maps of chemical landscapes and singling out outliers and anomalous conformations. Furthermore, I will demonstrate the potential of this approach to machine-learn the physical-chemical properties of such compounds, that promises to circumvent the need for expensive quantum chemical calculations.
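Once such a metric (or the corresponding kernel) is available, property prediction reduces to standard kernel regression. The sketch below uses scikit-learn's kernel ridge regression on a precomputed kernel; the descriptors and the RBF kernel are placeholders for the structural similarity measure discussed in the talk.

    # Property regression from a precomputed similarity kernel. The random
    # descriptors and RBF kernel are placeholders for a real structural metric.
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    rng = np.random.default_rng(3)

    n_structures, n_features = 300, 10
    descriptors = rng.standard_normal((n_structures, n_features))
    properties = descriptors[:, 0] ** 2 + 0.1 * rng.standard_normal(n_structures)

    def rbf_kernel(X, Y, gamma=0.1):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    train, test = slice(0, 200), slice(200, None)
    K_train = rbf_kernel(descriptors[train], descriptors[train])
    K_test = rbf_kernel(descriptors[test], descriptors[train])

    model = KernelRidge(alpha=1e-3, kernel="precomputed")
    model.fit(K_train, properties[train])
    rmse = np.sqrt(np.mean((model.predict(K_test) - properties[test]) ** 2))
    print(rmse)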
MS Summary
MS10 From Materials' Data to Materials' Insight by Machine Learning, Michele Ceriotti (EPFL, Switzerland)
Co-Authors: Alexandre Tkatchenko (Fritz Haber Institute Berlin, Germany); James Kermode (Warwick, United Kingdom)
The rise of high-throughput computational materials design promises to revolutionize the process of discovery of new materials, and tailoring of their properties. At the same time, by generating the structures of hundreds of thousands of hypothetical compounds, the issue of automated processing of large amounts of materials' data has been made very urgent - to identify structure-property relations, rationalize intuitively the behaviour of materials of increasing complexity, and re-use existing information to accelerate the prediction of properties and accelerate the search of materials' space. To address this challenge, a strongly interdisciplinary effort has developed, uniting forces among researchers in applied mathematics, computer science, chemistry and materials science, that aims at adapting machine-learning techniques to the specific problems that are encountered when working with materials. This minisymposium will showcase the most recent developments in this field, and provide a forum for some of the leading figures to discuss the most pressing challenges and the most promising directions. The participants will be selected to represent the many disciplines that are contributing to this endeavour and will cover the following topics: the representation of materials' structures and properties in a synthetic form that is best suited for automated processing, learning of the structure-property relations and circumventing the large computational cost of high-end electronic structure calculations, the identification of outliers and the automatic assessment of the reliability of input data, demonstrative applications to important materials science problems. -
Minami Kazuo Paper
Wednesday, June 8, 2016
Auditorium C, 14:00-14:30
Paper
Performance Analysis and Optimization of Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on the K Computer and TSUBAME2.5, Kazuo Minami (RIKEN AICS, Japan)
Co-Authors: Masaaki Terai (RIKEN / Advanced Institute for Computational Science, Japan); Ryuji Yoshida (RIKEN / Advanced Institute for Computational Science, Japan); Shin-ichi Iga (RIKEN / Advanced Institute for Computational Science, Japan); Kazuo Minami (RIKEN / Advanced Institute for Computational Science, Japan); Hirofumi Tomita (RIKEN / Advanced Institute for Computational Science, Japan)
We summarize the optimization and performance evaluation of the Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on two different types of supercomputers: the K computer and TSUBAME2.5. First, we evaluated and improved several kernels extracted from the model code on the K computer. We did not need to significantly change the loop and data ordering to make sufficient use of the features of the K computer, such as the hardware-aided thread barrier mechanism and the relatively high memory bandwidth, i.e., a 0.5 Byte/FLOP ratio. Loop optimizations and code cleaning to reduce memory transfer contributed to a speed-up of the model execution time. The sustained performance of the main loop of NICAM reached 0.87 PFLOPS with 81,920 nodes on the K computer. For GPU-based calculations, we applied OpenACC to the dynamical core of NICAM. The performance and scalability were evaluated using the TSUBAME2.5 supercomputer. We achieved good performance results, which showed efficient use of the memory throughput of the GPU as well as good weak scalability. A dry dynamical core experiment was carried out using 2560 GPUs, which achieved 60 TFLOPS of sustained performance. -
Mira Antonietta Poster
Poster
EMD-03 Parallel MCMC for Estimating Exponential Random Graph Models, Antonietta Mira (Università della Svizzera italiana, Switzerland)
Co-Authors: Alex Stivala (University of Melbourne, Australia); Antonietta Mira (Università della Svizzera italiana, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Garry Robins (University of Melbourne, Australia); Alessandro Lomi (Università della Svizzera italiana, Switzerland)
As information and communication technologies continue to expand, the need arises to develop analytical strategies capable of accommodating new and larger sets of social network data. Considerable attention has recently been dedicated to the possibility of scaling exponential random graph models (ERGMs) - a well-established family of statistical models - for analyzing large social networks. Efficient computational methods would be highly desirable in order to extend the empirical scope of ERGM for the analysis of large social networks. We report preliminary results of a research project on the development of new sampling methods for ERGMs. We propose a new MCMC sampler and use it with Metropolis coupled Markov chain Monte Carlo, a typical scheme for MCMC parallelization. We show that, using this method, the CPU time for parameter estimation may be considerably reduced. *Generous support from the Swiss National Platform of Advanced Scientific Computing (PASC) is gratefully acknowledged. -
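As background for the Metropolis-coupled MCMC scheme mentioned above, the sketch below illustrates the generic chain-swap step between tempered chains; it is a textbook illustration, not the authors' ERGM sampler, and `log_post` is a placeholder for the model's unnormalized log-density.

```python
import math
import random

def mc3_swap(states, betas, log_post):
    """Attempt one swap between two randomly chosen tempered chains.
    states[i] is the current state of chain i, targeting pi(x)**betas[i];
    chain 0 (beta = 1) targets the posterior of interest."""
    i, j = random.sample(range(len(states)), 2)
    lp_i, lp_j = log_post(states[i]), log_post(states[j])
    # Metropolis acceptance ratio for exchanging the states of chains i and j
    log_accept = (betas[i] - betas[j]) * (lp_j - lp_i)
    if random.random() < math.exp(min(0.0, log_accept)):
        states[i], states[j] = states[j], states[i]
    return states
```

In a parallel implementation each chain would advance independently between swap attempts, e.g. one MPI rank or one core per chain, which is what makes the scheme a natural fit for reducing the wall-clock time of parameter estimation.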
Mitchell Lawrence MS Presentation
Wednesday, June 8, 2016
Garden 3B, 13:00-13:30
MS Presentation
Firedrake: Automating the Finite Element Method by Composing Abstractions, Lawrence Mitchell (Imperial College London, United Kingdom)
Co-Authors: David A. Ham (Imperial College London, United Kingdom); Andrew T. T. McRae (University of Bath, United Kingdom); Florian Rathgeber (ECMWF, United Kingdom); Gheorghe-Teodor Bercea (Imperial College London, United Kingdom); Miklós Homolya (Imperial College London, United Kingdom); Fabio Luporini (Imperial College London, United Kingdom); Paul H. J. Kelly (Imperial College London, United Kingdom)
The complexity inherent in the application of advanced numerics on modern hardware to coupled physical systems presents a critical barrier to simulation development. To overcome this, we must create simulation software which embodies the abstraction and composability of the underlying mathematics. In this way, a system is created in which mathematicians, computer scientists, and application specialists can each deploy their own expertise, benefiting from the expertise of the others. Critically, this approach minimises the extent to which individuals must become polymaths to share in these advances. In this talk I will present Firedrake and PyOP2, a composition of new and existing abstractions which creates a particularly complete separation of concerns. This enables the creation of high performance, sophisticated finite element models from a very high level mathematical specification and has enabled advances in computer science and numerics, while also facilitating the creation of simulation systems for a variety of applications. -
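To give a flavour of the "very high level mathematical specification" referred to above, a minimal Firedrake-style example is sketched below; it is adapted from the kind of Helmholtz problem used in the Firedrake documentation, and exact API details may differ between versions.

```python
from firedrake import *  # assumes a working Firedrake installation

mesh = UnitSquareMesh(32, 32)
V = FunctionSpace(mesh, "CG", 1)          # piecewise-linear continuous elements

u = TrialFunction(V)
v = TestFunction(V)
x, y = SpatialCoordinate(mesh)

f = Function(V)
f.interpolate((1 + 8 * pi * pi) * cos(2 * pi * x) * cos(2 * pi * y))

# Weak form of  -div(grad u) + u = f  with natural boundary conditions
a = (inner(grad(u), grad(v)) + u * v) * dx
L = f * v * dx

uh = Function(V)
solve(a == L, uh, solver_parameters={"ksp_type": "cg", "pc_type": "jacobi"})
```

The point of the abstraction is that the few lines above are all the model developer writes; the generation of low-level parallel code is delegated to the underlying PyOP2 layer.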
Mogensen Kristian MS Presentation
Wednesday, June 8, 2016
Garden 3B, 16:30-17:00
MS Presentation
Scalability and Performance of the NEMOVAR Variational Ocean Data Assimilation Software, Kristian Mogensen (ECMWF, United Kingdom)
Co-Authors: Anthony Weaver (CERFACS, France); Magdalena Balmaseda (ECMWF, United Kingdom); Kristian Mogensen (ECMWF, United Kingdom)
The scalability and performance of NEMOVAR, the variational data assimilation software for the NEMO ocean model, are presented. NEMOVAR is a key component of the ECMWF operational ocean analysis System 4 (Ocean S4) and the future System 5 (Ocean S5). It is designed as a four-dimensional variational assimilation (4D-Var) algorithm, which can also support three-dimensional (3D-Var) assimilation using the First-Guess at Appropriate Time (FGAT) approach. Central to the code's performance is the implementation of the correlation operator used to model the background-error covariance matrix; in NEMOVAR this is achieved with a diffusion operator. A new implicit formulation of the diffusion operator has recently been introduced which solves the underlying linear system using a Chebyshev iteration. The technique is more flexible and better suited to massively parallel machines than the method currently used operationally at ECMWF, but further improvements will be necessary for future high-resolution applications. -
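As a reminder of how a Chebyshev iteration of the kind mentioned above proceeds, here is a generic dense-matrix sketch following the standard Chebyshev acceleration recurrence (see, e.g., Saad, Iterative Methods for Sparse Linear Systems); it is not the NEMOVAR diffusion-operator code, and the eigenvalue bounds are assumed to be known.

```python
import numpy as np

def chebyshev_solve(A, b, lam_min, lam_max, n_iter=200):
    """Chebyshev iteration for A x = b, with A symmetric positive definite
    and its spectrum contained in [lam_min, lam_max]."""
    theta = 0.5 * (lam_max + lam_min)   # centre of the spectrum
    delta = 0.5 * (lam_max - lam_min)   # half-width of the spectrum
    sigma = theta / delta
    x = np.zeros_like(b)
    r = b - A @ x
    rho = 1.0 / sigma
    d = r / theta                       # first correction (optimal Richardson step)
    for _ in range(n_iter):
        x = x + d
        r = r - A @ d
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x

# toy usage: 1D Laplacian as a stand-in for an implicit diffusion operator
n = 100
A = np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
b = np.ones(n)
x = chebyshev_solve(A, b, lam_min=4 * np.sin(np.pi / (2 * (n + 1)))**2, lam_max=4.0)
print(np.linalg.norm(A @ x - b))       # residual norm after the fixed number of sweeps
```

Unlike conjugate gradients, the recurrence needs no global dot products, which is one reason such iterations are attractive on massively parallel machines.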
Mohr Marcus Poster
Poster
CSM-02 A Novel Approach for Efficient Stencil Assembly in Curved Geometries, Marcus Mohr (Dept. of Earth and Environmental Sciences, Ludwig-Maximilians-Universität München, Germany)
Co-Authors: Marcus Mohr (Ludwig Maximilian University of Munich, Germany); Ulrich Rüde (University of Erlangen-Nuremberg, Germany); Markus Wittmann (FAU Erlangen-Nürnberg / Erlangen Regional Computing Center (RRZE), Germany); Barbara Wohlmuth (Technical University of Munich, Germany)
In many scientific and engineering applications one has to deal with curved geometries. Such domains can be approximated accurately, e.g., by unstructured grids and isoparametric finite elements. We present a novel approach here that is well suited to our concept of hierarchical hybrid grids (HHG). The latter was shown to achieve excellent performance and scalability, even for extreme numbers of DOFs, through a matrix-free implementation that exploits the regularity of access patterns. In our approach, FE stencils are not assembled exactly, but are approximated by low-order polynomials and evaluated with an efficient incremental algorithm. We demonstrate the accuracy achieved as well as the computational efficiency using our prototypical HHG-based mantle convection solver, which operates on non-nested triangulations of a thick spherical shell. The implementation of our scheme is based on a systematic node-level performance analysis and maintains the high efficiency of the original HHG. -
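The sketch below gives one schematic way to picture the two ingredients named above, low-order polynomial approximation of stencil weights and incremental (forward-difference) evaluation; it is not the HHG implementation, and the placeholder weight function is invented purely for the example.

```python
import numpy as np

def incremental_quadratic(coeffs, n):
    """Evaluate p(i) = c2*i^2 + c1*i + c0 at i = 0..n-1 using forward
    differences: two additions per node instead of a full polynomial evaluation."""
    c2, c1, c0 = coeffs
    value = c0                      # p(0)
    delta = c2 + c1                 # p(1) - p(0)
    delta2 = 2.0 * c2               # constant second difference
    out = np.empty(n)
    for i in range(n):
        out[i] = value
        value += delta
        delta += delta2
    return out

# toy usage: sample an "exactly assembled" stencil weight at a few nodes of a
# row, fit a quadratic in the node index, then reconstruct the row incrementally
exact = lambda i: 4.0 + 0.01 * i - 1e-4 * i**2        # placeholder weight variation
samples_i = np.array([0.0, 50.0, 100.0])
coeffs = np.polyfit(samples_i, exact(samples_i), deg=2)   # returns [c2, c1, c0]
row = incremental_quadratic(coeffs, 101)
print(np.max(np.abs(row - exact(np.arange(101)))))        # approximation error
```

The appeal of such a scheme in a matrix-free setting is that only a handful of exact assemblies per structured patch are needed, while the per-node cost stays at a few additions.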
Mohr Stefan MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 14:20-14:40
MS Presentation
BigDFT: Flexible DFT Approach to Large Systems Using Adaptive and Localized Basis Functions, Stefan Mohr (BSC, Spain)
Co-Authors: Luigi Genovese (CEA/INAC, France); Stefan Mohr (BSC, Spain); Laura Ratcliff (Argonne National Laboratory, United States of America); Stefan Goedecker (University of Basel, Switzerland)
Since 2008, the BigDFT project consortium has developed an ab initio DFT code based on Daubechies wavelets. In recent articles, we presented the linear-scaling version of the BigDFT code [1], where a minimal set of localized support functions is optimised in situ for systems with various boundary conditions. We will show how the flexibility of this approach helps in providing a basis set that is optimally tuned to the chemical environment surrounding each atom. In addition to providing a basis onto which Kohn-Sham orbitals can be projected to extract information such as atomic charges and partial densities of states, it can also be reused as-is, without re-optimisation, for charge-constrained DFT calculations within a fragment approach [2]. We will demonstrate the value of this approach for highly precise and efficient calculations of systems in complex environments [3]. [1] JCP 140, 204110 (2014), PCCP 17, 31360 (2015) [2] JCP 142, 23, 234105 (2015) [3] JCTC 11, 2077 (2015)
Poster
CSM-02 A Novel Approach for Efficient Stencil Assembly in Curved Geometries, Stefan Mohr (BSC, Spain)
Co-Authors: Marcus Mohr (Ludwig Maximilian University of Munich, Germany); Ulrich Rüde (University of Erlangen-Nuremberg, Germany); Markus Wittmann (FAU Erlangen-Nürnberg / Erlangen Regional Computing Center (RRZE), Germany); Barbara Wohlmuth (Technical University of Munich, Germany)
In many scientific and engineering applications one has to deal with curved geometries. Such domains can be approximated accurately, e.g., by unstructured grids and isoparametric finite elements. We present a novel approach here that is well suited to our concept of hierarchical hybrid grids (HHG). The latter was shown to achieve excellent performance and scalability, even for extreme numbers of DOFs, through a matrix-free implementation that exploits the regularity of access patterns. In our approach, FE stencils are not assembled exactly, but are approximated by low-order polynomials and evaluated with an efficient incremental algorithm. We demonstrate the accuracy achieved as well as the computational efficiency using our prototypical HHG-based mantle convection solver, which operates on non-nested triangulations of a thick spherical shell. The implementation of our scheme is based on a systematic node-level performance analysis and maintains the high efficiency of the original HHG. -
Molinari Jean-François MS Presentation
Friday, June 10, 2016
Garden 2A, 09:30-09:55
MS Presentation
Concurrent Coupling of Particles with a Continuum for Dynamical Motion of Solids, Jean-François Molinari (EPFL, Switzerland)
Co-Authors: J. F. Molinari (EPFL, Switzerland); Till Junge (Karlsruhe Institute of Technology, Germany); Jaehyun Cho (EPFL, Switzerland)
There are many situations where the discrete nature of matter needs to be accounted for by numerical models. For instance, in crystalline materials, friction and ductile-fracture modelling can benefit from the Molecular Dynamics formalism. However, capturing these processes requires system sizes involving very large numbers of particles, often out of reach of modern computers. Thus, concurrent multiscale approaches have emerged to reduce the computational cost by using a coarser continuum model. The difference between particle and continuum descriptions leads to several challenging problems. In this presentation, finite temperatures, numerical stability and dislocation passing will be addressed. The software framework LibMultiScale will also be presented, together with its parallel-computation design choices. -
Montecinos Gino I. MS Presentation
Wednesday, June 8, 2016
Garden 3A, 13:00-13:30
MS Presentation
Junction-Generalized Riemann Problem for Stiff Hyperbolic Balance Laws in Networks of Blood Vessels, Gino I. Montecinos (Universidad de Chile, Chile)
Co-Authors: Eleuterio F. Toro (University of Trento, Italy); Gino I. Montecinos (Universidad de Chile, Chile); Raul Borsche (Technische Universität Kaiserslautern, Germany); Jochen Kall (Technische Universität Kaiserslautern, Germany)
We design a new implicit solver for the Junction-Generalized Riemann Problem (J-GRP), which is based on a recently proposed implicit method for solving the Generalized Riemann Problem (GRP) for systems of hyperbolic balance laws. We use the new J-GRP solver to construct an ADER scheme that is globally explicit, locally implicit and has no theoretical accuracy barrier in either space or time. The resulting ADER scheme is able to deal with stiff source terms and can be applied to non-linear systems of hyperbolic balance laws in domains consisting of networks of one-dimensional sub-domains. Here we specifically apply the numerical techniques to networks of blood vessels. An application to a physical test problem consisting of a network of 37 compliant silicone tubes (arteries) and 21 junctions reveals that it is imperative to use high-order methods at junctions in order to preserve the desired high order of accuracy in the full computational domain. -
Morales Jorge Poster
Poster
PHY-03 Parallelization on a Hybrid Architecture of GBS, a Simulation Code for Plasma Turbulence at the Edge of Fusion Devices, Jorge Morales (Ecole Polytechnique Fédérale de Lausanne (EPFL), Swiss Plasma Center (SPC), CH-1, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Patrick Emonts (EPFL / Swiss Plasma Center, Switzerland); Federico David Halpern (EPFL / Swiss Plasma Center, Switzerland); Rogério Jorge (EPFL / Swiss Plasma Center, Switzerland); Jorge Morales (EPFL / Swiss Plasma Center, Switzerland); Paola Paruta (EPFL / Swiss Plasma Center, Switzerland); Paolo Ricci (EPFL / Swiss Plasma Center, Switzerland); Fabio Riva (EPFL / Swiss Plasma Center, Switzerland)
We present recent developments of GBS, a simulation code used to evolve plasma turbulence in the edge of fusion devices. GBS solves a set of 3D fluid equations, the Poisson and the Ampere equation, and a kinetic equation for the neutral atoms. Investigations carried out with GBS have significantly advanced our understanding of the plasma dynamics at the edge of fusion devices. For example, GBS simulations allowed the identification of the turbulent regimes and the saturation mechanisms of the linearly unstable modes. In GBS, a 3D Cartesian MPI communicator is employed, leading to excellent parallel scalability up to 8192 cores. To efficiently exploit many-core and hybrid architectures, new schemes using MPI+OpenMP and MPI+OpenACC have been recently implemented. We show the implementation of the new parallelization schemes, their scalability, and their efficiency. The new parallelization allows the efficient use of advanced hybrid supercomputers, such as Piz Daint at CSCS. -
Mostofi Arash MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 14:40-15:00
MS Presentation
Sizing Up Linear-Scaling DFT: Recent Applications of ONETEP to Carbon Nanostructures, Arash Mostofi (Imperial College London, United Kingdom)
Co-Authors:
Over the past twenty years, electronic structure calculations based on density-functional theory (DFT) have revolutionized the way in which materials are studied. Advances in computer power have clearly played a major role, but as important has been the development of new methods that (i) enhance the scale and the scope of such calculations and (ii) keep up with current trends in high-performance computing hardware. In this talk, I will outline some aspects of the development of the ONETEP [1] linear-scaling DFT code that enables accurate calculations on tens of thousands of atoms on modern architectures. Such simulations give rise to both opportunities and challenges, which I will try to highlight. I will then focus on a specific example of the application of ONETEP to a problem that is rather challenging for conventional cubic-scaling DFT due to the large system sizes involved, namely electron transport in carbon nanotube networks. [1] www.onetep.org -
Mounet Nicolas MS Presentation
Thursday, June 9, 2016
Garden 1BC, 15:00-15:20
MS Presentation
High-Throughput Prediction of Novel Two-Dimensional Materials, Nicolas Mounet (EPFL, Switzerland)
Co-Authors: Philippe Schwaller (EPFL, Switzerland); Andrea Cepellotti (EPFL, Switzerland); Andrius Merkys (EPFL, Switzerland); Ivano E. Castelli (EPFL, Switzerland); Marco Gibertini (EPFL, Switzerland); Giovanni Pizzi (EPFL, Switzerland); Nicola Marzari (EPFL, Switzerland)
As a crucial step towards the identification of novel and promising 2D materials, we provide here a large-scale first-principles exploration and characterization of such compounds. Starting from a combined pool of 480,000 structures harvested from the ICSD and COD databases, three-dimensional crystals are screened systematically by checking for the absence of chemical bonds between adjacent layers, identifying more than 6,000 layered systems. DFT calculations of the van der Waals interlayer bonding are then performed with automatic workflows, while the metallic, insulating or magnetic character of the materials obtained is systematically assessed. Following full atomic and cell relaxations, phonon dispersions are computed as a first step towards the assessment of thermodynamic properties. Thanks to the AiiDA materials' informatics platform [1], and in particular its automatic workflow engine, database structure, sharing capabilities, and pipelines to/from crystallographic repositories, the systematic and reproducible calculation of these properties becomes straightforward, together with seamless accessibility and sharing. [1] http://aiida.net -
Moureau Vincent MS Presentation
Wednesday, June 8, 2016
Garden 2A, 13:30-14:00
MS Presentation
High-Performance Computing for Large-Scale Unsteady Simulations of Turbulent Reacting Multi-Phase Flows: Challenges and Perspectives, Vincent Moureau (CORIA, CNRS UMR6614, France)
Co-Authors: Ghislain Lartigue (CORIA, CNRS UMR6614, France)
The prediction of conversion efficiency and pollutant emissions in combustion devices is particularly challenging, as these result from very complex interactions of turbulence, chemistry, and heat exchange at very different space and time scales. In recent years, Large-Eddy Simulation (LES) has proven to bring significant improvements in the prediction of reacting turbulent flows. The CORIA lab leads the development of the YALES2 solver, which is designed to model turbulent reactive two-phase flows on body-fitted unstructured meshes. It has been specifically tailored for dealing with very large meshes of up to tens of billions of cells and for efficiently solving the low-Mach-number Navier-Stokes equations on massively parallel computers. The presentation will focus on high-fidelity combustion LES and the analysis of the huge amount of data generated by these simulations. Numerical methods used to decouple the different time scales and to optimise the mesh resolution will also be emphasized. -
Mozdzynski George MS Presentation
Thursday, June 9, 2016
Garden 3B, 14:30-14:45
MS Presentation
Towards Exascale Computing with the ECMWF Model, George Mozdzynski (ECMWF, United Kingdom)
Co-Authors: Nils Wedi (ECMWF, United Kingdom); George Mozdzynski (ECMWF, United Kingdom); Sami Saarinen (ECMWF, United Kingdom)
The European Centre for Medium-Range Weather Forecasts (ECMWF) is currently investing in a scalability programme that addresses the computing and data-handling challenges of realizing, on future high-performance computing environments, the scientific advances that will enhance predictive skill from medium-range to monthly time scales. A key component of this programme is the European Commission funded project Energy-efficient SCalable Algorithms for weather Prediction at Exascale (ESCAPE), which develops numerical building blocks and compute-intensive algorithms of the forecast model, applies compute/energy efficiency diagnostics, designs implementations on novel architectures, and performs testing in operational configurations. The talk will report on the progress of the scalability programme with a special focus on ESCAPE. -
Muller Eilif B. MS Presentation
Friday, June 10, 2016
Garden 2BC, 10:00-10:20
MS Presentation
Reconstruction and Simulation of Neocortical Microcircuitry, Eilif B. Muller (Blue Brain Project, EPFL, Switzerland)
Co-Authors:
It has been called "the most complete simulation of a piece of excitable brain matter to date", by Christof Koch, President and CSO of the Allen Institute for Brain Science in Seattle [1]. After briefly reviewing the HPC-based data integration and reconstruction workflow, I will focus on simulation results obtained using the digital reconstruction of the microcircuitry of somatosensory cortex of P14 rat, running on the "BlueBrain IV" IBM BlueGene/Q system hosted at the Swiss National Supercomputing Center (CSCS). We validated and explored the spontaneous and sensory evoked dynamics of the microcircuit, and discovered previously unknown biological mechanisms by integrating decades of anatomical and physiological Neuroscience data [2]. [1] Koch, C. and Buice, M. A Biological Imitation Game. Cell 163:2, 277-280 (2015). [2] Markram, H., Muller, E., Ramaswamy, S., Reimann, M. et al. Reconstruction and Simulation of Neocortical Microcircuitry. Cell 163:2, 456-495 (2015). -
Murai Hitoshi MS Presentation
Wednesday, June 8, 2016
Garden 3B, 14:00-14:15
MS Presentation
Omni Compiler and XcodeML: An Infrastructure for Source-to-Source Transformation, Hitoshi Murai (AICS, RIKEN, Japan)
Co-Authors: Hitoshi Murai (AICS, RIKEN, Japan); Masahiro Nakao (AICS, RIKEN, Japan); Hidetoshi Iwashita (AICS, RIKEN, Japan); Jinpil Lee (AICS, RIKEN, Japan); Akihiro Tabuchi (University of Tsukuba, Japan)
We have been developing a compiler for the PGAS programming language XcalableMP, targeting post-petascale computing. XcalableMP is a directive-based language extension of Fortran95 and C for scientific programming on high-performance distributed-memory parallel systems. Omni Compiler is an infrastructure for source-to-source transformation used to build source-to-source compilers such as the Omni XcalableMP compiler. It includes C and Fortran95 front-ends which translate source code into an XML-based intermediate representation called XcodeML, a Java-based code-transformation library operating on XcodeML, and de-compilers which translate the XcodeML intermediate code back into transformed source code. Currently, the Omni compiler also supports code transformation for OpenMP and OpenACC. In this talk, we will present the internals of the Omni compiler, taking the Omni XcalableMP compiler as a case study, and outline our future plans. -
Musci Mirto MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 16:30-17:00
MS Presentation
Configuration, Profiling and Tuning of a Complex Biomedical Application: Analysis of ISA Extensions for Floating Point Processing, Mirto Musci (University of Pavia, Italy)
Co-Authors: Mirto Musci (University of Pavia, Italy)
This contribution describes an instance of the practitioner approach to application tuning for HPC deployment, starting on a small server and applying a black-box approach to performance tuning. The biomedical iCardioCloud project aims at establishing a computational framework to perform complete patient-specific numerical analyses, specially oriented towards aortic diseases; the application is based on CFD. The work reported here follows the basic software-engineering approach to black-box tuning: optimisation of the build process, analysis of module dependencies, assessment of possible compiler-based optimisations, final targeting of a specific architecture, and generation of various scripts for optimised builds. The experience gained with this case study shows to what extent a skilled IT professional is actually required in a domain-specific application to move the process from a naive to an advanced solution. -
Müller Andres MS Presentation
Thursday, June 9, 2016
Garden 3A, 11:50-12:10
MS Presentation
URANS Computations of an Unstable Cavitating Vortex Rope, Andres Müller (EPFL, Switzerland)
Co-Authors: Andres Müller (EPFL, Switzerland); François Avellan (EPFL / LMH, Switzerland); Cécile Münch (HES-SO Valais-Wallis, Switzerland)
Due to the massive penetration of alternative renewable energies, hydraulic power plants are key energy-conversion technologies for stabilizing the electrical power network, which requires operating hydraulic machines at off-design conditions. For a flow rate larger than that at the best-efficiency point, a cavitating vortex rope occurs, leading to strong pressure surges in the entire hydraulic system. To better understand the mechanisms responsible for these pressure surges, URANS simulations of a reduced-scale Francis turbine are performed. Several cavitation numbers (sigma values) are investigated, corresponding to stable and unstable cavitating vortex ropes. The results are compared with experimental measurements. The main challenge of the computations is the long physical time, compared to the time step, required to capture the onset of the instability. -
Münch Cécile MS Presentation
Thursday, June 9, 2016
Garden 3A, 11:50-12:10
MS Presentation
URANS Computations of an Unstable Cavitating Vortex Rope, Cécile Münch (HES-SO Valais-Wallis, Switzerland)
Co-Authors: Andres Müller (EPFL, Switzerland); François Avellan (EPFL / LMH, Switzerland); Cécile Münch (HES-SO Valais-Wallis, Switzerland)
Due to the massive penetration of alternative renewable energies, hydraulic power plants are key energy-conversion technologies for stabilizing the electrical power network, which requires operating hydraulic machines at off-design conditions. For a flow rate larger than that at the best-efficiency point, a cavitating vortex rope occurs, leading to strong pressure surges in the entire hydraulic system. To better understand the mechanisms responsible for these pressure surges, URANS simulations of a reduced-scale Francis turbine are performed. Several cavitation numbers (sigma values) are investigated, corresponding to stable and unstable cavitating vortex ropes. The results are compared with experimental measurements. The main challenge of the computations is the long physical time, compared to the time step, required to capture the onset of the instability.
N
-
Nakao Masahiro MS Presentation
Wednesday, June 8, 2016
Garden 3B, 14:00-14:15
MS Presentation
Omni Compiler and XcodeML: An Infrastructure for Source-to-Source Transformation, Masahiro Nakao (AICS, RIKEN, Japan)
Co-Authors: Hitoshi Murai (AICS, RIKEN, Japan); Masahiro Nakao (AICS, RIKEN, Japan); Hidetoshi Iwashita (AICS, RIKEN, Japan); Jinpil Lee (AICS, RIKEN, Japan); Akihiro Tabuchi (University of Tsukuba, Japan)
We have been developing a compiler for the PGAS programming language XcalableMP, targeting post-petascale computing. XcalableMP is a directive-based language extension of Fortran95 and C for scientific programming on high-performance distributed-memory parallel systems. Omni Compiler is an infrastructure for source-to-source transformation used to build source-to-source compilers such as the Omni XcalableMP compiler. It includes C and Fortran95 front-ends which translate source code into an XML-based intermediate representation called XcodeML, a Java-based code-transformation library operating on XcodeML, and de-compilers which translate the XcodeML intermediate code back into transformed source code. Currently, the Omni compiler also supports code transformation for OpenMP and OpenACC. In this talk, we will present the internals of the Omni compiler, taking the Omni XcalableMP compiler as a case study, and outline our future plans. -
Nestola Maria Giuseppina Chiara MS Presentation
Wednesday, June 8, 2016
Garden 3A, 14:00-14:15
MS Presentation
FD/FEM Coupling with the Immersed Boundary Method for the Simulation of Aortic Heart Valves, Maria Giuseppina Chiara Nestola (ICS, Università della Svizzera italiana, Switzerland)
Co-Authors: Maria Giuseppina Chiara Nestola (Università della Svizzera italiana, Switzerland); Hadi Zolfaghari (University of Bern, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Dominik Obrist (University of Bern, Switzerland)
The ever-increasing available computational power allows for solving more complex physical problems spanning multiple physical domains. We present a numerical tool for simulating fluid-structure interaction between blood flow and the soft tissue of heart valves. Using the basic concept of the Immersed Boundary Method, the interaction between the two physical domains (flow and structure) does not require mesh manipulation. We solve the governing equations of the fluid and the structure with domain-specific finite difference and finite element discretisations, respectively. We use a massively parallel algorithmic framework for handling the L2-projection transfer between the loosely coupled highly parallel solvers for fluid and solid. Our tool builds on a well-established and proven Navier-Stokes solver and a novel method for solving non-linear continuum solid mechanics.
Poster
LS-03 GPU-Accelerated Immersed Boundary Method with CUDA for the Efficient Simulation of Biomedical Fluid-Structure Interaction, Maria Giuseppina Chiara Nestola (ICS, Università della Svizzera italiana, Switzerland)
Co-Authors: Barna Errol Mario Becsek (University of Bern, Switzerland); Maria Giuseppina Chiara Nestola (Università della Svizzera italiana, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Dominik Obrist (University of Bern, Switzerland)
Immersed boundary methods have become highly usable and useful tools for the simulation of biomedical fluid-structure interaction, e.g., in the aortic valve of the human heart. In such problems, the complex geometry and motion of the soft tissue impose significant computational costs on body-fitted-mesh methods. Resorting to a fixed Eulerian grid for the flow simulation, together with the immersed boundary method to model the interaction with the soft tissue, eliminates the expensive mesh generation and updating costs. Nevertheless, the computational cost of the geometry operations, including adaptive search algorithms, is still significant. Herein, we implement the immersed boundary kernels in CUDA so that they can be executed on thousands of parallel threads on a general-purpose GPU. Host-device memory optimisation, along with optimal usage of the GPU multiprocessors, results in boosted performance in fluid-structure interaction simulations. -
Nielsen Allan Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 10:30-10:50
Contributed Talk
Space-Time Parallelism for Hyperbolic PDEs, Allan Nielsen (EPFL, Switzerland)
Co-Authors: Gilles Brunner (EPFL, Switzerland); Jan Hesthaven (EPFL, Switzerland)
Parallel-in-time integration techniques have been hailed as a potential path to exascale for the solution of evolution-type problems. Methods of time-parallel integration are intended to extend parallel scaling on compute clusters beyond what is possible using conventional domain decomposition techniques alone. In this talk we give a short introduction to space-time parallelism with emphasis on the parareal method. We then proceed to present recent advances in the construction of the coarse operator needed in the iterative correction scheme. The modifications allow for parallel-in-time acceleration of purely hyperbolic systems of partial differential equations, something previously widely considered impractical. The talk concludes with preliminary results on parallel-in-time integration of a two-dimensional shallow-water-wave equation that governs the underlying dynamics in a tsunami simulation application. -
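For readers unfamiliar with the parareal correction mentioned above, a minimal serial sketch of the iteration follows; the coarse and fine propagators `G` and `F` are placeholders, and in practice the fine solves on the different time slices run in parallel.

```python
import numpy as np

def parareal(u0, t0, t1, n_slices, coarse, fine, n_iter):
    """Parareal iteration: U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k).
    coarse(u, ta, tb) and fine(u, ta, tb) propagate a state from ta to tb."""
    ts = np.linspace(t0, t1, n_slices + 1)
    # initial guess from one sequential coarse sweep
    U = [np.array(u0, dtype=float)]
    for n in range(n_slices):
        U.append(coarse(U[n], ts[n], ts[n + 1]))
    for _ in range(n_iter):
        F_old = [fine(U[n], ts[n], ts[n + 1]) for n in range(n_slices)]    # parallel in practice
        G_old = [coarse(U[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        for n in range(n_slices):                                          # cheap sequential correction
            G_new = coarse(U[n], ts[n], ts[n + 1])
            U[n + 1] = G_new + F_old[n] - G_old[n]
    return U

# toy usage: u' = -u, coarse = one Euler step per slice, fine = many Euler steps
coarse = lambda u, ta, tb: u + (tb - ta) * (-u)
def fine(u, ta, tb, m=100):
    h = (tb - ta) / m
    for _ in range(m):
        u = u + h * (-u)
    return u
print(parareal(1.0, 0.0, 2.0, n_slices=10, coarse=coarse, fine=fine, n_iter=3)[-1], np.exp(-2.0))
```

The difficulty discussed in the talk lies in designing a coarse operator G for which this iteration still converges when the underlying PDE system is purely hyperbolic.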
Novati Guido Contributed Talk
Thursday, June 9, 2016
Garden 3A, 10:50-11:10
Contributed Talk
Propulsive Advantage of Swimming in Unsteady Flows, Guido Novati (ETH Zurich, Switzerland)
Co-Authors: Siddhartha Verma (ETH Zurich, Switzerland); Petros Koumoutsakos (ETH Zurich, Switzerland)
Individual fish swimming in a school encounter vortices generated by the propulsion of upstream members. Experimental and theoretical studies suggest that these hydrodynamic interactions may increase thrust without additional energy expenditure. However, difficulties associated with experimental studies have prevented a systematic quantification of this phenomenon. Using simulations of self-propelled swimmers, we investigate some of the mechanisms by which fish may exploit each other's wakes to reduce energy expenditure. We quantify the relative importance of two mechanisms for increasing swimming efficiency: the decrease in relative velocity induced by proximity to wake vortices; and wall/"channelling" effects. Additionally, we conduct simulations of fish swimming in the Kármán vortex street behind a static cylinder. This configuration helps us clarify the role of the bow pressure wave, entrainment, and "vortex-surfing" in enhancing the propulsive efficiency of trout swimming near obstacles.
O
-
Obrist Dominik MS Summary
MS Summary
MS01 Advanced Computational Methods for Applications to the Cardiovascular System I, Dominik Obrist (University of Bern, Switzerland)
Co-Authors: Dominik Obrist (University of Bern, Switzerland), Christian Vergara (Politecnico di Milano, Italy)
Cardiac and Cardiovascular Mathematics represents nowadays a challenging topic in view of the emerging and growing collaborations between clinicians and mathematicians. In this respect, the numerical solution of problems arising in modelling cardiac and systemic phenomena opens new and interesting perspectives which need to be properly addressed. From the cardiac side, a fully integrated heart model represents a complex multiphysics problem, which is in turn composed of several submodels describing cardiac electrophysiology, mechanics, and fluid dynamics. On the system circulation side, although it has been studied for a longer time, several mathematical and numerical aspects still need to be addressed (e.g., tissue remodelling, atherosclerotic plaque formation, aneurysms development, transitional and turbulence phenomena in blood flows). This minisymposium aims at gathering researchers and experts in computational and numerical modelling of the heart and the systemic circulation.
MS Summary
MS07 Advanced Computational Methods for Applications to the Cardiovascular System II, Dominik Obrist (University of Bern, Switzerland)
Co-Authors: Dominik Obrist (University of Bern, Switzerland), Christian Vergara (Politecnico di Milano, Italy)
Cardiac and Cardiovascular Mathematics represents nowadays a challenging topic in view of the emerging and growing collaborations between clinicians and mathematicians. In this respect, the numerical solution of problems arising in modelling cardiac and systemic phenomena opens new and interesting perspectives which need to be properly addressed. From the cardiac side, a fully integrated heart model represents a complex multiphysics problem, which is in turn composed of several submodels describing cardiac electrophysiology, mechanics, and fluid dynamics. On the system circulation side, although it has been studied for a longer time, several mathematical and numerical aspects still need to be addressed, as e.g. tissue remodelling, atherosclerotic plaque formation, aneurysms development, transitional and turbulence phenomena in blood flows. This minisymposium aims at gathering researchers and experts in computational and numerical modelling of the heart and the systemic circulation.
Wednesday, June 8, 2016
Garden 3A, 14:00-14:15
MS Presentation
FD/FEM Coupling with the Immersed Boundary Method for the Simulation of Aortic Heart Valves, Dominik Obrist (University of Bern, Switzerland)
Co-Authors: Maria Giuseppina Chiara Nestola (Università della Svizzera italiana, Switzerland); Hadi Zolfaghari (University of Bern, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Dominik Obrist (University of Bern, Switzerland)
The ever-increasing available computational power allows for solving more complex physical problems spanning multiple physical domains. We present a numerical tool for simulating fluid-structure interaction between blood flow and the soft tissue of heart valves. Using the basic concept of the Immersed Boundary Method, the interaction between the two physical domains (flow and structure) does not require mesh manipulation. We solve the governing equations of the fluid and the structure with domain-specific finite difference and finite element discretisations, respectively. We use a massively parallel algorithmic framework for handling the L2-projection transfer between the loosely coupled highly parallel solvers for fluid and solid. Our tool builds on a well-established and proven Navier-Stokes solver and a novel method for solving non-linear continuum solid mechanics.
Poster
LS-03 GPU-Accelerated Immersed Boundary Method with CUDA for the Efficient Simulation of Biomedical Fluid-Structure Interaction, Dominik Obrist (University of Bern, Switzerland)
Co-Authors: Barna Errol Mario Becsek (University of Bern, Switzerland); Maria Giuseppina Chiara Nestola (Università della Svizzera italiana, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Dominik Obrist (University of Bern, Switzerland)
Immersed boundary methods have become highly usable and useful tools for the simulation of biomedical fluid-structure interaction, e.g., in the aortic valve of the human heart. In such problems, the complex geometry and motion of the soft tissue impose significant computational costs on body-fitted-mesh methods. Resorting to a fixed Eulerian grid for the flow simulation, together with the immersed boundary method to model the interaction with the soft tissue, eliminates the expensive mesh generation and updating costs. Nevertheless, the computational cost of the geometry operations, including adaptive search algorithms, is still significant. Herein, we implement the immersed boundary kernels in CUDA so that they can be executed on thousands of parallel threads on a general-purpose GPU. Host-device memory optimisation, along with optimal usage of the GPU multiprocessors, results in boosted performance in fluid-structure interaction simulations. -
Oger Guillaume Poster
Poster
CSM-11 Porting SPH-Flow to GPUs Using OpenACC: Experience and Challenges, Guillaume Oger (Ecole Centrale Nantes, France)
Co-Authors: Guillaume Jeusel (Nextflow Software, France); Jean-Guillaume Piccinali (ETH Zurich / CSCS, Switzerland); Guillaume Oger (École centrale de Nantes, France)
SPH-flow is one of the most advanced SPH solvers dedicated to highly dynamic multiphase physics simulations. Over the last year, ECNantes has partnered with Nextflow-Software to deliver an accelerated version of the code on Piz Daint. This poster will present the results of this development activity. After assessing the overall performance of the code, we focused on the Monaghan solver. We investigated strategies to improve its performance for efficient execution on CPUs as well as GPUs, while maintaining the scalability of the MPI version and high programmability. The keys to our incremental successes were the ability to run a reduced version of the code, the refactoring of data types, and workarounds for the limitations of the compilers. This work should be of interest to academic developers because it details our experience using OpenACC directives for scientific computing in an area of cutting-edge research. -
Ohana Noé Contributed Talk
Wednesday, June 8, 2016
Garden 3C, 14:30-14:45
Contributed Talk
Towards Optimization of a Gyrokinetic Particle-in-Cell (PIC) Code on Large Scale Hybrid Architectures, Noé Ohana (Swiss Plasma Center, Switzerland)
Co-Authors: Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Emmanuel Lanti (EPFL / Swiss Plasma Center, Switzerland); Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Laurent Villard (EPFL, Switzerland)
Refactoring large legacy codes to exploit the power of new multithreaded devices is not an easy task. For this purpose, we designed a platform embedding simplified basic features of PIC codes. It solves the drift-kinetic equations (a first step towards gyrokinetics) in a sheared plasma slab using B-spline finite elements up to fourth order. Multiple levels of parallelism have been implemented using MPI+OpenMP and MPI+OpenACC. It has been shown that sorting particles can lead to performance improvements by increasing data locality and enabling vectorization of the grid memory accesses. This paper focuses on the GPU implementation, which outperforms the CPU version by up to a factor of 4 without requiring a major code rewrite. This gain increases with the spline order. Weak and strong scalability tests have been successfully run on the GPU-equipped Cray XC30 Piz Daint (CSCS) on up to 4,096 nodes. This performance should enable advanced studies of turbulent transport in magnetic fusion devices.
Wednesday, June 8, 2016
Garden 3C, 14:15-14:30
Contributed Talk
A Portable Platform for Accelerated PIC Codes and its Application to Multi- and Many Integrated Core Architectures Using Hybrid MPI/OpenMP, Noé Ohana (Swiss Plasma Center, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Noé Ohana (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Laurent Villard (EPFL, Switzerland)
With the aim of porting Particle-In-Cell (PIC) codes to modern parallel computers equipped with coprocessors, we have designed a testbed called PIC_ENGINE that retains the key elements of the PIC algorithm as applied to plasma physics simulations. A hybrid OpenMP/MPI implementation is used to explore the potential performance gain on multi-core CPUs and Many Integrated Core (MIC) coprocessors. A bucket sort is added to increase data locality, and a vectorization algorithm is implemented, showing an improvement in overall performance. With the PIC_ENGINE, we show that the hybrid OpenMP/MPI approach allows a performance gain of approximately 60% compared to pure MPI. Furthermore, the sorting and vectorization increase the performance of the most time-consuming methods by up to a factor of 3.2. Finally, using the same code, hybrid runs are performed on MIC and lead to similar conclusions; however, due to inefficient vectorization, the overall performance is poor compared to the CPU runs. -
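The data-locality argument behind the bucket sort mentioned above can be illustrated with a minimal serial NumPy sketch, not taken from PIC_ENGINE, in which particles are reordered by the index of the grid cell they occupy so that deposition and gather loops walk memory contiguously.

```python
import numpy as np

def sort_particles_by_cell(x, v, x_min, dx, n_cells):
    """Reorder particle arrays so that particles in the same grid cell are
    contiguous in memory, which improves cache locality and eases vectorization
    of the charge-deposition and field-gather loops."""
    cell = np.clip(((x - x_min) / dx).astype(np.int64), 0, n_cells - 1)
    order = np.argsort(cell, kind="stable")   # a counting/bucket sort would be O(N)
    return x[order], v[order], cell[order]

# toy usage: one million random particles on a 1D grid of 512 cells
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=1_000_000)
v = rng.normal(size=1_000_000)
x, v, cell = sort_particles_by_cell(x, v, x_min=0.0, dx=1.0 / 512, n_cells=512)
assert np.all(np.diff(cell) >= 0)             # particles are now grouped cell by cell
```

Once particles are grouped by cell, the per-cell inner loops have unit-stride memory access, which is what allows the compiler (or explicit SIMD directives) to vectorize them.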
Okoniewski Michal MS Presentation
Thursday, June 9, 2016
Garden 3A, 15:00-15:15
MS Presentation
Enhancing the Computational Capabilities for Biologists: Genomic Data Analysis Services at ETH Zurich, Michal Okoniewski (ETH Zurich, Switzerland)
Co-Authors: Thomas Wüst (ETH Zurich, Switzerland); Bernd Rinn (ETH Zurich, Switzerland)
Genomics-based biological research (e.g., next-generation sequencing) generates increasing amounts of data, which need dedicated high-performance computing (HPC) resources to be analysed efficiently. However, the specialization in both areas (namely, genomics and HPC) makes it increasingly challenging to bring the two fields together and to help biologists make effective use of the available computational resources. The mission of the Scientific IT Services (SIS) of ETH Zurich is to bridge this gap and to provide client-tailored solutions for big-data genomics. In this presentation, we illustrate this need and our approach with selected examples ranging from the design of automated, high-throughput NGS analysis workflows, through addressing the biology "software stack jungle", to scientific IT education for biologists. Throughout the talk, we emphasize the importance of scientific data management, consulting needs and community building for using HPC in biological research. -
Omlin Samuel Poster
Poster
EAR-05 Optimal Utilisation of Piz Daint Memory Bandwidth for a Parallel GPU Two-Phase Solver, Samuel Omlin (University of Lausanne, Switzerland)
Co-Authors: Samuel Omlin (University of Lausanne, Switzerland); Yury Podladchikov (University of Lausanne, Switzerland)
Massively parallel algorithms are commonly based on iterative methods. Such algorithms may run with optimal performance on very large systems, such as the Cray XC30 Piz Daint at CSCS. Iterative methods typically require few calculations per transferred floating-point number (a low flop-to-byte ratio) and are therefore normally bound by memory throughput. We developed a parallel GPU two-phase hydro-mechanical solver to resolve nonlinear fluid flow in transforming porous media. The key to maximum performance is to use the hardware's memory bandwidth optimally while performing the minimum number of memory accesses needed to solve the coupled equations. We show promising results: single-GPU memory throughput close to memory-copy values (i.e., when no computation is done), optimal parallel efficiency, and linear scaling up to 5,000 GPU nodes, i.e., the entire Piz Daint machine. -
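One common way to quantify the "memory throughput close to memory copy values" claim above is an effective memory throughput metric; the form below uses our own notation and assumes double precision (8 bytes per number), so it should be read as a generic yardstick rather than the authors' exact definition.

```latex
T_{\mathrm{eff}} \;=\; \frac{(n_r + n_w)\cdot 8\,\mathrm{bytes}\cdot n_x n_y n_z}{t_{\mathrm{it}}},
```

where n_r and n_w are the numbers of arrays that must be read and written per grid point in one iteration, n_x n_y n_z is the grid size, and t_it is the measured time per iteration; comparing T_eff with the throughput of a pure memory copy indicates how close the solver is to the hardware limit.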
Orozco Modesto MS Presentation
Friday, June 10, 2016
Garden 3A, 10:15-10:45
MS Presentation
Exploring Protein Dynamics, Modesto Orozco (Institut for Research in Biomedicine, Spain)
Co-Authors:
Proteins are molecular machines whose activity is linked to their ability to undergo conformational transitions in response to external effectors, such as changes in the environment, the presence of drugs, or other macromolecules. Unfortunately, representing protein flexibility is difficult: the underlying motions are too fast for most experimental techniques, yet too slow for simulation methods. I will summarize in my talk the efforts made at Barcelona to develop a theoretical framework that provides a holistic picture of protein flexibility and dynamics. -
Ostilla-Monico Rodolfo MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:30-14:45
MS Presentation
AFiD-GPU: A Versatile Navier-Stokes Solver for Turbulent Flows, Rodolfo Ostilla-Monico (Harvard University, United States of America)
Co-Authors: Richard J.A.M. Stevens (University of Twente, Netherlands); Everett Phillips (NVIDIA, United States of America); Vamsi Spandan (University of Twente, Netherlands); John Donners (SURFsara, Netherlands); Rodolfo Ostilla-Monico (Harvard University, United States of America); Massimiliano Fatica (NVIDIA, United States of America); Yantao Yang (University of Twente, Netherlands); Detlef Lohse (University of Twente, Netherlands); Roberto Verzicco (University of Twente, Netherlands & Università degli Studi di Roma "Tor Vergata", Italy)
The AFiD code, an open-source solver for the Navier-Stokes equations (www.afid.eu), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU port has been carried out in CUDA Fortran with extensive use of kernel loop directives (CUF kernels) in order to keep the source code as close as possible to the original CPU version. Only a few routines have been written manually. On Piz Daint (CSCS), the current GPU version can solve a 2048x3072x3072 mesh on 640 K20x GPUs in 2.4 s per time step, while with 2048 GPUs we measured 0.89 s per time step. Thanks to the increased performance and efficient use of memory, this tool allows for simulations in a parameter range that is unprecedented in Rayleigh-Bénard convection. -
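From the timings quoted above one can read off the strong-scaling efficiency between 640 and 2048 GPUs; this is a back-of-the-envelope calculation based only on the numbers in the abstract.

```latex
E \;=\; \frac{t_{640}/t_{2048}}{2048/640} \;=\; \frac{2.4/0.89}{3.2} \;\approx\; 0.84,
```

i.e. roughly 84% strong-scaling efficiency for the 2048x3072x3072 mesh when increasing the GPU count by a factor of 3.2.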
Osuna Carlos Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 12:10-12:30
Contributed Talk
The GridTools Libraries for the Solution of PDEs Using Stencils, Carlos Osuna (MeteoSwiss, Switzerland)
Co-Authors: Mauro Bianco (ETH Zurich / CSCS, Switzerland); Paolo Crosetto (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Thomas Schulthess (ETH Zurich, Switzerland)
Numerical weather prediction and climate models such as COSMO and ICON explicitly solve a large set of PDEs. The STELLA library was successfully used to port the dynamical core of COSMO, providing a performance-portable code across multiple platforms; a significant speedup was obtained for NVIDIA GPUs, as reported in doi:10.1145/2807591.2807676. However, its applicability was restricted to Cartesian structured grids and finite-difference methods, and it is difficult to use outside the COSMO model. The GridTools project emerges as an effort to provide an ecosystem for developing portable and efficient grid applications for the explicit solution of PDEs. GridTools generalizes STELLA to a wider class of weather and climate models on multiple grids, both Cartesian and spherical, and offers facilities for performing communication and setting boundary conditions. Here we present the GridTools API and show performance on NVIDIA GPUs and x86 platforms. -
Ozmen Neslihan Poster
Poster
EAR-03 Imaging Subsurface Fluid via Poroelastic Theory and Adjoint Tomography, Neslihan Ozmen (Utrecht University, Netherlands)
Co-Authors: Jeannot Trampert (Utrecht University, Netherlands)
Poroelastic theory is essential in many geophysical applications, such as imaging fluid flow in oil reservoirs, monitoring the storage of CO2, and most topics in hydrogeology. Biot formulated the poroelastic wave equation in fully saturated media. Based on Biot's theory, we aim to image the fluid directly in a regional seismic exploration setting. We simulate wave propagation in poroelastic media using a spectral element method. We define several misfit functionals and use adjoint methods to calculate the corresponding sensitivity kernels. The adjoint method is an efficient way of computing the gradient of a misfit functional with respect to the model parameters and is based on the interaction between the time-reversed regular and adjoint fields. Using these kernels, we perform gradient-based iterative inversions. We investigate to what extent poroelastic theory is effective for imaging fluids, and study the influence of bulk properties, porosity, permeability and fluid-solid interaction on the results.
P
-
Pabst Hans Poster
Poster
MAT-04 CP2K within the PASC Materials Network, Hans Pabst (Intel Semiconductor AG, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Hans Pabst (Intel Semiconductor AG, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland)
One of the goals of the PASC project is to strengthen networking in the Swiss materials science community through the active development of collaborative relationships among university researchers and CSCS staff. This includes assisting researchers in tuning, debugging, optimizing, and enhancing codes and applications for HPC resources, from mid-scale to national and international petascale facilities, with a view to the exascale transition. In addition, the application support specialists provide support for development projects on software porting techniques, parallelization and optimization strategies, deployment on diverse computational platforms, and data management. Here we present selected tools and software developed for CP2K [1]. Furthermore, we show by example how a CP2K application can be tuned to make optimal use of all available HPC resources. With a view to next-generation HPC hardware, we present first promising performance results for Intel's Broadwell-EP and KNL platforms. [1] The CP2K developers group, CP2K is freely available from: https://www.cp2k.org/, 2016 -
Pakrouski Kiryl Poster
Poster
PHY-02 Fractional Quantum Hall Effect and Topological Quantum Computation, Kiryl Pakrouski (ETHZ, Switzerland)
Co-Authors:
We use high performance exact diagonalization to uncover the nature of the fractional quantum Hall states at fillings 5/2 and 12/5 under realistic experimental conditions. For both states we find the parameter regimes where they possess excitations with non-Abelian braiding statistics that can be used for topological quantum computation. -
Palacios Juan Poster
Poster
LS-04 How to Synthesize Neurons? A Case Study on Best Practices in Scientific Software Development, Juan Palacios (EPFL, Switzerland)
Co-Authors: Juan Palacios (EPFL, Switzerland)
The Blue Brain Project (BBP) and the Human Brain Project aim to improve our understanding of the human brain through data-driven modelling and whole-brain simulation. Advanced computing technologies enable such projects to study problems that were unmanageable until recently. Care must be taken in the development process to ensure that the software developed is usable and maintainable over the project lifetime. Best practices range from documentation, code review, continuous integration, and test coverage to functional and integration testing. While common in industrial environments, these best practices are not yet common in more academic scientific projects. Within the BBP, we are developing scientific software to synthesize biologically realistic neuronal morphologies for large-scale brain simulations. We discuss how best practices are applied in different stages of our scientific software development process. We show how validation drives the development of increasingly complex models and demonstrate through examples how modular design benefits future use. -
Palmer Tim N. MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:45-16:00
MS Presentation
The Use of Inexact Hardware to Improve Weather and Climate Predictions, Tim N. Palmer (University of Oxford, United Kingdom)
Co-Authors: Tim N. Palmer (Oxford University, United Kingdom)
In weather and climate models values of relevant physical parameters are often uncertain by more than 100%. Still, numerical operations are typically calculated in double precision with 15 significant decimal digits. If we reduce numerical precision, we can reduce power consumption and increase computational performance significantly. If savings in computing power are reinvested, this will allow an increase in resolution in weather and climate models and an improvement of weather and climate predictions. I will discuss approaches to reduce numerical precision beyond single precision in HPC and in weather and climate modelling. I will present results that show that precision can be reduced significantly in atmosphere models and that potential savings are huge. Finally, I will discuss how to reduce precision in weather and climate models most efficiently and how rounding errors will impact on model dynamics and predictability. I will also outline implications for data assimilation and data storage. -
Papadimitriou Costas Paper
Wednesday, June 8, 2016
Auditorium C, 14:30-15:00
Paper
Approximate Bayesian Computation for Granular and Molecular Dynamics Simulations, Costas Papadimitriou (University of Thessaly, Greece)
Co-Authors: Panagiotis Angelikopoulos (ETH Zurich, Switzerland); Panagiotis Hadjidoukas (ETH Zurich, Switzerland); Costas Papadimitriou (University of Thessaly, Greece); Petros Koumoutsakos (ETH Zurich, Switzerland)
The effective integration of models with data through Bayesian uncertainty quantification hinges on the formulation of a suitable likelihood function. In many cases such a likelihood may not be readily available or may be difficult to compute. Approximate Bayesian Computation (ABC) instead formulates a likelihood through the comparison of low-dimensional summary statistics of the model predictions with corresponding statistics of the data. In this work we report a computationally efficient approach to the Bayesian updating of Molecular Dynamics (MD) models through ABC using a variant of the Subset Simulation method. We demonstrate that ABC can also be used for Bayesian updating of models with an explicitly defined likelihood function, and we compare the implementation and efficiency of ABC-SubSim with transitional Markov chain Monte Carlo (TMCMC). ABC-SubSim is then used for force-field identification in MD simulations. Furthermore, we examine the concept of relative-entropy minimization for the calibration of force fields and exploit it within ABC. Using different approximate posterior formulations, we show that assuming Gaussian ensemble fluctuations of molecular-system quantities of interest can potentially lead to erroneous parameter identification. -
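The basic accept/reject idea underlying ABC can be sketched as follows; this is a plain rejection-ABC illustration with placeholder prior, simulator and summary statistics, not the subset-simulation variant (ABC-SubSim) used in the paper.

```python
import numpy as np

def abc_rejection(observed, simulate, summary, sample_prior, eps, n_samples):
    """Keep parameter draws whose simulated summary statistics fall within a
    tolerance eps (Euclidean distance) of the observed summaries."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_samples):
        theta = sample_prior()
        s_sim = summary(simulate(theta))
        if np.linalg.norm(s_sim - s_obs) < eps:
            accepted.append(theta)
    return np.array(accepted)

# toy usage: infer the mean of a Gaussian from its sample mean and std
rng = np.random.default_rng(2)
observed = rng.normal(1.5, 1.0, size=200)
summary = lambda data: np.array([np.mean(data), np.std(data)])
simulate = lambda theta: rng.normal(theta, 1.0, size=200)
sample_prior = lambda: rng.uniform(-5.0, 5.0)
post = abc_rejection(observed, simulate, summary, sample_prior, eps=0.2, n_samples=5000)
print(len(post), post.mean() if len(post) else None)
```

Methods such as ABC-SubSim replace the single tolerance eps with a sequence of progressively tighter thresholds, which makes the approach affordable when each "simulate" call is an expensive MD run.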
Parkhill John MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 16:00-16:30
MS Presentation
Kinetic Energy Functionals from Convolutional Neural Networks, John Parkhill (The University of Notre Dame, United States of America)
Co-Authors:
We demonstrate a convolutional neural network trained to reproduce the Kohn-Sham kinetic energy of hydrocarbons from an input electron density. The output of the network is used as a nonlocal correction to conventional local and semi-local kinetic functionals. We show that this approximation qualitatively reproduces Kohn-Sham potential energy surfaces when used with conventional exchange correlation functionals. The density which minimizes the total energy given by the functional is examined in detail. We identify several avenues to improve on this exploratory work, by reducing numerical noise and changing the structure of our functional. Finally we examine the features in the density learned by the neural network to anticipate the prospects of generalizing these models. -
Paruta Paola Poster
Poster
PHY-03 Parallelization on a Hybrid Architecture of GBS, a Simulation Code for Plasma Turbulence at the Edge of Fusion Devices, Paola Paruta (Ecole Polytechnique Fédérale de Lausanne (EPFL), Swiss Plasma Center (SPC), CH-1, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Patrick Emonts (EPFL / Swiss Plasma Center, Switzerland); Federico David Halpern (EPFL / Swiss Plasma Center, Switzerland); Rogério Jorge (EPFL / Swiss Plasma Center, Switzerland); Jorge Morales (EPFL / Swiss Plasma Center, Switzerland); Paola Paruta (EPFL / Swiss Plasma Center, Switzerland); Paolo Ricci (EPFL / Swiss Plasma Center, Switzerland); Fabio Riva (EPFL / Swiss Plasma Center, Switzerland)
We present recent developments of GBS, a simulation code used to evolve plasma turbulence in the edge of fusion devices. GBS solves a set of 3D fluid equations, the Poisson and the Ampere equation, and a kinetic equation for the neutral atoms. Investigations carried out with GBS have significantly advanced our understanding of the plasma dynamics at the edge of fusion devices. For example, GBS simulations allowed the identification of the turbulent regimes and the saturation mechanisms of the linearly unstable modes. In GBS, a 3D Cartesian MPI communicator is employed, leading to excellent parallel scalability up to 8192 cores. To efficiently exploit many-core and hybrid architectures, new schemes using MPI+OpenMP and MPI+OpenACC have been recently implemented. We show the implementation of the new parallelization schemes, their scalability, and their efficiency. The new parallelization allows the efficient use of advanced hybrid supercomputers, such as Piz Daint at CSCS. -
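As an illustrative sketch of the 3D Cartesian domain decomposition mentioned above (written here with mpi4py and NumPy; GBS itself is a Fortran code, and the local array size and halo pattern below are placeholders):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
# Build a balanced 3D process grid and the corresponding Cartesian communicator.
dims = MPI.Compute_dims(comm.Get_size(), [0, 0, 0])
cart = comm.Create_cart(dims, periods=[False, False, True], reorder=True)
coords = cart.Get_coords(cart.Get_rank())

# Each rank owns a local block of the global grid plus one layer of ghost cells.
local = np.zeros((34, 34, 34))
ghost = np.empty((34, 34))

# Exchange one boundary plane with the neighbours along the first grid direction.
src, dst = cart.Shift(0, 1)
cart.Sendrecv(sendbuf=np.ascontiguousarray(local[1]), dest=dst,
              recvbuf=ghost, source=src)
if src != MPI.PROC_NULL:
    local[0] = ghost    # fill the ghost layer received from the lower neighbour

print("rank", cart.Get_rank(), "has coordinates", coords, "in a", dims, "process grid")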
Pasadakis Dimosthenis Poster
Poster
CSM-07 Estimation of Drag and Lift Coefficients for Steady State Incompressible Flow of a Newtonian Fluid on Domains with Periodic Roughness, Dimosthenis Pasadakis (Institute of Computational Science, Universita della Svizzera italiana, Switzerland)
Co-Authors: Drosos Kourounis (Università della Svizzera italiana, Switzerland); Olaf Schenk (Università della Svizzera italiana, Switzerland)
Rough boundaries pose several challenges for fluid simulations. The difficulty stems from the fact that resolving the small-scale rough geometry requires significantly refined meshes in the vicinity of the boundaries. Since all physical rough boundaries have a characteristic length scale, corrections to the standard Navier-Stokes equations can be obtained by considering Taylor expansions around the rough surface, leading to modified boundary conditions. Numerical tests are presented to validate the proposed theory, including the calculation of drag and lift coefficients for laminar flow around a cylinder with a rough boundary. Key-words: steady-state Navier-Stokes equations, periodic rough boundaries, drag and lift coefficients, laminar flow. -
Pausch Richard MS Presentation
Wednesday, June 8, 2016
Garden 3C, 14:00-14:15
MS Presentation
Interactive Plasma Simulations on Next Generation Supercomputers for Everybody, Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, TU Dresden, Germany)
Co-Authors: Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Benjamin Worpitz (Citrix Systems GmbH, Germany); Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
We present the open-source ecosystem around the reportedly fastest particle-in-cell code in the world (sustained Flop/s), PIConGPU. Designed for modern clusters powered by manycore hardware, we argue that HPC plasma simulations should be able to estimate their systematic and random errors (e.g., by varying solvers and initial conditions). Our approach starts with an open-source, anytime fork-able development cycle as the basis for scrutable and reproducible simulations. To promote interoperability, we develop and propagate an open, self-describing, file-format-agnostic data markup (openPMD) that is suitable for extreme I/O loads and in-situ processing, demonstrated in a live simulation. PIConGPU is built on top of C++ meta-programming libraries, providing single-source kernel acceleration (alpaka) to work asynchronously on distributed data containers (PMacc). Using compile-time optimisation techniques, we show that particle-mesh methods can be implemented for arbitrary high-performance hardware (GPGPUs, CPUs, OpenPOWER, Xeon Phi), featuring solver agility without negative implications for maintenance (rewrites) or runtime performance. -
Pavia Fabio MS Presentation
Friday, June 10, 2016
Garden 2A, 09:55-10:20
MS Presentation
A Parallel Algorithm for Multiscale Atomistic/Continuum Simulations, Fabio Pavia (Ansys, Switzerland)
Co-Authors: W. Curtin (EPFL, Switzerland)
Deformation and fracture processes in engineering materials often require simultaneous descriptions over a range of length and time scales, with each scale using a different computational technique. Here we discuss some of the available multiscale coupling algorithms and their most interesting features from an academic and a corporate research perspective. We then present a high-performance parallel 3D computing framework for executing large multiscale studies that couple an atomic domain, modeled using molecular dynamics, and a continuum domain, modeled using explicit finite elements. The main purpose of this work is to provide a multiscale implementation within an existing large-scale parallel molecular dynamics code (LAMMPS) that enables use of all the tools associated with this popular open-source code, while extending the robust CADD-type displacement coupling to 3D. Our implementation allows us to reproduce the results of extremely large atomistic simulations using fewer than 1,000,000 atoms, and thus at a much lower computational cost. -
Peng Ivy Bo MS Presentation
Wednesday, June 8, 2016
Garden 3C, 16:00-16:30
MS Presentation
Decoupling and Coupling in iPIC3D, a Particle-in-Cell Code for Exascale, Ivy Bo Peng (KTH, Sweden)
Co-Authors: Stefano Markidis (Royal Institute of Technology, Sweden); Erwin Laure (Royal Institute of Technology, Sweden); Yuxi Chen (University of Michigan, United States of America); Gabor Toth (University of Michigan, United States of America); Tamas Gombosi (University of Michigan, United States of America)
iPIC3D is a massively parallel three-dimensional implicit particle-in-cell code used for the study of the interactions between the solar wind and Earth's magnetosphere. Because of the scalability potential and scientific impact demonstrated by iPIC3D, it has been selected in many European HPC projects to prepare for future exascale machines. In this talk, we present new algorithmic changes to iPIC3D in preparation for the coming exascale era. In particular, we will present decoupled computation, communication and I/O operations in iPIC3D to address the challenges of irregular operations on large numbers of processes. Our evaluation results show that the performance benefits from this model increase as the scale of the simulation increases. We also present a two-way coupled kinetic-fluid model with multiple implicit PIC domains (by the iPIC3D code) embedded in MHD (by the BATS-R-US code) under the Space Weather Modeling Framework (SWMF). -
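A minimal sketch of the general idea of decoupling computation from communication with non-blocking messages (written with mpi4py; this is not the iPIC3D C++ implementation, and the buffer sizes and the "useful work" below are placeholders):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
right, left = (rank + 1) % size, (rank - 1) % size

halo_out = np.full(1000, float(rank))
halo_in = np.empty(1000)

# Post the non-blocking halo exchange, then compute on interior data while it is in flight.
reqs = [comm.Isend(halo_out, dest=right), comm.Irecv(halo_in, source=left)]
interior = np.random.rand(200, 200)
interior_result = np.sum(interior * interior)   # "useful work" overlapped with communication
MPI.Request.Waitall(reqs)                       # synchronise before touching the halo data

print(f"rank {rank}: interior={interior_result:.3f}, halo mean={halo_in.mean():.1f}")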
Petit Eric MS Presentation
Wednesday, June 8, 2016
Garden 2A, 15:30-16:00
MS Presentation
A Practical Approach to Efficient and Scalable Programming Strategies Exploration for Large HPC Application, Eric Petit (UVSQ, France)
Co-Authors:
With the shift in HPC system paradigms, applications have to be re-designed to efficiently address large numbers of cores. With the advancements in network technology and communication libraries, new opportunities have emerged to explore advanced programming models and load-balancing runtime systems in large HPC clusters. However, this requires deep code modifications that cannot practically be applied to full-scale applications. We propose proto-applications as proxies for code modernization, and we will demonstrate two specialized libraries for HPC workload load-balancing based on the GASPI communication library (PGAS) and task-based programming. Our first example features unstructured mesh computation using task-based parallelisation. The second demonstrates load-balancing for combustion simulation. Both proto-applications are open source and can serve as a basis for the development of genuine HPC applications. -
Pezzè Mauro MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 13:00-13:30
MS Presentation
Towards an Engineering Methodology for Multi-Model Scientific Simulations, Mauro Pezzè (USI Università della Svizzera italiana, Switzerland)
Co-Authors:
Studying complex physical phenomena requires integrating the heterogeneous computational models of the different subsystems to analyse the interactions between the aspects that characterize the phenomenon as a whole. While efficient methods are available to build and simulate single models, the problem of devising a general approach to integrate heterogeneous models has been studied only recently and is still an open issue. We propose an engineering methodology to automate the process of integrating heterogeneous computational models. The methodology is based on the novel idea of capturing the relevant information about the different models and their integration strategies by means of meta-data that can be used to automatically generate an efficient integration framework for the specific set of models and interactions. We discuss the various aspects of the integration problem, highlight the limits of the current solutions and characterize the novel methodology by means of a concrete case study. -
Pezzuto Simone MS Presentation
Wednesday, June 8, 2016
Garden 3A, 13:30-14:00
MS Presentation
Accurate Estimation of 3D Ventricular Activation in Heart Failure Patients from Electroanatomic Mapping, Simone Pezzuto (Università della Svizzera italiana, Switzerland)
Co-Authors: Peter Kalavsky (Università della Svizzera italiana, Switzerland); Mark Potse (Università della Svizzera italiana, Switzerland); Angelo Auricchio (Fondazione Cardiocentro Ticino, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland)
Accurate characterization of the cardiac activation sequence can support diagnosis and personalized therapy in heart-failure patients. Current invasive mapping techniques provide limited coverage. Our aim is to estimate a complete volumetric activation map from a limited number of measurements. This is achieved by optimising the local conductivity and early activation sites to minimise the mismatch between simulated and measured activation times. We modeled the activation times using an eikonal equation, reducing computational cost by 3 orders of magnitude compared to more physiologically detailed methods. The model provided a sufficiently accurate approximation of the activation time and the ECG in our patients. The solver was implemented on GPUs. Since the fast-marching method is not suitable for this architecture, we used a simple Jacobi iteration of a local variational principle. On a single GPU, each forward simulation took less than 2 seconds, and the inverse problem was solved in a few minutes. -
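For illustration, a Jacobi-style iteration for the isotropic eikonal equation can be sketched as follows (plain NumPy on a small 2D grid with a single early-activation site and unit conduction velocity; this is a generic sketch, not the authors' GPU implementation):

import numpy as np

n, h, v = 101, 1.0 / 100, 1.0          # grid size, spacing, conduction velocity
f = h / v
T = np.full((n, n), 1e9)
T[50, 50] = 0.0                        # early activation site

for _ in range(400):                   # Jacobi iterations (all points updated simultaneously)
    P = np.pad(T, 1, constant_values=1e9)
    a = np.minimum(P[:-2, 1:-1], P[2:, 1:-1])      # smaller neighbour in x
    b = np.minimum(P[1:-1, :-2], P[1:-1, 2:])      # smaller neighbour in y
    lo, hi = np.minimum(a, b), np.maximum(a, b)
    one_sided = lo + f                              # update when only one direction contributes
    two_sided = 0.5 * (a + b + np.sqrt(np.maximum(2 * f**2 - (a - b)**2, 0.0)))
    T_new = np.where(hi - lo >= f, one_sided, two_sided)
    T_new[50, 50] = 0.0                             # keep the source fixed
    T = np.minimum(T, T_new)                        # activation times only decrease

print("max activation time:", T.max())              # roughly the distance to the source over v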
Pfefferlé David Contributed Talk
Thursday, June 9, 2016
Garden 3A, 11:10-11:30
Contributed Talk
Self-Consistent Modelling of Plasma Heating and Fast Ion Generation Using Ion-Cyclotron Range of Frequency Waves in 2D and 3D Devices, David Pfefferlé (EPFL, Switzerland)
Co-Authors: Wilfred Cooper (EPFL, Switzerland); Jonathan Graves (EPFL, Switzerland); David Pfefferlé (EPFL, Switzerland); Joachim Geiger (Max Planck Institute of Plasma Physics, Germany)
Ion-Cyclotron Range of Frequency (ICRF) waves are an efficient source of plasma heating in tokamaks and stellarators. In ICRF-heated plasmas, the phase-space distribution function of the resonating particles displays significant distortion. A significant consequence is a noticeable modification of the plasma properties which dictate the propagation of the ICRF wave. The self-consistent modelling tool SCENIC was built in order to solve this highly non-linear problem. It is one of the few ICRF modelling tools able to tackle both 2D and 3D plasma configurations. The computational resources, in particular the amount of shared memory required to resolve the plasma equilibrium and the wave propagation, increase significantly for simulations of strongly 3D equilibria such as stellarators compared to 2D tokamak calculations. We present some applications of SCENIC to tokamak and stellarator plasmas. Particular focus is given to simulations of the recently started Wendelstein 7-X stellarator experiment, which will use ICRF waves for fast particle generation. -
Pflüger Dirk MS Presentation
Thursday, June 9, 2016
Garden 2BC, 12:00-12:30
MS Presentation
Fault Tolerance and Silent Fault Detection for Higher-Dimensional Discretizations, Dirk Pflüger (Universität Stuttgart, Germany)
Co-Authors: Alfredo Parra Hinojosa (Technische Universität München, Germany)
Future exascale systems are expected to have a mean time between failures in the range of minutes. Classical approaches, such as checkpointing and then recomputing the missing solution, will therefore be out of scope. Algorithm-based fault tolerance, in contrast, aims to continue without recomputation and with only minor extra computational effort. To this end, numerical schemes have to be adapted. We present algorithm-based fault tolerance for the solution of high-dimensional PDEs, exploiting a hierarchical extrapolation scheme, the sparse grid combination technique. Using the hierarchical ansatz, we show how hard faults can be mitigated without checkpoint-restart. Furthermore, we explain how even soft faults (for example due to silent data corruption) can often be detected and handled. -
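For reference, a small sketch of the classical sparse grid combination technique that underlies this approach (generic Python; the level and dimension below are arbitrary): the solution is assembled as a signed sum of solutions on coarse anisotropic component grids, which is what leaves room to compensate for lost components.

from itertools import product
from math import comb

def combination_grids(n, d):
    # Component grids (l_1, ..., l_d) with |l|_1 = n + (d - 1) - q, q = 0..d-1,
    # and coefficients (-1)^q * binomial(d-1, q) of the classical combination technique.
    grids = []
    for q in range(d):
        coeff = (-1) ** q * comb(d - 1, q)
        for levels in product(range(1, n + 1), repeat=d):
            if sum(levels) == n + (d - 1) - q:
                grids.append((levels, coeff))
    return grids

for levels, coeff in combination_grids(n=4, d=2):
    print(f"component grid levels={levels}  coefficient={coeff:+d}")
# The coefficients sum to 1; in the fault-tolerant variant, a faulty component grid can be
# dropped and the coefficients of the remaining grids recomputed instead of recomputing it.
print("sum of coefficients:", sum(c for _, c in combination_grids(4, 2)))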
Philippe Olivier MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 15:30-16:00
MS Presentation
Scientific Software Engineering: The Role of Research Software Engineers, Olivier Philippe (Software Sustainability Institute, United Kingdom)
Co-Authors: Neil Chue Hong (Software Sustainability Institute, United Kingdom); Olivier Philippe (Software Sustainability Institute, United Kingdom)
Research across all disciplines is ever more reliant on software, which means that research groups are ever more reliant on the people who write software. Since there is no career path for software developers in UK academia, these people are typically recruited into postdoctoral positions where they find themselves writing code, but having their careers judged against research criteria. The result is an overlooked class of research group members who lack recognition and reward despite their significant contribution to research. The Software Sustainability Institute has been campaigning for "Research Software Engineers" (RSEs). This campaign has gained greater recognition for RSEs not just within the UK, but across the world. It helped found the UKRSE Association (over 550 members), has supported the emergence of Research Software Groups, and has campaigned for funding of RSE positions - such as the RSE Fellowship programme that was funded by the UK's EPSRC in 2015. -
Phillips Everett MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:30-14:45
MS Presentation
AFiD-GPU: A Versatile Navier-Stokes Solver for Turbulent Flows, Everett Phillips (NVIDIA, United States of America)
Co-Authors: Richard J.A.M. Stevens (University of Twente, Netherlands); Everett Phillips (NVIDIA, United States of America); Vamsi Spandan (University of Twente, Netherlands); John Donners (SURFsara, Netherlands); Rodolfo Ostilla-Monico (Harvard University, United States of America); Massimiliano Fatica (NVIDIA, United States of America); Yantao Yang (University of Twente, Netherlands); Detlef Lohse (University of Twente, Netherlands); Roberto Verzicco (University of Twente, Netherlands & Università degli Studi di Roma "Tor Vergata", Italy)
The AFiD code, an open source solver for the Navier-Stokes equations (www.afid.eu), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU port has been carried out in CUDA Fortran with extensive use of kernel loop directives (CUF kernels) in order to keep the source code as close as possible to the original CPU version. Only a few routines have been written manually. On Piz Daint (CSCS), the current GPU version can solve a 2048x3072x3072 mesh on 640 K20x GPUs in 2.4s per time step, while with 2048 GPUs we measured 0.89s per time step. Thanks to the increased performance and efficient use of memory, this tool allows for simulations in a parameter range that is unprecedented in Rayleigh-Benard convection. -
Piccinali Jean-Guillaume Poster
Poster
CSM-11 Porting SPH-Flow to GPUs Using OpenACC: Experience and Challenges, Jean-Guillaume Piccinali (CSCS, Switzerland)
Co-Authors: Guillaume Jeusel (Nextflow Software, France); Jean-Guillaume Piccinali (ETH Zurich / CSCS, Switzerland); Guillaume Oger (École centrale de Nantes, France)
SPH-flow is one of the most advanced SPH solvers dedicated to highly dynamic multiphase physics simulations. Over the last year, ECNantes has partnered with Nextflow-Software to deliver an accelerated version of the code on Piz Daint. This poster will present the results of this development activity. After assessing the overall performance of the code, we focused on the Monaghan solver. We investigated strategies to improve its performance for efficient execution on CPUs as well as GPUs, maintaining the scalability of the MPI version and high programmability. The keys to our incremental successes were being able to run a reduced version of the code, data-type refactoring, and workarounds for the limitations of the compilers. This work should be of interest to academic developers because it details our experience using OpenACC directives for scientific computing in an area of cutting-edge research. -
Pintarelli Simon Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 11:50-12:10
Contributed Talk
Tensor-Product Discretization for the Spatially Inhomogeneous and Transient Boltzmann Equation, Simon Pintarelli (SAM ETHZ, Switzerland)
Co-Authors: Philipp Grohs (ETH Zurich, Switzerland); Ralf Hiptmair (ETH Zurich, Switzerland)
The Boltzmann equation provides a fundamental mesoscopic model for the dynamics of rarefied gases. The computational challenge arising from its discretization is twofold: we face a moderately high-dimensional problem, and the collision operator is non-linear and non-local in the velocity variable. We aim for a deterministic and asymptotically exact Galerkin discretization. This sets our approach apart from stochastic Monte-Carlo-type and Fourier-based methods. We consider a tensor-product discretization of the distribution function, combining Laguerre polynomials times a Maxwellian in velocity with continuous first-order finite elements in space. Unlike Fourier spectral methods, our approach does not require truncation of the velocity domain and does not suffer from aliasing errors. The advection problem is discretized through a Galerkin least-squares technique and yields an implicit formulation in time. Numerical results of benchmark simulations in 2+2 dimensions will be presented. -
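Schematically, the tensor-product ansatz described above takes a form such as (notation chosen here for illustration; it is not copied from the paper)

\[
  f(x, v, t) \;\approx\; \sum_{i}\sum_{k} c_{ik}(t)\, \varphi_i(x)\, p_k(v)\, e^{-|v|^2/2},
\]

where the \(\varphi_i\) are continuous first-order finite element basis functions in space and the \(p_k\) are Laguerre-type polynomials in the velocity variable, so that the velocity basis is a Maxwellian modulated by polynomials and no truncation of the velocity domain is needed.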
Pizzi Giovanni MS Presentation
Thursday, June 9, 2016
Garden 1BC, 15:00-15:20
MS Presentation
High-Throughput Prediction of Novel Two-Dimensional Materials, Giovanni Pizzi (EPFL, Switzerland)
Co-Authors: Philippe Schwaller (EPFL, Switzerland); Andrea Cepellotti (EPFL, Switzerland); Andrius Merkys (EPFL, Switzerland); Ivano E. Castelli (EPFL, Switzerland); Marco Gibertini (EPFL, Switzerland); Giovanni Pizzi (EPFL, Switzerland); Nicola Marzari (EPFL, Switzerland)
As a crucial step towards the identification of novel and promising 2D materials, we provide here a large-scale first-principles exploration and characterization of such compounds. From a combination of 480,000 structures harvested from the ICSD and COD databases, three-dimensional crystals are screened systematically by checking the absence of chemical bonds between adjacent layers, identifying more than 6,000 layered systems. Then DFT calculations of the van der Waals interlayer bonding are performed with automatic workflows, while systematically assessing the metallic, insulating or magnetic character of the materials obtained. Following full atomic and cell relaxations, phonon dispersions are computed as a first step towards the assessment of thermodynamic properties. Thanks to the AiiDA materials informatics platform [1], and in particular its automatic workflow engine, database structure, sharing capabilities, and pipelines to/from crystallographic repositories, the systematic and reproducible calculation of these properties becomes straightforward, together with seamless accessibility and sharing. [1] http://aiida.net -
Pleiter Dirk MS Presentation
Thursday, June 9, 2016
Garden 3B, 11:00-11:30
MS Presentation
Big Data Challenges Arising from Future Experiments, Dirk Pleiter (Forschungszentrum Juelich, Germany)
Co-Authors:
Future physics experiments and observatories rely on the capability to process significantly larger data streams. Examples are future light-source experiments, elementary particle experiments and radio-astronomy observatories. All have in common that they plan to exploit high-performance computing capabilities instead of relying on hardware-controlled data processing. This approach increases flexibility during the lifetime of such experiments and may increase the use of commodity hardware, which is typically cheaper than custom solutions. While these experiments can thus benefit from HPC architectures and technologies, both as available today and as planned on future roadmaps, the requirements and use models differ significantly from today's high-performance computing. In this talk we will analyse the requirements of a few examples and discuss how they will benefit from current as well as future HPC technologies. -
Plumley Meredith MS Presentation
Thursday, June 9, 2016
Garden 1A, 10:30-11:00
MS Presentation
Towards a Better Understanding of Rapidly Rotating Convection by Combining Direct Numerical Simulations and Asymptotic Modeling, Meredith Plumley (University of Colorado at Boulder, United States of America)
Co-Authors: Meredith Plumley (University of Colorado at Boulder, United States of America); Keith Julien (University of Colorado at Boulder, United States of America)
Realistic simulations of planetary dynamos will remain impossible in the near future. In particular, the enormous range of spatial and temporal scales induced in convective flows by rotation plagues direct numerical simulations (DNS). The same scale disparities that hamper DNS can, however, be used to derive reduced equations that are expected to govern convection in the limit of rapid rotation. Simulations based on such formulations represent an interesting alternative to DNS. In this talk, recent efforts to test asymptotic models against DNS are reviewed. Results in plane layer geometry reveal convergence of both approaches. Surprisingly, Ekman layers have a profound effect in the rapidly rotating regime and explicitly have to be accounted for in the asymptotic models. Upscale kinetic energy transport leads to the formation of large-scale structures, which may play a prominent role in dynamos. The asymptotic models allow an exploration of parameter regimes far beyond the capabilities of DNS. -
Podladchikov Yury Poster
Poster
EAR-05 Optimal Utilisation of Piz Daint Memory Bandwidth for a Parallel GPU Two-Phase Solver, Yury Podladchikov (University of Lausanne, Switzerland)
Co-Authors: Samuel Omlin (University of Lausanne, Switzerland); Yury Podladchikov (University of Lausanne, Switzerland)
Massively parallel algorithms are commonly based on iterative methods. Such algorithms may run with optimal performance on very large systems, such as the Cray XC30 Piz Daint at CSCS. Iterative methods typically require few calculations per transferred floating-point number (a low flop-to-byte ratio) and are therefore normally bound by memory throughput. We developed a parallel GPU two-phase hydro-mechanical solver to resolve nonlinear fluid flow in transforming porous media. The key to maximum performance is to use the hardware's memory bandwidth optimally while performing the minimum number of memory accesses needed to solve the coupled equations. We show promising results: single-GPU memory throughput close to memory-copy values (i.e. when no computation is done), optimal parallel efficiency, and linear scaling up to 5000 GPU nodes, i.e. the entire Piz Daint machine. -
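As an illustrative sketch of how effective memory throughput can be compared against a plain memory copy (NumPy on the CPU here; the authors' solver runs on GPUs, and the array size and stencil below are placeholders):

import time
import numpy as np

n = 512
a = np.random.rand(n, n, n)
b = np.empty_like(a)

def bandwidth_gb_s(nbytes, seconds):
    return nbytes / seconds / 1e9

# Reference: plain memory copy (one read + one write per element).
t0 = time.perf_counter()
np.copyto(b, a)
t_copy = time.perf_counter() - t0

# One Jacobi-style 6-neighbour stencil sweep (reads of a, write of the interior of b).
t0 = time.perf_counter()
b[1:-1, 1:-1, 1:-1] = (a[:-2, 1:-1, 1:-1] + a[2:, 1:-1, 1:-1] +
                       a[1:-1, :-2, 1:-1] + a[1:-1, 2:, 1:-1] +
                       a[1:-1, 1:-1, :-2] + a[1:-1, 1:-1, 2:]) / 6.0
t_stencil = time.perf_counter() - t0

nbytes = 2 * a.nbytes          # count one read and one write of the whole array
print("copy    :", bandwidth_gb_s(nbytes, t_copy), "GB/s")
print("stencil :", bandwidth_gb_s(nbytes, t_stencil), "GB/s (effective)")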
Pospisil Lukas Poster
Poster
CSM-13 Towards the HPC-Inference of Causality Networks from Multiscale Economical Data, Lukas Pospisil (Università della Svizzera italiana, Switzerland)
Co-Authors: Illia Horenko (Università della Svizzera italiana, Switzerland); Patrick Gagliardini (Università della Svizzera italiana, Switzerland); William Sawyer (ETH Zurich / CSCS, Switzerland)
The novel non-stationary approach to causality inference for multivariate time-series was proposed during recent research by the project participants. This methodology uses clustering based on the minimization of an averaged clustering functional, which describes the mean distance between the observation data and its representation in terms of a given number of abstract Bayesian causality models of a certain predefined class. For the analysis of realistic datasets we develop an HPC library that is built on top of PETSc and implements MPI, OpenMP, and CUDA parallelization strategies. We present the mathematical aspects of the methodology and preliminary results of solving the non-stationary causality inference problem for multivariate economic data with our HPC approach. The results are computed on Piz Daint at CSCS.
Thursday, June 9, 2016
Garden 2A, 14:30-15:00
MS Presentation
Causality Inference in a Nonstationary and Nonhomogenous Framework, Lukas Pospisil (Università della Svizzera italiana, Switzerland)
Co-Authors: Illia Horenko (Università della Svizzera italiana, Switzerland); William Sawyer (ETH Zurich / CSCS, Switzerland); Lukas Pospisil (Università della Svizzera italiana, Switzerland)
The project deploys statistical and computational techniques to develop a novel approach to causality inference in multivariate time-series of economical data on equity and credit risks. The methods build on recent research of the project participants. They improve on classical approaches to causality analysis by accommodating general forms of non-stationarity and non-homogeneity resulting from unresolved and latent scale effects. The emerging causality framework results in, and is implemented through, a clustering based on the minimization of an averaged clustering functional, which describes the mean distance between the observation data and its representation in terms of a given number of abstract Bayesian causality models of a certain predefined class. We use a finite element framework to propose a numerical scheme. One of the most challenging components of the emerging HPC implementation is a quadratic programming problem with linear equality and bound inequality constraints. We compare different algorithms and demonstrate their efficiency by solving practical benchmark problems. -
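For the bound-constrained part of such quadratic programming problems, a minimal projected-gradient sketch looks as follows (plain NumPy with random placeholder data; the authors' library uses more elaborate algorithms on top of PETSc, and equality constraints would require additional treatment such as an augmented Lagrangian):

import numpy as np

rng = np.random.default_rng(1)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive definite Hessian
b = rng.standard_normal(n)
lo, hi = np.zeros(n), np.ones(n)     # bound constraints

def project(x):
    return np.clip(x, lo, hi)

x = project(np.zeros(n))
step = 1.0 / np.linalg.norm(A, 2)    # safe step length, 1 / lambda_max
for _ in range(500):
    grad = A @ x - b
    x = project(x - step * grad)     # gradient step followed by projection onto the box

print("objective:", 0.5 * x @ A @ x - b @ x)
print("bounds satisfied:", bool(np.all(x >= lo) and np.all(x <= hi)))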
Potse Mark MS Presentation
Wednesday, June 8, 2016
Garden 3A, 13:30-14:00
MS Presentation
Accurate Estimation of 3D Ventricular Activation in Heart Failure Patients from Electroanatomic Mapping, Mark Potse (Università della Svizzera italiana, Switzerland)
Co-Authors: Peter Kalavsky (Università della Svizzera italiana, Switzerland); Mark Potse (Università della Svizzera italiana, Switzerland); Angelo Auricchio (Fondazione Cardiocentro Ticino, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland)
Accurate characterization of the cardiac activation sequence can support diagnosis and personalized therapy in heart-failure patients. Current invasive mapping techniques provide limited coverage. Our aim is to estimate a complete volumetric activation map from a limited number of measurements. This is achieved by optimising the local conductivity and early activation sites to minimise the mismatch between simulated and measured activation times. We modeled the activation times using an eikonal equation, reducing computational cost by 3 orders of magnitude compared to more physiologically detailed methods. The model provided a sufficiently accurate approximation of the activation time and the ECG in our patients. The solver was implemented on GPUs. Since the fast-marching method is not suitable for this architecture, we used a simple Jacobi iteration of a local variational principle. On a single GPU, each forward simulation took less than 2 seconds, and the inverse problem was solved in a few minutes. -
Potter Toby MS Presentation
Thursday, June 9, 2016
Garden 1A, 15:00-15:15
MS Presentation
Leveraging the Madagascar Framework for Reproducible Large-scale Cluster and Cloud Computing, Toby Potter (University of Western Australia, Australia)
Co-Authors: Jeffrey Shragge (University of Western Australia, Australia)
Over the past decade the open-source Madagascar framework has been used for reproducible computational seismology research by a growing community of researchers. While Madagascar is commonly used for single-node applications, the increasing number and computational complexity of user-submitted software tools (e.g., 3D seismic modelling, imaging and inversion codes) are pushing the limits of computational tractability at the workstation level. There is growing interest and community experience in using Madagascar on cluster-scale public HPC facilities and cloud-based computing environments. In this presentation we highlight our procedure for interfacing Madagascar with publicly accessible HPC clusters, and provide case studies of using Madagascar for large-scale 3D seismic modelling and imaging activities. We present our recent efforts to move toward a reproducible Madagascar framework within a cloud-computing environment, and provide an example of running 3D acoustic modelling on Australia's NECTAR cloud computing grid using a combination of Python, ZeroMQ, and Cython. -
Pranger Casper MS Presentation
Friday, June 10, 2016
Garden 1A, 10:00-10:15
MS Presentation
From Tectonic to Seismic Timescales in 3D Continuum Models, Casper Pranger (Institute of Geophysics, ETH Zürich, Switzerland)
Co-Authors: Ylona van Dinther (ETH Zurich, Switzerland); Laetitia Le Pourhiet (Pierre and Marie Curie University, France); Dave A. May (ETH Zurich, Switzerland); Taras Gerya (ETH Zurich, Switzerland)
Lateral rupture limits substantially regulate the magnitude of great subduction megathrust earthquakes, but the factors controlling them remain largely unknown due to the limited spatio-temporal range of observations. The problem, however, involves the long-term, regional tectonic history, including structural, stress and strength heterogeneities. It requires a powerful 3D continuum numerical modelling approach that bridges tectonic and seismic timescales, but a suitable code is lacking. We demonstrate the development of a scalable PETSc-based staggered-grid finite difference code, in which self-consistent long-term deformation and spontaneous rupture are ensured through a solid-mechanics-based visco-elasto-plastic rheology with a slip-rate-dependent friction formulation, an energy-conservative inertial implementation, artificial damping of seismic waves at the domain boundaries, and an adaptive, implicit-explicit time-stepping scheme. Automated discretization and manufactured-solution benchmarks ensure stability, flexibility and accuracy of the code at every stage of development.
Poster
EAR-02 Flexible Automatic Discretization for Finite Differences: Eliminating the Human Factor, Casper Pranger (Institute of Geophysics, ETH Zürich, Switzerland)
Co-Authors:
In the geophysical numerical modelling community, finite differences are (in part due to their small footprint) a popular spatial discretization method for PDEs in the regular-shaped continuum that is the Earth. However, they rapidly become prone to programming mistakes when the physics increase in complexity. To eliminate opportunities for human error, we have designed an automatic discretization algorithm using Wolfram Mathematica, in which the user supplies symbolic PDEs, the number of spatial dimensions, and a choice of symbolic boundary conditions, and the script transforms this information into matrix and right-hand-side rules ready for use in a C++ code that will accept them. The symbolic PDEs are further used to automatically develop and perform manufactured-solution benchmarks, ensuring physical fidelity at all stages while providing pragmatic targets for numerical accuracy. We find that this procedure greatly accelerates code development and provides a great deal of flexibility in one's choice of physics. -
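The same idea can be sketched in a few lines of Python with SymPy (the authors use Wolfram Mathematica; the 1D operator below is a placeholder PDE chosen only to show the mechanism):

import sympy as sp

x, h = sp.symbols("x h")
u = sp.Function("u")

# Symbolic PDE residual (placeholder): a 1D Helmholtz-type operator u''(x) - u(x).
pde = sp.Derivative(u(x), x, 2) - u(x)

# Replace the continuous derivative by a finite-difference approximation
# on the stencil {x - h, x, x + h}.
stencil = [x - h, x, x + h]
discrete = pde.replace(
    lambda e: isinstance(e, sp.Derivative),
    lambda e: e.as_finite_difference(stencil),
)

print(sp.simplify(discrete))
# The result is equivalent to (u(x - h) - 2*u(x) + u(x + h))/h**2 - u(x), i.e. the usual
# 3-point stencil, from which per-grid-point matrix coefficients can be extracted automatically.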
Puzyrev Vladimir MS Presentation
Friday, June 10, 2016
Garden 1A, 10:45-11:00
MS Presentation
Computational Challenges of Electromagnetic Modeling at Multiple Scales, Vladimir Puzyrev (Barcelona Supercomputing Center, Spain)
Co-Authors:
Traditional three-dimensional modelling approaches to electromagnetic problems in geophysics work well in the cases when the resulting fields exhibit smooth variations on large spatial scales. However, the presence of highly conductive anomalies (e.g., metallic casing of the wells and other steel infrastructure) can affect the electromagnetic fields significantly. Extreme material contrasts and multiple spatial scales create serious computational challenges. Realistic interpretation of steel objects requires an extremely fine, millimeter-scale spatial discretization in a large computational domain whose size is of the order of tens of kilometers. Conductivity contrasts between steel and surrounding media can exceed 8-9 orders of magnitude. Even locally refined unstructured meshes lead to ill-conditioned problems with many millions of unknowns and hence require the use of high performance computing. In this presentation, I will give several examples of frequency-domain modelling scenarios when the presence of small-scale objects has a tremendous effect on electromagnetic measurements.
Q
-
Quarteroni Alfio MS Presentation
Wednesday, June 8, 2016
Garden 3A, 16:45-17:00
MS Presentation
Computational Study of the Risk of Restenosis in Coronary Bypasses, Alfio Quarteroni (SB MATHICSE CMCS, EPFL, Switzerland)
Co-Authors: Christian Vergara (Politecnico di Milano, Italy); Sonia Ippolito (Ospedale Luigi Sacco Milano, Italy); Roberto Scrofani (Ospedale Luigi Sacco Milano, Italy); Alfio Quarteroni (EPFL, Switzerland)
Coronary artery disease, caused by the build-up of atherosclerotic plaques in coronary vessel walls, is one of the leading causes of death in the world. For high-risk patients, coronary artery bypass grafting is the preferred treatment. Despite overall excellent patency rates, bypasses may fail due to restenosis. In this context, we present a computational study of the fluid dynamics in patient-specific geometries with the aim of investigating a possible relationship between coronary stenosis and graft failure. Firstly, we propose a strategy to prescribe realistic boundary conditions in the absence of measured data, based on an extension of Murray's law to provide the flow division at bifurcations in the case of stenotic vessels and non-Newtonian blood rheology. Then, we show some results regarding numerical simulations in patients treated with grafts, in which the degree of coronary stenosis is virtually varied to compare the fluid dynamics in terms of hemodynamic indices potentially involved in restenosis development.
Wednesday, June 8, 2016
Garden 3A, 16:30-16:45
MS Presentation
Coupled Mathematical and Numerical Models for Integrated Simulations of the Left Ventricle, Alfio Quarteroni (SB MATHICSE CMCS, EPFL, Switzerland)
Co-Authors: Luca Dede' (EPFL, Switzerland); Davide Forti (EPFL, Switzerland); Alfio Quarteroni (EPFL, Switzerland)
In this talk, we focus on the coupling of electrophysiology and mechanical models to realize an integrated model of the left ventricle, considering the active contraction of the muscle and its feedback on the electrophysiology. For the latter, we consider the monodomain equations with the Bueno-Orovio ionic model. As for the mechanics, we consider the Holzapfel-Ogden model together with an active strain approach with a transmurally variable activation parameter. We spatially approximate the model by means of the finite element method and discuss the properties of different coupling strategies and time discretization schemes. Among these, we consider a fully coupled strategy with a semi-implicit scheme for the time discretization. In order to solve the linear system arising from such a discretization, we use a preconditioner based on the FaCSI (Factorized Condensed SIMPLE) concept. We present and discuss numerical results obtained in the HPC framework, including patient-specific left ventricle geometries. -
Quinn Thomas MS Summary
MS Summary
MS02 Advanced Computing in Plasma, Particle and Astrophysics on Emerging HPC Architectures, Thomas Quinn (University of Washington, United States of America)
Co-Authors: Thomas Quinn (University of Washington, United States of America)
Non-traditional computing architectures such as general-purpose graphics processing units (GPGPUs) and many-integrated-core accelerators are providing leading-edge performance for advanced scientific computing. Their advantages are particularly evident when the power cost is considered: the top 10 systems on the "Green500" list, where flops per watt is the ranking criterion, are all heterogeneous machines. Given that power costs are becoming more critical, future capability machines are likely to be dominated by these architectures. Non-traditional architectures may require non-traditional programming models, and the scientific community is still learning how to take full advantage of heterogeneous machines with reasonable programming effort. This difficulty is compounded by the need for sophisticated algorithms to handle the large dynamic ranges encountered in state-of-the-art physics and astrophysics simulations. This minisymposium provides a forum for researchers in the computational plasma, particle physics and astrophysics communities to share their techniques and findings. The presentation and discussion of findings and lessons learned will foster more effective use of these new resources for the advancement of physics and astrophysics. -
Quintino Tiago MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:30-15:45
MS Presentation
Approaches to I/O Scalability Challenges in the ECMWF Forecasting System, Tiago Quintino (ECMWF, United Kingdom)
Co-Authors: Tiago Quintino (ECMWF, United Kingdom); Baudouin Raoult (ECMWF, United Kingdom); Simon Smart (ECMWF, United Kingdom); Stephan Siemen (ECMWF, United Kingdom); Peter Bauer (ECMWF, United Kingdom)
As the resolution of the forecasts produced by ECMWF's Integrated Forecast System (IFS) is refined, the amount of data involved continues its geometric growth. Current peak loads already require an otherwise oversized parallel storage filesystem (Lustre). The data volume is expected to grow 6-fold by 2020, to reach 120TB/day, concentrated in short 1 hour bursts. Moreover, this data requires post-processing to create the final forecast products sent to end-users, introducing a further I/O bottleneck. Realizing these challenges, ECMWF's Scalability Programme aims to redesign the data workflow to minimize I/O in the time-critical path, whilst retaining resilience to failures. The authors are investigating multiple solutions to tackle issues of data locality, data volume and overall resilience. Solutions range from a novel NVRAM hardware co-design effort inside the EU funded NEXTGenIO project, to the use of distributed object storage technologies and a new dynamic worker-broker solution for managing the post-processing workload.
R
-
Raess Ludovic Poster
Poster
EAR-05 Optimal Utilisation of Piz Daint Memory Bandwidth for a Parallel GPU Two-Phase Solver, Ludovic Raess (University of Lausanne, Switzerland)
Co-Authors: Samuel Omlin (University of Lausanne, Switzerland); Yury Podladchikov (University of Lausanne, Switzerland)
Massively parallel algorithms are commonly based on iterative methods. Such algorithms may run with optimal performance on very large systems, such as the Cray XC30 Piz Daint at CSCS. Iterative methods typically require few calculations per transferred floating-point number (a low flop-to-byte ratio) and are therefore normally bound by memory throughput. We developed a parallel GPU two-phase hydro-mechanical solver to resolve nonlinear fluid flow in transforming porous media. The key to maximum performance is to use the hardware's memory bandwidth optimally while performing the minimum number of memory accesses needed to solve the coupled equations. We show promising results: single-GPU memory throughput close to memory-copy values (i.e. when no computation is done), optimal parallel efficiency, and linear scaling up to 5000 GPU nodes, i.e. the entire Piz Daint machine. -
Raoult Baudouin MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:30-15:45
MS Presentation
Approaches to I/O Scalability Challenges in the ECMWF Forecasting System, Baudouin Raoult (ECMWF, United Kingdom)
Co-Authors: Tiago Quintino (ECMWF, United Kingdom); Baudouin Raoult (ECMWF, United Kingdom); Simon Smart (ECMWF, United Kingdom); Stephan Siemen (ECMWF, United Kingdom); Peter Bauer (ECMWF, United Kingdom)
As the resolution of the forecasts produced by ECMWF's Integrated Forecast System (IFS) is refined, the amount of data involved continues its geometric growth. Current peak loads already require an otherwise oversized parallel storage filesystem (Lustre). The data volume is expected to grow 6-fold by 2020, to reach 120TB/day, concentrated in short 1 hour bursts. Moreover, this data requires post-processing to create the final forecast products sent to end-users, introducing a further I/O bottleneck. Realizing these challenges, ECMWF's Scalability Programme aims to redesign the data workflow to minimize I/O in the time-critical path, whilst retaining resilience to failures. The authors are investigating multiple solutions to tackle issues of data locality, data volume and overall resilience. Solutions range from a novel NVRAM hardware co-design effort inside the EU funded NEXTGenIO project, to the use of distributed object storage technologies and a new dynamic worker-broker solution for managing the post-processing workload. -
Ratcliff Laura MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 14:20-14:40
MS Presentation
BigDFT: Flexible DFT Approach to Large Systems Using Adaptive and Localized Basis Functions, Laura Ratcliff (Argonne National Laboratory, United States of America)
Co-Authors: Luigi Genovese (CEA/INAC, France); Stefan Mohr (BSC, Spain); Laura Ratcliff (Argonne National Laboratory, United States of America); Stefan Goedecker (University of Basel, Switzerland)
Since 2008, the BigDFT project consortium has developed an ab initio DFT code based on Daubechies wavelets. In recent articles, we presented the linear-scaling version of the BigDFT code [1], where a minimal set of localized support functions is optimised in situ for systems in various boundary conditions. We will present how the flexibility of this approach is helpful in providing a basis set that is optimally tuned to the chemical environment surrounding each atom. In addition to providing a basis useful for projecting Kohn-Sham orbital information such as atomic charges and partial densities of states, it can also be reused as-is, without re-optimisation, for charge-constrained DFT calculations within a fragment approach [2]. We will demonstrate the interest of this approach for highly precise and efficient calculations of systems in complex environments [3]. [1] JCP 140, 204110 (2014), PCCP 17, 31360 (2015) [2] JCP 142, 23, 234105 (2015) [3] JCTC 11, 2077 (2015) -
Rathgeber Florian MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:30-15:45
MS Presentation
Approaches to I/O Scalability Challenges in the ECMWF Forecasting System, Florian Rathgeber (ECMWF, United Kingdom)
Co-Authors: Tiago Quintino (ECMWF, United Kingdom); Baudouin Raoult (ECMWF, United Kingdom); Simon Smart (ECMWF, United Kingdom); Stephan Siemen (ECMWF, United Kingdom); Peter Bauer (ECMWF, United Kingdom)
As the resolution of the forecasts produced by ECMWF's Integrated Forecast System (IFS) is refined, the amount of data involved continues its geometric growth. Current peak loads already require an otherwise oversized parallel storage filesystem (Lustre). The data volume is expected to grow 6-fold by 2020, to reach 120TB/day, concentrated in short 1 hour bursts. Moreover, this data requires post-processing to create the final forecast products sent to end-users, introducing a further I/O bottleneck. Realizing these challenges, ECMWF's Scalability Programme aims to redesign the data workflow to minimize I/O in the time-critical path, whilst retaining resilience to failures. The authors are investigating multiple solutions to tackle issues of data locality, data volume and overall resilience. Solutions range from a novel NVRAM hardware co-design effort inside the EU funded NEXTGenIO project, to the use of distributed object storage technologies and a new dynamic worker-broker solution for managing the post-processing workload.
Wednesday, June 8, 2016
Garden 3B, 13:00-13:30
MS Presentation
Firedrake: Automating the Finite Element Method by Composing Abstractions, Florian Rathgeber (ECMWF, United Kingdom)
Co-Authors: David A. Ham (Imperial College London, United Kingdom); Andrew T. T. McRae (University of Bath, United Kingdom); Florian Rathgeber (ECMWF, United Kingdom); Gheorghe-Teodor Bercea (Imperial College London, United Kingdom); Miklós Homolya (Imperial College London, United Kingdom); Fabio Luporini (Imperial College London, United Kingdom); Paul H. J. Kelly (Imperial College London, United Kingdom)
The complexity inherent in the application of advanced numerics on modern hardware to coupled physical systems presents a critical barrier to simulation development. To overcome this, we must create simulation software which embodies the abstraction and composability of the underlying mathematics. In this way, a system is created in which mathematicians, computer scientists, and application specialists can each deploy their own expertise, benefiting from the expertise of the others. Critically, this approach minimises the extent to which individuals must become polymaths to share in these advances. In this talk I will present Firedrake and PyOP2, a composition of new and existing abstractions which creates a particularly complete separation of concerns. This enables the creation of high performance, sophisticated finite element models from a very high level mathematical specification and has enabled advances in computer science and numerics, while also facilitating the creation of simulation systems for a variety of applications. -
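For illustration, a minimal Firedrake example of the kind of high-level specification described (a Poisson problem with homogeneous Dirichlet boundary conditions; this is a generic usage sketch, not code from the talk, and assumes a working Firedrake installation):

# Canonical Firedrake idiom: bring the UFL language and Firedrake types into scope.
from firedrake import *

# Piecewise-linear continuous elements on a 32x32 triangulation of the unit square.
mesh = UnitSquareMesh(32, 32)
V = FunctionSpace(mesh, "CG", 1)

u = TrialFunction(V)
v = TestFunction(V)
f = Constant(1.0)

# Weak form of -Laplace(u) = f with u = 0 on the four boundary segments (ids 1-4).
a = dot(grad(u), grad(v)) * dx
L = f * v * dx
bc = DirichletBC(V, Constant(0.0), (1, 2, 3, 4))

u_h = Function(V, name="u")
solve(a == L, u_h, bcs=[bc])
print("max of the discrete solution:", u_h.dat.data_ro.max())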
Ratnaswamy Vishagan MS Presentation
Friday, June 10, 2016
Garden 1A, 09:00-09:30
MS Presentation
HPC Challenges Arising in Forward and Inverse Mantle Flow Simulation, Vishagan Ratnaswamy (California Institute of Technology, United States of America)
Co-Authors: Johann Rudi (University of Texas at Austin, United States of America); Vishagan Ratnaswamy (California Institute of Technology, United States of America); Dunzhu Li (California Institute of Technology, United States of America); Tobin Isaac (University of Chicago, United States of America); Michael Gurnis (California Institute of Technology, United States of America); Omar Ghattas (University of Texas at Austin, United States of America)
We discuss scalable solvers for the forward and inverse simulation of mantle flow problems. Crucial solver components for the arising nonlinear Stokes problems are parallel multigrid methods for preconditioning the linearized Stokes system, and a Schur complement approximation that is able to cope with extreme viscosity variations. To achieve good parallel scalability, we use, among others, matrix-free operations and we redistribute coarse multigrid levels to subsets of all available processors. We will discuss the inversion of global rheology parameters and distributed fields from surface data and the present-day temperature distribution in instantaneous and time-dependent problems. -
Redaschi Nicole Poster
Poster
LS-08 The UniProt SPARQL Endpoint: 21 Billion Triples in Production, Nicole Redaschi (Swiss Institute of Bioinformatics, Switzerland)
Co-Authors: Sebastien Gehant (Swiss Institute of Bioinformatics, Switzerland); Thierry Lombardot (Swiss Institute of Bioinformatics, Switzerland); Lydie Bougueleret (Swiss Institute of Bioinformatics, Switzerland); Ioannis Xenarios (Swiss Institute of Bioinformatics, Switzerland); Nicole Redaschi (Swiss Institute of Bioinformatics, Switzerland)
The UniProt knowledgebase is a leading resource of protein sequences and functional information whose centerpiece is the expert-curated Swiss-Prot section. UniProt data is accessible at www.uniprot.org (via a user-friendly interface and a REST API) and at sparql.uniprot.org, a public SPARQL endpoint hosted and maintained by the Vital-IT and Swiss-Prot groups of SIB. With 21 billion RDF triples, it is the largest free-to-use graph database in the sciences. SPARQL allows scientists to perform complex queries within UniProt and across datasets located on remote SPARQL endpoints. It provides a free data integration solution for users who cannot afford to create custom data warehouses, at a cost for the service providers. Here we discuss the challenges in maintaining the UniProt SPARQL endpoint, which is updated monthly in sync with the UniProt data releases. -
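A small usage sketch (Python with the requests library; the query counts reviewed entries and is deliberately simple, and the exact ontology terms and result handling should be checked against the UniProt SPARQL documentation):

import requests

ENDPOINT = "https://sparql.uniprot.org/sparql"

# Count reviewed (Swiss-Prot) protein entries -- a deliberately small example query.
query = """
PREFIX up: <http://purl.uniprot.org/core/>
SELECT (COUNT(?protein) AS ?n)
WHERE {
  ?protein a up:Protein ;
           up:reviewed true .
}
"""

response = requests.get(ENDPOINT, params={"query": query, "format": "json"},
                        headers={"Accept": "application/sparql-results+json"}, timeout=60)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print("reviewed proteins:", row["n"]["value"])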
Reed Darren S. Poster
Poster
PHY-01 DIAPHANE: A Library for Radiation and Neutrino Transport in Hydrodynamic Simulations, Darren S. Reed (University of Zurich, Switzerland)
Co-Authors:
We report on the status of the "DIAPHANE" PASC project. The library contains modules to model the physical processes for energy transport by radiation and neutrinos that are most important for astrophysics. A common API is used to access each algorithm. The capability to model simultaneously multiple algorithms enables a wide range of new simulations such as, for example, the heating of gas by a stellar source and diffusion of that energy into a surrounding gas disk or nebula. Astrophysics applications include: supernovae; the formation of planets, black holes, stars, and galaxies. We demonstrate the current functionality and discuss some of the challenges and strategies in making the library modular, portable, and efficient for current and future HPC architectures. -
Reiter Lukas MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 17:00-17:15
MS Presentation
Working with Limited Resources: Large-Scale Proteomic Data-Analysis on Cheap Gaming Desktop PCs, Lukas Reiter (Biognosys AG, Switzerland)
Co-Authors: Lukas Reiter (Biognosys AG, Switzerland); Tejas Gandhi (Biognosys AG, Switzerland); Roland Bruderer (Biognosys AG, Switzerland)
One of the major challenges in mass-spec-driven proteomics research is data analysis. Many research facilities have the capacity to generate several gigabytes of data per hour. To process such data, though, software solutions for high-throughput data analysis often require a cluster computing infrastructure. Since many research facilities do not have the required IT infrastructure for large-scale data processing, this kind of proteomics research was restricted to only a few proteomics groups. Here we present a software solution that is capable of processing terabytes of data from large proteomics experiments on a cheap desktop gaming PC setup. We will focus on how to overcome the issue of limited resources while still maintaining high-throughput data analysis and reasonable scalability.
Poster
LS-02 Generating Very Large Spectral Libraries for Targeted Proteomics Analysis Using Spectronaut, Lukas Reiter (Biognosys AG, Switzerland)
Co-Authors: Roland Bruderer (Biognosys AG, Switzerland); Oliver Martin Bernhardt (Biognosys AG, Switzerland); Lukas Reiter (Biognosys AG, Switzerland)
Mass spectrometer (MS) based data-independent acquisition with targeted analysis offers new possibilities for highly multiplexed peptide and protein quantification experiments. This type of analysis often includes a spectral library as a prerequisite. In layman's terms, a spectral library is a collection of fingerprints that facilitates the identification of signals measured by the MS. Both the size and the quality of a spectral library, acting as a template for these target signals, can make a significant difference in the quality of the data analysis. Recently, the trend has been moving towards generating very large spectral libraries consisting of hundreds of thousands of peptides stemming from tens of thousands of proteins. From a software engineering perspective, the challenge then is to process and manage such large libraries in an efficient manner. Here we present our solution towards generating very large spectral libraries while using a standard gaming workstation. -
Rendahl Pontus MS Presentation
Thursday, June 9, 2016
Garden 2A, 15:45-16:00
MS Presentation
Exact Present Solution with Consistent Future Approximation: A Gridless Algorithm to Solve Stochastic Dynamic Models, Pontus Rendahl (University of Cambridge, United Kingdom)
Co-Authors:
This paper proposes an algorithm that finds model solutions at a particular point in the state space by solving a simple system of equations. The key step is to characterize future behaviour with a Taylor series expansion of the current period's behaviour around the contemporaneous values of the state variables. Since current decisions are solved from the original model equations, the solution incorporates nonlinearities and uncertainty. The algorithm is used to solve the model considered in Coeurdacier, Rey, and Winant (2011), which is a challenging model because it has no steady state and uncertainty is necessary to keep the model well behaved. We show that our algorithm can generate accurate solutions even when the model series are quite volatile. The solutions generated by the risky-steady-state algorithm proposed in Coeurdacier, Rey, and Winant (2011), in contrast, are shown to be inaccurate. -
Renevey Annick V. Poster
Poster
LS-07 The Importance of N-Methylations for the Stability of the β6.3-Helical Conformation of Polytheonamide B, Annick V. Renevey (ETH Zurich, Switzerland)
Co-Authors: Sereina Z. Riniker (ETH Zurich, Switzerland)
Polytheonamide B (PTB) is a highly cytotoxic transmembrane cation channel consisting of 49 residues, of which more than half are posttranslationally modified. Epimerizations result in alternating L- and D-amino acids, allowing the peptide to adopt a β-helical structure stable in solution. The role of the other posttranslational modifications (PTMs): hydroxylations, side chain C-methylations and side chain N-methylations, is less understood. The importance of these PTMs for the β6.3-helical structure is investigated using molecular dynamics simulations. Groups or individual modified residues are reverted to their precursor amino acids and the conformational effect on PTB monitored. The simulation results indicate that the N-methylations are crucial for the stability of the β6.3-helix due to the formation of side chain-side chain hydrogen bond chains that act like an "exoskeleton" for the helix. With unmethylated asparagine residues, the H-bond chains are unstable in polar solvents, resulting in the loss of the helical structure. -
Reuter Klaus MS Presentation
Wednesday, June 8, 2016
Garden 3C, 17:00-17:15
MS Presentation
Parallelization Strategies for a Semi-Lagrangian Vlasov Code, Klaus Reuter (Max Planck Computing and Data Facility, Germany)
Co-Authors: Klaus Reuter (Max Planck Computing and Data Facility, Germany); Eric Sonnendrücker (Max Planck Society, Germany)
Grid-based solvers for the Vlasov equation give accurate results but suffer from the curse of dimensionality. To enable the grid-based solution of the Vlasov equation in 6D phase space, we need efficient parallelization schemes. In this talk, we consider the 6D Vlasov-Poisson problem discretized by a split-step semi-Lagrangian scheme. This method works with successive 1D interpolations on 1D stripes of the 6D domain. We consider two parallelization strategies: a remapping strategy that works with two different layouts, keeping parts of the dimensions sequential, and a classical partitioning into hyper-rectangles. For the remapping scheme, the 1D interpolations can be performed sequentially on each processor; on the other hand, the remapping consists of an all-to-all communication pattern. The partitioning only requires localized communication, but each 1D interpolation needs to be performed on distributed data. We compare both parallelization schemes and discuss how to efficiently handle the domain boundaries in the interpolation for the partitioning approach. -
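A minimal sketch of one such 1D sweep (NumPy, periodic advection with linear interpolation at the departure points; the production code works in 6D with higher-order interpolation, and all sizes below are placeholders):

import numpy as np

nx = 256
L = 2.0 * np.pi
x = np.linspace(0.0, L, nx, endpoint=False)
dx = x[1] - x[0]
f = np.exp(-10.0 * (x - np.pi) ** 2)      # initial profile along one 1D stripe

def semi_lagrangian_step(f, velocity, dt):
    # Trace the characteristics back to the departure points and interpolate there.
    departure = (x - velocity * dt) % L
    idx = np.floor(departure / dx).astype(int) % nx
    w = (departure - idx * dx) / dx        # linear interpolation weight
    return (1.0 - w) * f[idx] + w * f[(idx + 1) % nx]

for _ in range(100):
    f = semi_lagrangian_step(f, velocity=1.0, dt=0.05)

print("total mass after 100 steps:", f.sum() * dx)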
Rhee Moono MS Presentation
Thursday, June 9, 2016
Garden 2BC, 14:00-14:30
MS Presentation
The Development of ParaDiS for HCP Crystals, Moono Rhee (Lawrence Livermore National Laboratory, United States of America)
Co-Authors: Sylvie Aubry (Lawrence Livermore National Laboratory, United States of America); Moono Rhee (Lawrence Livermore National Laboratory, United States of America); Brett Wayne (Lawrence Livermore National Laboratory, United States of America); Gregg Hommes (Lawrence Berkeley National Laboratory, United States of America)
The ParaDiS project at LLNL was created to build a scalable massively parallel code for the purpose of predicting the evolution of strength and strain hardening in crystalline materials under dynamic loading conditions by directly integrating the elements of dislocation physics. The code has been used by researchers at LLNL and around the world to simulate the behaviour of dislocation networks in a wide variety of applications, from high temperature structural materials, to nuclear materials, to armor materials, to photovoltaic systems. ParaDiS has recently been extended to include a fast analytical algorithm for the computation of forces in anisotropic elastic media, and an augmented set of topological operations to treat the complex core physics of the dislocations that routinely appear in HCP metals. The importance and implications of these developments on the engineering properties of HCP metals will be demonstrated in large scale simulations of strain hardening. -
Ricci Paolo Poster
Poster
PHY-03 Parallelization on a Hybrid Architecture of GBS, a Simulation Code for Plasma Turbulence at the Edge of Fusion Devices, Paolo Ricci (Ecole Polytechnique Fédérale de Lausanne (EPFL), Swiss Plasma Center (SPC), CH-1, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Patrick Emonts (EPFL / Swiss Plasma Center, Switzerland); Federico David Halpern (EPFL / Swiss Plasma Center, Switzerland); Rogério Jorge (EPFL / Swiss Plasma Center, Switzerland); Jorge Morales (EPFL / Swiss Plasma Center, Switzerland); Paola Paruta (EPFL / Swiss Plasma Center, Switzerland); Paolo Ricci (EPFL / Swiss Plasma Center, Switzerland); Fabio Riva (EPFL / Swiss Plasma Center, Switzerland)
We present recent developments of GBS, a simulation code used to evolve plasma turbulence in the edge of fusion devices. GBS solves a set of 3D fluid equations, the Poisson and the Ampere equation, and a kinetic equation for the neutral atoms. Investigations carried out with GBS have significantly advanced our understanding of the plasma dynamics at the edge of fusion devices. For example, GBS simulations allowed the identification of the turbulent regimes and the saturation mechanisms of the linearly unstable modes. In GBS, a 3D Cartesian MPI communicator is employed, leading to excellent parallel scalability up to 8192 cores. To efficiently exploit many-core and hybrid architectures, new schemes using MPI+OpenMP and MPI+OpenACC have been recently implemented. We show the implementation of the new parallelization schemes, their scalability, and their efficiency. The new parallelization allows the efficient use of advanced hybrid supercomputers, such as Piz Daint at CSCS. -
Rietmann Max Contributed Talk
Thursday, June 9, 2016
Garden 1A, 15:30-15:45
Contributed Talk
A Tetrahedral Spectral Element Method for the Seismic Wave Equation, Max Rietmann (ETH Zurich, Switzerland)
Co-Authors: Martin van Driel (ETH Zurich, Switzerland)
Although the hexahedral Spectral Element Method (SEM) has become a standard tool in computational seismology, the main difficulty in applications remains the meshing: to date, no robust algorithm exists that allows automatic meshing of complex geological geometries. Here we demonstrate how the concept of the SEM can be applied to tetrahedral elements, for which such automated meshing tools exist. The key idea has previously been applied to the acoustic wave equation (Zhebel et al. 2014) and consists of obtaining a stable quadrature rule with strictly positive weights by adding extra quadrature points in the interior of the elements. In this way, a diagonal mass matrix is achieved while maintaining the order of convergence. We present the basic rationale and show convergence up to fourth order in space and time. Finally, we compare our method and its implementation against the performance profile of existing hexahedral SEM implementations.
Thursday, June 9, 2016
Garden 1A, 14:45-15:00
MS Presentation
Salvus: A Flexible Open-Source Package for Full-Waveform Modelling and Inversion, Max Rietmann (ETH Zurich, Switzerland)
Co-Authors: Christian Boehm (ETH Zurich, Switzerland); Martin van Driel (ETH Zurich, Switzerland); Lion Krischer (Ludwig Maximilian University of Munich, Germany); Dave A. May (ETH Zurich, Switzerland); Max Rietmann (ETH Zurich, Switzerland); Korbinian Sager (ETH Zurich, Switzerland); Andreas Fichtner (ETH Zurich, Switzerland)
Within all domain-specific software projects, finding the correct balance between flexibility and performance is often difficult. In the seismic imaging community, the trend has been to move towards codes which are heavily optimised, but which often sacrifice usability and flexibility. Here we introduce Salvus: an open-source HPC high-order finite element (FE) package focused on full-waveform modelling and inversion, which is designed to be both flexible and performant. Salvus was constructed by following modern software design practices, testing protocols, and by establishing its foundations upon existing open-source high-level scientific libraries. The FE framework is generalized over spatial dimensions, time-integrators, polynomial order and wave-propagation physics, and provides support for both hexahedral and tetrahedral meshes. Additionally, support is included for various numerical optimisation methods. We discuss our usage of existing open-source scientific libraries, our chosen level of abstraction, and quantitatively investigate the performance penalties associated with these abstractions. -
Rignanese Gian-Marco MS Summary
MS Summary
MS21 Materials Design by High-Throughput Ab Initio Computing, Gian-Marco Rignanese (Université catholique de Louvain, Belgium)
Co-Authors: Gian-Marco Rignanese (Université catholique de Louvain, Belgium)
Materials advances often drive technological innovation (faster computers, more efficient solar cells, more compact energy storage). Experimental discovery of new materials suitable for specific applications is, however, a complex task, as it relies on costly and time-consuming synthesis procedures. Computational materials science is now powerful enough to predict many materials properties even before synthesizing those materials in the lab, and it offers an inexpensive way to guide experimental searches efficiently. Recent advances in computer speed and first-principles algorithms have led to the development of fast and robust codes, making it possible to do large numbers of calculations automatically. This is the burgeoning area of high-throughput first-principles computation. The concept, though simple, is very powerful. High-throughput calculations are used to create large databases containing the calculated properties of existing and hypothetical materials. These databases can then be intelligently interrogated, searching for materials with desired properties and so removing the guesswork from materials design. Various open-domain on-line repositories have appeared to make these databases available to everyone. Areas of application include solar materials, topological insulators, thermoelectrics, piezoelectrics, materials for catalysis, battery materials, etc. While it has reached a good level of maturity, the high-throughput first-principles approach still requires many improvements. Several important properties and classes of materials have not been dealt with yet, and further algorithm implementations, repositories and data-mining interfaces are necessary. This minisymposium will be devoted to the presentation of the most recent developments in this field. It will also provide an agora for some of the leading researchers to put forward the most recent achievements, to address the current challenges, and to discuss the most promising directions. The speakers will be selected to illustrate the many disciplines that are contributing to this effort, covering practical applications (e.g., magnetic materials, thermoelectrics, transparent conductors, 2D materials), theoretical developments (e.g., novel functionals within Density Functional Theory, local basis representations for effective ab-initio tight-binding schemes), and technical aspects (e.g., high-throughput frameworks). -
Riha Lubomir Paper
Wednesday, June 8, 2016
Auditorium C, 16:30-17:00
Paper
Massively Parallel Hybrid Total FETI (HTFETI) Solver, Lubomir Riha (IT4Innovations National Supercomputing Center, Ostrava, Czech Republic)
Co-Authors: Tomáš Brzobohatý (IT4Innovations National Supercomputing Center, Czech Republic); Alexandros Markopoulos (IT4Innovations National Supercomputing Center, Czech Republic); Ondřej Meca (IT4Innovations National Supercomputing Center, Czech Republic); Tomáš Kozubek (IT4Innovations National Supercomputing Center, Czech Republic)
This paper describes the Hybrid Total FETI (HTFETI) method and its parallel implementation in the ESPRESO library. HTFETI is a variant of the FETI type domain decomposition method in which a small number of neighboring subdomains is aggregated into clusters. This can also be viewed as a multilevel decomposition approach which results in a smaller coarse problem - the main scalability bottleneck of the FETI and FETI-DP methods.
The efficiency of our implementation, which employs hybrid parallelization in the form of MPI and Cilk++, is evaluated using both weak and strong scalability tests. The weak scalability of the solver is shown on a 3-dimensional linear elasticity problem of up to 30 billion Degrees Of Freedom (DOF) executed on 4096 compute nodes. The strong scalability is evaluated on a problem of size 2.6 billion DOF scaled from 1000 to 4913 compute nodes. The results show super-linear scaling of the single iteration time and linear scalability of the solver runtime. The latter combines both numerical and parallel scalability and shows the overall HTFETI solver performance. The large scale tests use our own parallel synthetic benchmark generator, which is also described in the paper.
The last set of results shows that HTFETI is very efficient for problems of size up to 1.7 billion DOF and provides better time to solution when compared to the TFETI method.
Thursday, June 9, 2016
Garden 1BC, 11:30-11:50
Contributed Talk
The Energy Consumption Optimization of the FETI Solver, Lubomir Riha (IT4Innovations National Supercomputing Center, Ostrava, Czech Republic)
Co-Authors: Lubomir Riha (IT4Innovations National Supercomputing Center, Czech Republic); Radim Sojka (IT4Innovations National Supercomputing Center, Czech Republic); Jakub Kruzik (IT4Innovations National Supercomputing Center, Czech Republic); Martin Beseda (IT4Innovations National Supercomputing Center, Czech Republic)
The presentation deals with the energy consumption evaluation of the FETI method, blending iterative and direct solvers, in the scope of the READEX project. The characteristics measured on a model cube benchmark illustrate the behaviour of the preprocessing and solve phases related mainly to the CPU frequency, different problem decompositions, compiler type and compiler parameters. In preprocessing it is necessary to factorize the stiffness and coarse problem matrices, which is among the most time- and energy-consuming operations. The solve phase employs the conjugate gradient algorithm and consists of sparse matrix-vector multiplications and vector dot products or AXPY functions. In each iteration we need to apply the direct solver twice, for the pseudo-inverse action and the coarse problem solution. Together, these operations cover the basic sparse and dense BLAS Level 1, 2 and 3 routines, so we can explore their different dynamism; dynamically switching between various configurations can then provide significant energy savings. -
Riniker Sereina Z. Poster
Poster
LS-06 Structural and Dynamic Properties of Cyclosporin A: Molecular Dynamics and Markov State Modelling, Sereina Z. Riniker (ETH Zurich, Switzerland)
Co-Authors: Bettina Keller (Free University of Berlin, Germany); Sereina Z. Riniker (ETH Zurich, Switzerland)
The membrane permeability of cyclic peptides is likely influenced by the conformational behavior of these compounds in polar and apolar environments. The size and complexity of peptides often limit their bioavailability, but there are known examples of peptide natural products such as cyclosporin A (CsA) that can cross cell membranes by passive diffusion. The crystal structure of CsA shows a "closed" conformation with four intramolecular hydrogen bonds. When binding to its target cyclophilin, CsA adopts an "open" conformation without intramolecular hydrogen bonds. In this study, we attempted to sample exhaustively the conformational space of CsA in chloroform and in water by molecular dynamics simulations in order to rationalize the good membrane permeability of CsA observed experimentally. From 10 μs molecular dynamics simulations in each solvent, Markov state models were constructed to characterize the metastable conformational states. The conformational landscapes in both solvents show significant overlap, but also clearly distinct features.
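A minimal illustration of the Markov-state-model construction step mentioned in the abstract above (a generic sketch, not the authors' workflow): given a trajectory that has already been discretized into conformational states, a transition matrix is estimated at a chosen lag time by counting transitions and row-normalizing.

```python
# Generic sketch: estimate an MSM transition matrix from a discretized trajectory.
# `dtraj` is a hypothetical array of state indices (one per saved MD frame).
import numpy as np

def transition_matrix(dtraj, n_states, lag):
    """Count transitions at the given lag time and row-normalize."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    counts += 1e-12                      # avoid division by zero for unvisited states
    return counts / counts.sum(axis=1, keepdims=True)

dtraj = np.random.randint(0, 4, size=10000)   # placeholder for a clustered MD trajectory
T = transition_matrix(dtraj, n_states=4, lag=50)
# Implied timescales follow from the eigenvalues of T: t_k = -lag / ln(lambda_k).
```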
Poster
LS-07 The Importance of N-Methylations for the Stability of the β6.3-Helical Conformation of Polytheonamide B, Sereina Z. Riniker (ETH Zurich, Switzerland)
Co-Authors: Sereina Z. Riniker (ETH Zurich, Switzerland)
Polytheonamide B (PTB) is a highly cytotoxic transmembrane cation channel consisting of 49 residues, of which more than half are posttranslationally modified. Epimerizations result in alternating L- and D-amino acids, allowing the peptide to adopt a β-helical structure that is stable in solution. The role of the other posttranslational modifications (PTMs): hydroxylations, side chain C-methylations and side chain N-methylations, is less understood. The importance of these PTMs for the β6.3-helical structure is investigated using molecular dynamics simulations. Groups of modified residues, or individual ones, are reverted to their precursor amino acids and the conformational effect on PTB is monitored. The simulation results indicate that the N-methylations are crucial for the stability of the β6.3-helix due to the formation of side chain-side chain hydrogen bond chains that act like an "exoskeleton" for the helix. With unmethylated asparagine residues, the H-bond chains are unstable in polar solvents, resulting in the loss of the helical structure.
Poster
LS-05 Replica-Exchange Enveloping Distribution Sampling: A Robust and Accurate Method to Calculate Multiple Free Energy Differences from a Single Simulation, Sereina Z. Riniker (ETH Zurich, Switzerland)
Co-Authors: Sereina Z. Riniker (ETH Zurich, Switzerland)
Enveloping distribution sampling (EDS) presents an attractive alternative to standard methods for the calculation of free-energy differences ∆G from molecular dynamics (MD) simulations, as it allows the estimation of ∆G between multiple states from a single simulation of a reference state R. The challenge of the approach is the determination of optimal parameters for R to ensure equal sampling of all end states. While an automatic selection procedure is available for two end states, the determination of optimal R-parameters for multiple end states is currently an unsolved issue. To address this, we have generalized the replica-exchange EDS (RE-EDS) methodology, previously developed for constant-pH MD. By exchanging configurations between replicas with different R-parameters, major parts of the parameter-choice problem can be circumvented, resulting in a reliable, robust and accurate method. We have evaluated RE-EDS using a test system with five inhibitors of phenylethanolamine N-methyltransferase (PNMT) studied previously. -
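For orientation, the reference-state potential used in (single-s) EDS has the standard textbook form shown below; it is quoted here only to make the parameter-choice problem in the abstract above concrete, since the energy offsets $E_i^R$ and smoothing parameter $s$ are exactly the quantities that RE-EDS varies across replicas:

\[
V_R(\mathbf{r}) \;=\; -\frac{1}{\beta s}\,\ln \sum_{i=1}^{N} \exp\!\big[-\beta s\,\big(V_i(\mathbf{r}) - E_i^{R}\big)\big],
\]

and free-energy differences between end states $A$ and $B$ follow from averages over the single reference simulation, $\Delta G_{BA} = -\beta^{-1}\ln\big(\langle e^{-\beta(V_B - V_R)}\rangle_R \,/\, \langle e^{-\beta(V_A - V_R)}\rangle_R\big)$.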
Rinn Bernd MS Presentation
Thursday, June 9, 2016
Garden 3A, 15:00-15:15
MS Presentation
Enhancing the Computational Capabilities for Biologists: Genomic Data Analysis Services at ETH Zurich, Bernd Rinn (ID Scientific IT Services, ETH Zurich, Switzerland)
Co-Authors: Thomas Wüst (ETH Zurich, Switzerland); Bernd Rinn (ETH Zurich, Switzerland)
Genomics-based biological research (e.g., next-generation sequencing) generates increasing amounts of data, which need dedicated high-performance computing (HPC) resources to be analysed efficiently. However, the specialization in both areas (namely, genomics and HPC) makes it increasingly challenging to bring the two fields together and to help biologists make effective use of the available computational resources. The mission of the Scientific IT Services (SIS) of ETH Zurich is to bridge this gap and to provide client-tailored solutions for big data genomics. In this presentation, we illustrate this need and our approach by selected examples ranging from the design of automated, high-throughput NGS analysis workflows through addressing the biology "software stack jungle" to scientific IT education for biologists. Throughout the talk, we emphasize the importance of scientific data management, consulting needs and community building for using HPC in biological research. -
Riva Fabio Poster
Poster
PHY-03 Parallelization on a Hybrid Architecture of GBS, a Simulation Code for Plasma Turbulence at the Edge of Fusion Devices, Fabio Riva (Ecole Polytechnique Fédérale de Lausanne (EPFL), Swiss Plasma Center (SPC), CH-1, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Patrick Emonts (EPFL / Swiss Plasma Center, Switzerland); Federico David Halpern (EPFL / Swiss Plasma Center, Switzerland); Rogério Jorge (EPFL / Swiss Plasma Center, Switzerland); Jorge Morales (EPFL / Swiss Plasma Center, Switzerland); Paola Paruta (EPFL / Swiss Plasma Center, Switzerland); Paolo Ricci (EPFL / Swiss Plasma Center, Switzerland); Fabio Riva (EPFL / Swiss Plasma Center, Switzerland)
We present recent developments of GBS, a simulation code used to evolve plasma turbulence in the edge of fusion devices. GBS solves a set of 3D fluid equations, the Poisson and the Ampere equation, and a kinetic equation for the neutral atoms. Investigations carried out with GBS have significantly advanced our understanding of the plasma dynamics at the edge of fusion devices. For example, GBS simulations allowed the identification of the turbulent regimes and the saturation mechanisms of the linearly unstable modes. In GBS, a 3D Cartesian MPI communicator is employed, leading to excellent parallel scalability up to 8192 cores. To efficiently exploit many-core and hybrid architectures, new schemes using MPI+OpenMP and MPI+OpenACC have been recently implemented. We show the implementation of the new parallelization schemes, their scalability, and their efficiency. The new parallelization allows the efficient use of advanced hybrid supercomputers, such as Piz Daint at CSCS. -
Robertsson Johan MS Presentation
Friday, June 10, 2016
Garden 1A, 10:30-10:45
MS Presentation
Dynamically Linking Seismic Wave Propagation at Different Scales, Johan Robertsson (ETH Zurich, Switzerland)
Co-Authors: Marlies Vasmel (ETH Zurich, Switzerland); Dirk-Jan van Manen (ETH Zurich, Switzerland); Johan Robertsson (ETH Zurich, Switzerland)
Numerical modelling of seismic wave propagation can be of great value at many scales, ranging from shallow applications in engineering geophysics to global scale seismology. Accurate modelling of the physics of wave propagation at different scales requires different spatial and temporal discretization and potentially also different numerical methods. We present a new method to dynamically link the waves propagating at these different scales. A finite-difference solver is used on a local grid, whereas the (much) larger background domain is represented by its (precomputed) Green's functions. At each time step of the simulation, the interaction between the events leaving the local domain and the medium outside is calculated using a Kirchhoff-type integral extrapolation, and the extrapolated wavefield is applied as a boundary condition to the local domain. This results in a numerically exact hybrid modelling scheme, even after local updates of the model parameters.
MS Summary
MS23 Open Source Software (OSS) and High Performance Computing (HPC), Johan Robertsson (ETH Zurich, Switzerland)
Co-Authors: Johan Robertsson (ETH Zurich, Switzerland)
Open Source Software (OSS) plays a fundamental role in research-driven projects and, for this reason, it cannot be neglected by academia and industry. OSS is radically transforming how software is being developed by various scientific communities and it is likely to be central to future research activities in many more fields. The process of development has to reach beyond organizational boundaries to unleash new potentials and open paths to new collaborations. OSS scientific applications are required to solve complex and data-intensive research problems. These applications range from smaller scale simulations developed on a desktop machine to large, parallel simulations of the physical world using High Performance Computing (HPC) systems. The minisymposium is focused on identifying specific aspects of Open Source Software for the development of scientific software that exploits High Performance Computing (HPC) architectures. This class of OSS applications includes software developed to perform, for example, modelling of wave propagation in the Earth and real-time visualization of great volumes of data. This minisymposium will bring researchers from various environments together to exchange experience, findings, and ideas in the realm of Open Source Software. The speakers will demonstrate a practical working success related to OSS and HPC and present future directions for where we need to go. -
Robins Garry Poster
Poster
EMD-03 Parallel MCMC for Estimating Exponential Random Graph Models, Garry Robins (Melbourne School of Psychological Sciences, The University of Melbourne, Australia)
Co-Authors: Alex Stivala (University of Melbourne, Australia); Antonietta Mira (Università della Svizzera italiana, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Garry Robins (University of Melbourne, Australia); Alessandro Lomi (Università della Svizzera italiana, Switzerland)
As information and communication technologies continue to expand, the need arises to develop analytical strategies capable of accommodating new and larger sets of social network data. Considerable attention has recently been dedicated to the possibility of scaling exponential random graph models (ERGMs) - a well-established family of statistical models - for analyzing large social networks. Efficient computational methods would be highly desirable in order to extend the empirical scope of ERGM for the analysis of large social networks. We report preliminary results of a research project on the development of new sampling methods for ERGMs. We propose a new MCMC sampler and use it with Metropolis coupled Markov chain Monte Carlo, a typical scheme for MCMC parallelization. We show that, using this method, the CPU time for parameter estimation may be considerably reduced. *Generous support from the Swiss National Platform of Advanced Scientific Computing (PASC) is gratefully acknowledged. -
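A generic sketch of the Metropolis-coupled MCMC (parallel tempering) scheme mentioned in the abstract above, using a toy continuous target rather than an ERGM; the target density, temperature ladder and proposal are all placeholder assumptions.

```python
# Toy Metropolis-coupled MCMC: several chains run at different "temperatures"
# and periodically attempt to swap states; only the cold chain is used for inference.
import numpy as np

rng = np.random.default_rng(0)
log_target = lambda x: -0.5 * x**2          # placeholder log-density (standard normal)
betas = np.array([1.0, 0.5, 0.25, 0.125])   # inverse temperatures; chain 0 is the cold chain
x = np.zeros(len(betas))                    # current state of each chain
samples = []

for it in range(20000):
    # Within-chain Metropolis updates on the tempered targets beta * log_target
    for k, beta in enumerate(betas):
        prop = x[k] + rng.normal(scale=1.0)
        if np.log(rng.random()) < beta * (log_target(prop) - log_target(x[k])):
            x[k] = prop
    # Attempt a swap between a randomly chosen adjacent pair of chains
    k = rng.integers(len(betas) - 1)
    log_alpha = (betas[k] - betas[k + 1]) * (log_target(x[k + 1]) - log_target(x[k]))
    if np.log(rng.random()) < log_alpha:
        x[k], x[k + 1] = x[k + 1], x[k]
    samples.append(x[0])                    # retain the cold chain only
```

In a parallel implementation, each chain would run on its own process (for example, one MPI rank per temperature), so that only the swap step requires communication.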
Robinson-Rechavi Marc MS Presentation
Thursday, June 9, 2016
Garden 3A, 14:30-15:00
MS Presentation
Large-Scale Analyses of Positive Selection Using Efficient Models of Codon Evolution, Marc Robinson-Rechavi (University of Lausanne/Swiss Intitute of Bioinformatics, Switzerland)
Co-Authors: Marc Robinson-Rechavi (University of Lausanne, Switzerland); Nicolas Salamin (Swiss Institute of Bioinformatics, Switzerland)
Models of codon evolution are widely used to identify signatures of positive selection in protein coding genes. While the analysis of a single gene family usually takes less than an hour on an average computer, the detection of positive selection on genomic data becomes a computationally intensive problem. In order to support our full genome database of positive selection 'Selectome' (http://selectome.unil.ch/) we develop a series of high-performance computing tools to analyse positive selection. These improvements allow us to develop new and more realistic, but computationally tractable, models of codon evolution. -
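For context, the codon models referred to in the abstract above are typically of the Goldman-Yang/Muse-Gaut form, in which the instantaneous rate from codon $i$ to codon $j$ is (quoted in its standard form; the notation may differ from the specific models developed by the authors):

\[
q_{ij} \;=\;
\begin{cases}
0, & \text{if } i \text{ and } j \text{ differ at more than one position},\\
\pi_j, & \text{synonymous transversion},\\
\kappa\,\pi_j, & \text{synonymous transition},\\
\omega\,\pi_j, & \text{non-synonymous transversion},\\
\omega\,\kappa\,\pi_j, & \text{non-synonymous transition},
\end{cases}
\]

where $\pi_j$ is the equilibrium frequency of codon $j$, $\kappa$ the transition/transversion rate ratio and $\omega = d_N/d_S$; positive selection is inferred where $\omega > 1$.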
Roller Sabine MS Presentation
Thursday, June 9, 2016
Garden 3A, 11:30-11:50
MS Presentation
Assessment of Transitional Hemodynamics in Intracranial Aneurysms at Extreme Scale, Sabine Roller (University of Siegen, Germany)
Co-Authors: Sabine Roller (University of Siegen, Germany); Kent-Andre Mardal (University of Oslo, Norway)
Computational fluid dynamics (CFD) is extensively used for modelling of blood flow in intracranial aneurysms as it can help clinicians in the decision for intervention, and may potentially provide information on the pathogenesis of the condition. The flow regime in aneurysms is, due to the low Reynolds number, mostly presumed laminar - an assumption that was challenged in recent publications that showed high frequency fluctuations in aneurysms resembling transitional flow. The present work aspires to scrutinize the issue of transition in aneurysmal hemodynamics by performing the first true direct numerical simulations on aneurysms of various morphologies, with resolutions of the order of the Kolmogorov scales, resulting in 1 billion cells. The results show the onset of fluctuations in the flow inside the aneurysm during the deceleration phase of the cardiac cycle, before a re-laminarization during acceleration. The fluctuations are confined to the aneurysm dome, suggesting that the aneurysm acts as an initiator of transition to turbulence. -
Rood Jonathan S. Poster
Poster
CLI-01 CLAW Provides Language Abstractions for Weather and Climate Models, Jonathan S. Rood (ETH Zürich, Switzerland)
Co-Authors: Valentin Clément (Center for Climate Systems Modeling (C2SM), ETH Zurich, Switzerland)
Achieving near optimal performance on different architectures (e.g., CPUs and GPUs) with a single source code is sometimes not possible and requires refactoring of the most performance critical code towards a desired architecture. This is the essence of 'performance portability' and this situation has been observed in several cases involving numerical weather and climate models. To help alleviate this situation, we are developing a tool named CLAW whose function is to allow developers to encode architecture-specific transformations into a single Fortran source. Our tool utilizes source-to-source compiler techniques to extend the Fortran grammar in a simple manner that allows the developer to activate the necessary code transformations automatically before compilation. To generalize the capabilities of the CLAW tool, we currently consider its use in the COSMO weather model and HAMMOZ and ICON climate models. -
Rossetti Giulia MS Presentation
Friday, June 10, 2016
Garden 3A, 10:00-10:15
MS Presentation
Ligand Binding to the Human Adenosine Receptor hA2AR in Nearly Physiological Conditions, Giulia Rossetti (IAS-5/INM-9/JSC and RWTH-UKA, Germany)
Co-Authors: Ruyin Cao (Forschungszentrum Jülich, Germany); Andreas Bauer (Forschungszentrum Jülich, Germany); Paolo Carloni (Forschungszentrum Jülich, Germany)
Lipid composition may significantly affect membrane protein function, yet its impact on the protein structural determinants is not well understood. Here we present a comparative molecular dynamics (MD) study of the human adenosine receptor type 2A (hA2AR) in complex with caffeine, a system of high neuro-pharmacological relevance, within different membrane types: POPC, mixed POPC/POPE and cholesterol-rich membranes. 0.8-μs MD simulations unambiguously show that the helical folding of the amphipathic helix 8 depends on membrane content. Most importantly, the distinct cholesterol binding into the cleft between helix 1 and 2 stabilizes a specific caffeine-binding pose against others visited during the simulation. Hence, the presence of cholesterol (approximately 33%-50% in synaptic membranes of the central nervous system), often neglected in X-ray determination of membrane proteins, affects the population of the ligand binding poses. We conclude that including a correct description of neuronal membranes may be very important for computer-aided design of ligands targeting hA2AR and possibly other GPCRs.
MS Summary
MS29 Molecular Neuromedicine: Recent Advances by Computer Simulation and Systems Biology, Giulia Rossetti (IAS-5/INM-9/JSC and RWTH-UKA, Germany)
Co-Authors: Giulia Rossetti (JSC and RWTH-UKA, Germany), Mercedes Alfonso-Prieto (University of Barcelona, Spain)
Innovative neuromedicine approaches require a detailed understanding of the molecular and systems-level causes of neurological diseases, their progression and the response to treatments. Ultimately, neuronal function and diseases are caused by exquisite molecular recognition processes during which specific biomolecules bind to each other allowing neuronal signaling, metabolism, synaptic transmission, etc. The detailed understanding of these processes, as well as the rational design of molecules for technology advances in neuropharmacology, require the characterization of neuronal biomolecules' structure, function, dynamics and energetics. The biomolecules and processes under study are inherently highly complex in terms of their size (typically on the order of 10^5-10^6 atoms) and time-scale (up to seconds), much longer than what can be simulated by standard molecular dynamics approaches (which, nowadays, can typically reach up to microseconds). This requires the development of methodologies in multiscale molecular simulation. Recent advancements include coarse-grained (CG) approaches that allow to study large systems on a long timescale, as well as very accurate hybrid methods combining QM modelling with molecular mechanics (MM) that provide descriptions of key neuronal photoreceptors, such as rhodopsin. In addition, Brownian dynamics are used to study biomolecular recognition and macromolecular assembly processes towards in vivo conditions. Such computational tools are invaluable for description, prediction and understanding of biological mechanisms in a quantitative and integrative way. This workshop will be an ideal forum to discuss both advancements and future directions in multiscale methodologies and applications on key signaling pathways in neurotransmission, such as those based on neuronal G-protein coupled receptors (GPCRs). These novel methodologies might prove to be instrumental to understand the underlying causes of brain diseases and to design new drugs aimed at their treatment. -
Rossinelli Diego MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 14:30-15:00
MS Presentation
The Productivity Gap in HPC, Diego Rossinelli (ETH Zurich, Switzerland)
Co-Authors:
The rapid development of novel HPC software represents a formidable challenge at the present time. Computing hardware is evolving at a faster-than-ever pace, and timely software development for these platforms requires exceptionally high productivity. Developers are often exposed to high-risk design decisions that might irreversibly compromise the software performance on the target platform. In this talk we discuss an approach that attempts to address these issues while identifying its associated costs.
Wednesday, June 8, 2016
Auditorium C, 17:00-17:30
Paper
An Efficient Compressible Multicomponent Flow Solver for Heterogeneous CPU/GPU Architectures, Diego Rossinelli (ETH Zurich, Switzerland)
Co-Authors: Babak Hejazialhosseini (Cascade Technologies Inc., United States of America); Panagiotis Hadjidoukas (ETH Zurich, Switzerland); Diego Rossinelli (ETH Zurich, Switzerland); Petros Koumoutsakos (ETH Zurich, Switzerland)
We present a solver for three-dimensional compressible multicomponent flow based on the compressible Euler equations. The solver is based on a finite volume scheme for structured grids and advances the solution using an explicit Runge-Kutta time stepper. The numerical scheme requires the computation of the flux divergence based on an approximate Riemann problem. The computation of the divergence quantity is the most expensive task in the algorithm. Our implementation takes advantage of the compute capabilities of heterogeneous CPU/GPU architectures. The computational problem is organized in subdomains small enough to be placed into the GPU memory. The compute intensive stencil scheme is offloaded to the GPU accelerator while advancing the solution in time on the CPU. Our method to implement the stencil scheme on the GPU is not limited to applications in fluid dynamics. The performance of our solver was assessed on Piz Daint, a XC30 supercomputer at CSCS. The GPU code is memory-bound and achieves a per-node performance of 462 Gflop/s, outperforming by 3.2x the multicore-based Gordon Bell winning CUBISM-MPCF solver for the offloaded computation on the same platform. The focus of this work is on the per-node performance of the heterogeneous solver. In addition, we examine the performance of the solver across 4096 compute nodes. We present simulations for the shock-induced collapse of an aligned row of air bubbles submerged in water using 4 billion cells. Results show a final pressure amplification that is 100x stronger than the strength of the initial shock. -
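As a purely illustrative sketch of the kind of kernel the abstract above identifies as the expensive part, here is a 1d scalar advection example with a simple Rusanov (local Lax-Friedrichs) flux; it is not the compressible multicomponent Euler solver itself, and grid size, speed and time stepping are placeholder choices.

```python
# Minimal 1d finite-volume sketch: the flux-divergence evaluation is the
# stencil-heavy step, analogous to the kernel offloaded to the GPU in the solver.
import numpy as np

N = 200
dx = 1.0 / N
a = 1.0                                   # constant advection speed (toy physics)
x = (np.arange(N) + 0.5) * dx
u = np.exp(-100.0 * (x - 0.5) ** 2)       # initial cell averages, periodic domain

def flux_divergence(u):
    """Return -d/dx of the Rusanov numerical flux for each cell."""
    up = np.roll(u, -1)                                   # right neighbour (periodic)
    f_face = 0.5 * (a * u + a * up) - 0.5 * abs(a) * (up - u)   # flux at right faces
    return -(f_face - np.roll(f_face, 1)) / dx

dt = 0.4 * dx / abs(a)
for _ in range(500):
    # a low-storage explicit Runge-Kutta stepper would go here; forward Euler keeps it short
    u = u + dt * flux_divergence(u)
```

In the solver described above, this flux-divergence evaluation is the part organized into GPU-resident subdomains, while the time integration is advanced on the CPU.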
Rudi Johann MS Presentation
Friday, June 10, 2016
Garden 1A, 09:00-09:30
MS Presentation
HPC Challenges Arising in Forward and Inverse Mantle Flow Simulation, Johann Rudi (The University of Texas at Austin, United States of America)
Co-Authors: Johann Rudi (University of Texas at Austin, United States of America); Vishagan Ratnaswamy (California Institute of Technology, United States of America); Dunzhu Li (California Institute of Technology, United States of America); Tobin Isaac (University of Chicago, United States of America); Michael Gurnis (California Institute of Technology, United States of America); Omar Ghattas (University of Texas at Austin, United States of America)
We discuss scalable solvers for the forward and inverse simulation of mantle flow problems. Crucial solver components for the arising nonlinear Stokes problems are parallel multigrid methods for preconditioning the linearized Stokes system, and a Schur complement approximation that is able to cope with extreme viscosity variations. To achieve good parallel scalability, we use, among others, matrix-free operations and we redistribute coarse multigrid levels to a subset of all available processors. We will discuss the inversion of global rheology parameters and distributed fields from surface data and the present-day temperature distribution in instantaneous and time-dependent problems. -
Rupp Karl MS Presentation
Thursday, June 9, 2016
Auditorium C, 14:00-14:30
MS Presentation
Challenges in Software Library Development for GPUs and MIC, Karl Rupp (ETH Zurich, Switzerland)
Co-Authors:
To leverage the computational power of many-core architectures of GPUs and Xeon Phi, suitable software library interfaces are required such that complex applications with many interacting components can be built. This talk discusses new challenges that have to be addressed in software library development for using many-core architectures in computational science: First, different programming models, most notably OpenMP, CUDA, and OpenCL, are in use. The ideal software library supports all of these approaches. Second, it is not enough to optimise for a single architecture. Different device generations provide different characteristics such as cache or scratchpad memory sizes; thus, a single optimised kernel is not sufficient. Third, extensive testing and performance monitoring require the availability of physical hardware; virtual machines are not enough. The audience will learn how these challenges are addressed in the ViennaCL library and which future directions are taken for further improvement.
Wednesday, June 8, 2016
Auditorium C, 15:30-16:00
Paper
Extreme-Scale Multigrid Components within PETSc, Karl Rupp (ETH Zurich, Switzerland)
Co-Authors: Dave A. May (ETH Zurich, Switzerland); Karl Rupp (Austria); Matthew G. Knepley (Rice University, United States of America); Barry F. Smith (Argonne National Laboratory, United States of America)
Elliptic partial differential equations (PDEs) frequently arise in continuum descriptions of physical processes relevant to science and engineering. Multilevel preconditioners represent a family of scalable techniques for solving discrete PDEs of this type and thus are the method of choice for high-resolution simulations. The scalability and time-to-solution of massively parallel multilevel preconditioners can be adversely affected by using a coarse-level solver with sub-optimal algorithmic complexity. To maintain scalability, agglomeration techniques applied to the coarse level have been shown to be necessary.
In this work, we present a new software component introduced within the Portable Extensible Toolkit for Scientific computation (PETSc) which permits agglomeration. We provide an overview of the design and implementation of this functionality, together with several use cases highlighting the benefits of agglomeration. Lastly, we demonstrate via numerical experiments employing geometric multigrid with structured meshes, the flexibility and performance gains possible using our MPI-rank agglomeration implementation.
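A generic illustration of the agglomeration idea described above, in which coarse-level work is gathered onto a subset of ranks; this is sketched with mpi4py rather than PETSc's actual implementation, and the reduction factor and data layout are placeholder assumptions.

```python
# Generic sketch of coarse-level agglomeration: every group of R consecutive ranks
# gathers its coarse-level contribution onto the first rank of the group, and those
# "collector" ranks form a reduced communicator on which the coarse problem is solved.
# This conveys the communicator-splitting pattern only; it is not PETSc's implementation.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
R = 4                                                 # hypothetical reduction factor

group = comm.Split(color=rank // R, key=rank)         # R consecutive ranks per group
coarse_local = np.full(10, rank, dtype='d')           # placeholder coarse-level data
gathered = group.gather(coarse_local, root=0)         # agglomerate onto the group root

is_collector = (rank % R == 0)
reduced = comm.Split(color=0 if is_collector else MPI.UNDEFINED, key=rank)
if is_collector:
    coarse_block = np.concatenate(gathered)
    # ...assemble and solve the coarse-level problem on the `reduced` communicator here...
```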
MS Summary
MS24 Software Libraries in Computational Science for Many-Core Architectures, Karl Rupp (ETH Zurich, Switzerland)
Co-Authors:
Many-core architectures such as graphics processing units (GPUs) and many integrated cores (MIC) have emerged in scientific computing. The efficient use of these architectures, however, is very challenging from the algorithmic perspective, as fine-grained parallelism needs to be fully exposed in the software stack. Many widely-used libraries in scientific computing today were designed for processors with only a few cores and hence struggle to provide good support for such many-core architectures. At the same time, new libraries designed especially for GPUs and MIC emerge. The speakers in this minisymposium are developers of software libraries for GPUs and MIC and explain their approaches. They discuss strategies for dealing with the subtle differences of these architectures in order to provide portable performance on the users' machines. The discussion also touches on the design of new application programming interfaces which are simple enough for use by application scientists, but also provide optional control over important details for domain experts. Overall, this session presents the state-of-the-art in libraries for many-core architectures and sketches a path forward for making these architectures more accessible to a broad user base. -
Rybkin Vladimir Poster
Poster
MAT-07 Nuclear Quantum Effects on Calculated Aqueous Redox Properties, Vladimir Rybkin (ETH Zurich, Switzerland)
Co-Authors:
Nuclear quantum effects, neglected in most applications, have recently been shown to have a significant influence on the calculated band structures of bulk liquids and solids and might thus be very significant for electrochemical properties. We have calculated thermodynamic integrals as well as vertical ionization potentials (VIP) and vertical electron affinities (VEA) for two redox pairs, CO2/CO2- and HO2/HO2-, using DFT-driven thermodynamic integration with the classical and the quantum generalized Langevin equation colored-noise thermostats. It is found that nuclear quantum effects lead to noticeable shifts in both VIP and VEA: the former is reduced, whereas the latter is increased for both pairs. As a consequence, significant effect cancellation is observed for the thermodynamic integrals computed with the classical and the quantum thermostats. -
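The thermodynamic integration referred to in the abstract above follows the standard expression (quoted generically; the coupling parameter connects the reduced and oxidized states in the authors' setup),

\[
\Delta A \;=\; \int_0^1 \Big\langle \frac{\partial H(\eta)}{\partial \eta} \Big\rangle_{\eta}\, \mathrm{d}\eta ,
\]

evaluated in practice by averaging $\partial H/\partial\eta$ over thermostatted MD trajectories at a few fixed values of $\eta$ and integrating numerically; the vertical ionization potentials and electron affinities are instead computed as total-energy differences at fixed nuclear configurations.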
Rüde Ulrich Poster
Poster
CSM-02 A Novel Approach for Efficient Stencil Assembly in Curved Geometries, Ulrich Rüde (Department of Computer Science 10, FAU Erlangen-Nürnberg, Germany)
Co-Authors: Marcus Mohr (Ludwig Maximilian University of Munich, Germany); Ulrich Rüde (University of Erlangen-Nuremberg, Germany); Markus Wittmann (FAU Erlangen-Nürnberg / Erlangen Regional Computing Center (RRZE), Germany); Barbara Wohlmuth (Technical University of Munich, Germany)
In many scientific and engineering applications one has to deal with curved geometries. Such domains can accurately be approximated e.g., by unstructured grids and iso-parametric finite elements. We present a novel approach here that is well-suited to our concept of hierarchical hybrid grids (HHG). The latter was shown to achieve excellent performance and scalability even for extreme numbers of DOFs by a matrix-free implementation and exploiting regularity of access patterns. In our approach FE stencils are not assembled exactly, but approximated by low order polynomials and evaluated with an efficient incremental algorithm. We demonstrate the accuracy achieved as well as the computational efficiency using our prototypical HHG-based mantle convection solver which operates on non-nested triangulations of a thick spherical shell. The implementation of our scheme is based on a systematic node-level performance analysis and maintains the high efficiency of the original HHG. -
Rüdisühli Stefan MS Presentation
Thursday, June 9, 2016
Garden 3B, 14:45-15:00
MS Presentation
A Decade-Long European-Scale Convection-Resolving Climate Simulation on GPUs: Computation, Validation and Analyses, Stefan Rüdisühli (Atmospheric and Climate Science, ETH Zurich, Switzerland)
Co-Authors: Stefan Rüdisühli (ETH Zurich, Switzerland); Nikolina Ban (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Daniel Lüthi (ETH Zurich, Switzerland); Michael Sprenger (ETH Zurich, Switzerland); Heini Wernli (ETH Zurich, Switzerland); Christoph Schär (ETH Zurich, Switzerland)
Climate simulations using a horizontal resolution of O(1 km) allow deep convection to be explicitly resolved. Precipitation processes are then represented much closer to first principles, allowing for an improved representation of the water cycle. Due to the large computational costs, climate simulations at such scales were restricted to rather small domains in the past. Here we present results from a decade-long convection-resolving climate simulation covering Europe using a computational mesh of 1,536x1,536x60 grid points. We use a COSMO-model prototype enabled for GPUs. The results illustrate how the approach allows the interactions between atmospheric circulations at scales ranging from 1,000 to 10 km to be represented. We discuss the performance of the convection-resolving climate modelling approach and thereby specifically focus on the improved representation of summer convection on the continental scale. Furthermore, we demonstrate the potential of online analyses of these simulations for assembling detailed climatologies of extratropical cyclones, fronts and propagating convective systems. -
Röthlisberger Ursula MS Presentation
Friday, June 10, 2016
Garden 3A, 09:00-09:30
MS Presentation
Effect of Lipidation for G Protein Mediated Signalling, Ursula Röthlisberger (Ecole Polytechnique Fédérale Lausanne, Switzerland)
Co-Authors: Siri Camee vanKeulen (EPFL, Switzerland)
G-protein-coupled-receptor (GPCR) pathways are of high interest since their signal-transduction cascades play an important role in several diseases such as hypertension and obesity. The first proteins that transfer an extracellular signal received by a GPCR to other regions in the cell are G protein heterotrimers. The specificity with which G protein subunits interact with receptors and effectors defines the variety of responses that a cell is capable of providing in response to an extracellular signal. Interestingly, many G proteins have distinct lipidation profiles, but little is known about how this influences their function. Here, we investigate the effect of myristoylation on the structure and dynamics of Gαi1 and the possible implications for signal transduction. A 2 µs molecular dynamics simulation suggests conformational changes of the switch II and alpha helical domains, emphasizing the importance of permanent lipid attachment in tuning the function of signaling proteins.
S
-
Saarinen Sami MS Presentation
Thursday, June 9, 2016
Garden 3B, 14:30-14:45
MS Presentation
Towards Exascale Computing with the ECMWF Model, Sami Saarinen (ECMWF, United Kingdom)
Co-Authors: Nils Wedi (ECMWF, United Kingdom); George Mozdzynski (ECMWF, United Kingdom); Sami Saarinen (ECMWF, United Kingdom)
The European Centre for Medium-Range Weather Forecasts (ECMWF) is currently investing in a scalability programme that addresses computing and data handling challenges for realizing those scientific advances on future high-performance computing environments that will enhance predictive skill from medium to monthly time scales. A key component of this programme is the European Commission funded project Energy efficient SCalable Algorithms for weather Prediction at Exascale (ESCAPE) that develops numerical building blocks and compute intensive algorithms of the forecast model, applies compute/energy efficiency diagnostics, designs implementations on novel architectures, and performs testing in operational configurations. The talk will report on the progress of the scalability programme with a special focus on ESCAPE. -
Sager Korbinian MS Presentation
Thursday, June 9, 2016
Garden 1A, 14:45-15:00
MS Presentation
Salvus: A Flexible Open-Source Package for Full-Waveform Modelling and Inversion, Korbinian Sager (ETH Zurich, Switzerland)
Co-Authors: Christian Boehm (ETH Zurich, Switzerland); Martin van Driel (ETH Zurich, Switzerland); Lion Krischer (Ludwig Maximilian University of Munich, Germany); Dave A. May (ETH Zurich, Switzerland); Max Rietmann (ETH Zurich, Switzerland); Korbinian Sager (ETH Zurich, Switzerland); Andreas Fichtner (ETH Zurich, Switzerland)
Within all domain-specific software projects, finding the correct balance between flexibility and performance is often difficult. In the seismic imaging community, the trend has been to move towards codes which are heavily optimised, but which often sacrifice usability and flexibility. Here we introduce Salvus: an open-source HPC high-order finite element (FE) package focused on full-waveform modelling and inversion, which is designed to be both flexible and performant. Salvus was constructed by following modern software design practices, testing protocols, and by establishing its foundations upon existing open-source high-level scientific libraries. The FE framework is generalized over spatial dimensions, time-integrators, polynomial order and wave-propagation physics, and provides support for both hexahedral and tetrahedral meshes. Additionally, support is included for various numerical optimisation methods. We discuss our usage of existing open-source scientific libraries, our chosen level of abstraction, and quantitatively investigate the performance penalties associated with these abstractions. -
Saha Santanu Poster
Poster
PHY-06 Soft Norm Conserving Accurate Pseudopotentials, Santanu Saha (University of Basel, Switzerland)
Co-Authors: Stefan Goedecker (University of Basel, Switzerland)
Soft and accurate pseudopotentials are necessary for the prediction of new materials. By including a non-linear core correction (NLCC) along with semi-core states in the Goedecker pseudopotentials [1-2], we generated soft and accurate pseudopotentials for the Perdew-Burke-Ernzerhof (PBE) functional [3] for H to Ar and a few transition metals. Through the Delta test [4,5], they are found to be accurate for bulk systems, with an average delta value of 0.15 meV/atom. The average error in the atomization energy of the G2-1 test set is 1.32 kcal/mol, obtained using NWChem with the aug-cc-pV5Z basis set. [1] C. Hartwigsen, S. Goedecker, and J. Hutter, Phys. Rev. B 58, 3641 (1998) [2] A. Williand et al., J. Chem. Phys. 138, 104109 (2013) [3] P. Perdew, K. Burke, and M. Ernzerhof, Phys. Rev. Lett. 77, 3865 (1996) [4] K. Lejaeghere et al., Critical Reviews in Solid State and Materials Sciences 39, 1-24 (2014) [5] K. Lejaeghere et al., Science (351), 6280 (2016) -
Salamin Nicolas MS Presentation
Friday, June 10, 2016
Garden 3C, 09:30-10:00
MS Presentation
Efficient Approaches to Model Evolution in Computational Biology, Nicolas Salamin (University of Lausanne/Swiss Intitute of Bioinformatics, Switzerland)
Co-Authors:
Biological modelling has become an important approach to study the evolution of genes and organisms. However, the increase in availability of large-scale genomic data is pushing for the development of high-performance computing approaches to model evolutionary processes. In this context, the availability of Bayesian approaches has been essential to extend the realism of the evolutionary models that can be used. The development of these methods based on Markov chain Monte Carlo (MCMC) techniques is however computationally intensive and requires high-performance computing approaches to deal with the computational complexity. Here, I will present our recent developments to optimise and parallelize MCMC techniques. I will also discuss our efforts to extend modelling of adaptation from genomic data to complex phenotypic traits using hierarchical Bayesian computations.
Thursday, June 9, 2016
Garden 3A, 14:30-15:00
MS Presentation
Large-Scale Analyses of Positive Selection Using Efficient Models of Codon Evolution, Nicolas Salamin (University of Lausanne/Swiss Intitute of Bioinformatics, Switzerland)
Co-Authors: Marc Robinson-Rechavi (University of Lausanne, Switzerland); Nicolas Salamin (Swiss Institute of Bioinformatics, Switzerland)
Models of codon evolution are widely used to identify signatures of positive selection in protein coding genes. While the analysis of a single gene family usually takes less than an hour on an average computer, the detection of positive selection on genomic data becomes a computationally intensive problem. In order to support our full genome database of positive selection 'Selectome' (http://selectome.unil.ch/) we develop a series of high-performance computing tools to analyse positive selection. These improvements allow us to develop new and more realistic, but computationally tractable, models of codon evolution.
MS Summary
MS27 CADMOS: HPC Simulations, Modeling and Large Data, Nicolas Salamin (University of Lausanne/Swiss Intitute of Bioinformatics, Switzerland)
Co-Authors: Nicolas Salamin (University of Lausanne, Switzerland), Jan Hesthaven (EPFL, Switzerland)
CADMOS (Center for ADvanced MOdelling Science) is a partnership between UNIGE, UNIL and EPFL whose goal is to promote HPC, modelling and simulation techniques, and data science for a broad range of relevant applications. New scientific results for well-established HPC problems, or new methodological approaches to problems usually not solved by computer modelling or HPC resources, are especially considered. In this minisymposium we will have presentations from each of the three partners, highlighting the above goals. We will also invite two external keynote speakers. Contributions reporting on the link between HPC and data science, or opening the door to new interdisciplinary applications within the scope of CADMOS, are welcome. -
Salsac Anne-Virginie MS Presentation
Wednesday, June 8, 2016
Garden 2A, 13:00-13:30
MS Presentation
Numerical Simulation of the Dynamics of Non-Spherical Microcapsules, Anne-Virginie Salsac (CNRS - Université de Technologie de Compiègne, Switzerland)
Co-Authors:
Capsules consist of an internal medium enclosed by a semi-permeable membrane which protects it and controls its exchanges with the environment. Capsules exist in nature under the form of cells or eggs; artificial microcapsules are widely used in industry to protect active substances, aromas or flavors and control their targeted release. In most situations, capsules are suspended into another flowing liquid and are subjected to hydrodynamic forces. One robust method to model the three-dimensional fluid-structure interactions consists in coupling a boundary integral method (for the internal and external fluid motion) with a finite element method (for the membrane deformation), which we have shown to be stable and accurate. We will review how numerical models have provided insights into the dynamics of an ellipsoidal capsule in simple shear flow. We will determine which regimes are mechanically stable and correlate the results with experimental studies on artificial capsules and red blood cells. -
Sanan Patrick Paper
Wednesday, June 8, 2016
Auditorium C, 15:30-16:00
Paper
Extreme-Scale Multigrid Components within PETSc, Patrick Sanan (USI, Switzerland)
Co-Authors: Dave A. May (ETH Zurich, Switzerland); Karl Rupp (Austria); Matthew G. Knepley (Rice University, United States of America); Barry F. Smith (Argonne National Laboratory, United States of America)
Elliptic partial differential equations (PDEs) frequently arise in continuum descriptions of physical processes relevant to science and engineering. Multilevel preconditioners represent a family of scalable techniques for solving discrete PDEs of this type and thus are the method of choice for high-resolution simulations. The scalability and time-to-solution of massively parallel multilevel preconditioners can be adversely affected by using a coarse-level solver with sub-optimal algorithmic complexity. To maintain scalability, agglomeration techniques applied to the coarse level have been shown to be necessary.
In this work, we present a new software component introduced within the Portable Extensible Toolkit for Scientific computation (PETSc) which permits agglomeration. We provide an overview of the design and implementation of this functionality, together with several use cases highlighting the benefits of agglomeration. Lastly, we demonstrate via numerical experiments employing geometric multigrid with structured meshes, the flexibility and performance gains possible using our MPI-rank agglomeration implementation. -
Santoro Mauro MS Summary
MS Summary
MS06 Software Engineering Meets Scientific Computing: Generality, Reusability and Performance for Scientific Software Platforms I: Engineering Methodologies and Development Processes, Mauro Santoro (Università della Svizzera Italiana, Switzerland)
Co-Authors: Alessandro Margara (Università della Svizzera Italiana, Switzerland)
Software platforms for modelling and simulation of scientific problems are becoming increasingly important in many fields and often drive the scientific discovery process. These platforms present unique requirements in terms of functionalities, performance and scalability, which limit the applicability of consolidated software engineering practices for their design, implementation and validation. For instance, since the effectiveness of a software platform for scientific simulation strictly depends on the level of performance and scalability it can achieve, the design, development and optimization of the platform are usually tailored to the specific hardware architecture the platform is expected to run on. Similarly, when a scientific simulation requires the integration of multiple software platforms, such integration is typically customized for the specific simulation problem at hand. Because of this, developing and integrating scientific computing platforms demands for a significant amount of relevant knowledge about the modeled domain and the software and hardware infrastructures used for simulation. This information typically remains hidden in the implementation details of a specific solution and cannot be easily reused to port the simulation to different hardware infrastructures or to implement or integrate different simulation platforms on the same hardware infrastructure. The Software Engineering for Scientific Computing (SESC) minisymposium is concerned with identifying suitable engineering processes to design, develop, integrate and validate software platforms for scientific modelling and simulations. This introduces challenges that require the expertise of researchers working in different areas, including computational scientists to model scientific problems, software engineers to propose engineering methodology and HPC experts to analyze platform dependent performance requirements that characterize simulations. The goal of the SESC minisymposium is to bring together software engineers, computational scientists and HPC experts to discuss and advance the engineering practices to implement platforms for scientific computing, aiming to reduce the development time, increase the reusability, the maintainability and the testability of the platforms, while offering the level of performance and scalability that is required by the simulation scenarios at hand. Specifically, the Software Engineering for Scientific Computing (SESC) minisymposium aims to address two conflicting requirements in the definition of an effective software development process: 1) promoting generality and reusability of software components, to simplify maintenance, evolution, adaptation and porting of software platforms 2) defining solution that guarantee an adequate level of performance and scalability, which is of paramount importance in scientific simulations. The SESC minisymposium is organized around two sessions: this first session focuses more specifically on design methodologies and development processes for general and reusable code; the second session (Part 2) targets the requirements of performance and scalability in scientific software platforms.
MS Summary
MS12 Software Engineering Meets Scientific Computing: Generality, Reusability and Performance for Scientific Software Platforms II: Performance and Scalability Requirements, Mauro Santoro (Università della Svizzera Italiana, Switzerland)
Co-Authors: Alessandro Margara (Università della Svizzera Italiana, Switzerland)
Software platforms for modelling and simulation of scientific problems are becoming increasingly important in many fields and often drive the scientific discovery process. These platforms present unique requirements in terms of functionality, performance and scalability, which limit the applicability of consolidated software engineering practices for their design, implementation and validation. For instance, since the effectiveness of a software platform for scientific simulation strictly depends on the level of performance and scalability it can achieve, the design, development and optimization of the platform are usually tailored to the specific hardware architecture the platform is expected to run on. Similarly, when a scientific simulation requires the integration of multiple software platforms, such integration is typically customized for the specific simulation problem at hand. Because of this, developing and integrating scientific computing platforms demands a significant amount of knowledge about the modelled domain and about the software and hardware infrastructures used for simulation. This information typically remains hidden in the implementation details of a specific solution and cannot easily be reused to port the simulation to different hardware infrastructures or to implement or integrate different simulation platforms on the same hardware infrastructure. The Software Engineering for Scientific Computing (SESC) minisymposium is concerned with identifying suitable engineering processes to design, develop, integrate and validate software platforms for scientific modelling and simulation. This introduces challenges that require the expertise of researchers working in different areas, including computational scientists to model scientific problems, software engineers to propose engineering methodologies, and HPC experts to analyze the platform-dependent performance requirements that characterize simulations. The goal of the SESC minisymposium is to bring together software engineers, computational scientists and HPC experts to discuss and advance the engineering practices used to implement platforms for scientific computing, aiming to reduce development time and to increase the reusability, maintainability and testability of the platforms, while offering the level of performance and scalability required by the simulation scenarios at hand. Specifically, the SESC minisymposium aims to address two conflicting requirements in the definition of an effective software development process: 1) promoting the generality and reusability of software components, to simplify the maintenance, evolution, adaptation and porting of software platforms; and 2) defining solutions that guarantee an adequate level of performance and scalability, which is of paramount importance in scientific simulations. The SESC minisymposium is organized around two sessions: this session targets the requirements of performance and scalability in scientific software platforms; the other session (Part 1) focuses more specifically on design methodologies and development processes for general and reusable code. -
Sato Mitsuhisa MS Presentation
Wednesday, June 8, 2016
Garden 3B, 14:00-14:15
MS Presentation
Omni Compiler and XcodeML: An Infrastructure for Source-to-Source Transformation, Mitsuhisa Sato (RIKEN AICS, Japan)
Co-Authors: Hitoshi Murai (AICS, RIKEN, Japan); Masahiro Nakao (AICS, RIKEN, Japan); Hidetoshi Iwashita (AICS, RIKEN, Japan); Jinpil Lee (AICS, RIKEN, Japan); Akihiro Tabuchi (University of Tsukuba, Japan)
We have been developing a compiler for the PGAS programming language XcalableMP, targeting post-petascale computing. XcalableMP is a directive-based language extension of Fortran 95 and C for scientific programming on high-performance distributed-memory parallel systems. The Omni Compiler is an infrastructure for source-to-source transformation, used to build source-to-source compilers such as the Omni XcalableMP compiler. It includes C and Fortran 95 front-ends which translate source code into an XML-based intermediate representation called XcodeML, a Java-based code-transformation library operating on XcodeML, and de-compilers which translate the XcodeML intermediate code back into transformed source code. The Omni Compiler currently also supports code transformation for OpenMP and OpenACC. In this talk, we will present the internals of the Omni Compiler, taking the Omni XcalableMP compiler as a case study, and outline our future plans. -
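As an illustrative aside to the Omni Compiler abstract above (not part of the original abstract, and not actual XcodeML, whose schema is not reproduced here), the following minimal Python sketch shows the general source-to-source idea: parse an XML intermediate representation, apply a transformation pass, and "de-compile" it back into source text. All element names are invented for illustration.
    # Toy source-to-source pass over an XML intermediate representation.
    # The element names below are invented and do NOT correspond to XcodeML.
    import xml.etree.ElementTree as ET

    TOY_IR = """
    <program>
      <loop var="i" lower="0" upper="n">
        <assign lhs="a[i]" rhs="b[i] + c[i]"/>
      </loop>
    </program>
    """

    def annotate_parallel_loops(root):
        # "Transformation pass": mark every loop as parallel, the way a
        # directive-based translator might lower a pragma onto the IR.
        for loop in root.iter("loop"):
            loop.set("parallel", "true")
        return root

    def decompile(root, indent=0):
        # Tiny "de-compiler" that turns the IR back into C-like source text.
        pad = "  " * indent
        lines = []
        for node in root:
            if node.tag == "loop":
                if node.get("parallel") == "true":
                    lines.append(pad + "#pragma omp parallel for")
                lines.append(pad + "for (int %s = %s; %s < %s; %s++) {" %
                             (node.get("var"), node.get("lower"),
                              node.get("var"), node.get("upper"), node.get("var")))
                lines.extend(decompile(node, indent + 1))
                lines.append(pad + "}")
            elif node.tag == "assign":
                lines.append(pad + "%s = %s;" % (node.get("lhs"), node.get("rhs")))
        return lines

    root = ET.fromstring(TOY_IR)
    print("\n".join(decompile(annotate_parallel_loops(root))))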
Sawley Marie-Christine MS Presentation
Thursday, June 9, 2016
Garden 3B, 11:30-11:45
MS Presentation
Large Scale Monitoring Data Analytics, Marie-Christine Sawley (Intel, Switzerland)
Co-Authors:
Operating large computing systems reliably and efficiently requires implementing fine-grained monitoring and tuning policies in order to contain operational costs while increasing reliability. Data collected during operations, whether used on the fly or analysed offline, represent a wealth of information to be exploited. One particularly interesting area is near real-time predictive analysis, which would enable corrective action to be taken swiftly in the case of excursions or impending component failures, with minimal interruptions of service. Moving and integrating data originating from heterogeneous sources, however, presents several challenges for the development of solutions that automate the process. The majority of data sets originate from specialized storage systems, log files, environmental data and relational databases. With hierarchical data analysis and enough monitoring data at hand, predictive fault models based on machine learning can be developed at scale. -
Sawyer William MS Presentation
Thursday, June 9, 2016
Garden 2A, 14:30-15:00
MS Presentation
Causality Inference in a Nonstationary and Nonhomogenous Framework, William Sawyer (Swiss National Supercomputing Centre (CSCS), Switzerland)
Co-Authors: Illia Horenko (Università della Svizzera italiana, Switzerland); William Sawyer (ETH Zurich / CSCS, Switzerland); Lukas Pospisil (Università della Svizzera italiana, Switzerland)
The project deploys statistical and computational techniques to develop a novel approach to causality inference in multivariate time-series of economic data on equity and credit risks. The methods build on recent research by the project participants. They improve on classical approaches to causality analysis by accommodating general forms of non-stationarity and non-homogeneity resulting from unresolved and latent scale effects. The emerging causality framework results in, and is implemented through, a clustering based on the minimization of an averaged clustering functional, which describes the mean distance between the observation data and its representation in terms of a given number of abstract Bayesian causality models of a certain predefined class. We use a finite element framework to propose a numerical scheme. One of the most challenging components of the emerging HPC implementation is a quadratic programming problem with linear equality and bound inequality constraints. We compare different algorithms and demonstrate their efficiency by solving practical benchmark problems.
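To make the nature of the subproblem mentioned in the preceding abstract more concrete, here is a minimal, purely illustrative sketch of solving a small bound-constrained quadratic program by projected gradient descent; the equality constraints of the full problem are deliberately omitted, and this is not the algorithm used in the project's implementation.
    # Projected-gradient sketch for  min 0.5*x'Qx - b'x  s.t.  lb <= x <= ub.
    # Illustrative only; equality constraints of the full QP are not handled.
    import numpy as np

    def projected_gradient_qp(Q, b, lb, ub, iters=500):
        x = np.clip(np.zeros_like(b), lb, ub)
        step = 1.0 / np.linalg.norm(Q, 2)        # safe step size for convex Q
        for _ in range(iters):
            grad = Q @ x - b
            x = np.clip(x - step * grad, lb, ub) # gradient step + box projection
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    Q = A.T @ A + np.eye(5)                      # symmetric positive definite
    b = rng.standard_normal(5)
    x = projected_gradient_qp(Q, b, lb=np.zeros(5), ub=np.ones(5))
    print(x)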
Poster
CSM-13 Towards the HPC-Inference of Causality Networks from Multiscale Economical Data, William Sawyer (Swiss National Supercomputing Centre (CSCS), Switzerland)
Co-Authors: Illia Horenko (Università della Svizzera italiana, Switzerland); Patrick Gagliardini (Università della Svizzera italiana, Switzerland); William Sawyer (ETH Zurich / CSCS, Switzerland)
A novel non-stationary approach to causality inference for multivariate time-series was proposed in recent research by the project participants. This methodology uses a clustering based on the minimization of an averaged clustering functional, which describes the mean distance between the observation data and its representation in terms of a given number of abstract Bayesian causality models of a certain predefined class. For the analysis of realistic datasets we are developing an HPC library that is built on top of PETSc and that implements MPI, OpenMP, and CUDA parallelization strategies. We present the mathematical aspects of the methodology and preliminary results of solving the non-stationary problem of causality inference for multivariate economic data with our HPC approach. The results are computed on Piz Daint at CSCS.
MS Summary
MS03 Code Generation Techniques for HPC Earth Science Applications, William Sawyer (Swiss National Supercomputing Centre (CSCS), Switzerland)
Co-Authors:
Earth Science simulations share key characteristics: there is a constant drive to higher resolutions while simultaneously incorporating ever-more sophisticated descriptions of the underlying physical processes. This trend results in a dramatic increase in the computational requirements. To support this trend, more sophisticated numerical techniques are required along with a drastic increase in computing power from emerging architectures, such as clusters of Graphics Processing Units (GPUs). Both aspects of this duality imply increased programming complexity. The difficulty does not stop there: the useful life of Earth Science applications should span several generations of hardware technology, and, moreover, new software development often extends existing legacy Fortran or C++ code. These challenges are now usually addressed by adding onto existing code: its size increases with the algorithmic complexity and new physical descriptions, and parallelism is incorporated by using message-passing and/or multithreading libraries, such as MPI or pthreads, or by essentially extending the language with parallelization directives, such as OpenMP for CPUs and the Intel Xeon Phi, or OpenACC for GPUs. Codes targeting multiple architectures either have multiple implementations, or use extensive pre-processing macros controlled by compilation flags. The domain scientist usually has to manage multiple disciplines, including numerical analysis and high-performance programming. The resulting code becomes unreadable for anyone but the developer, meaning that software maintenance is intractable. Drawing on evidence from a wide spectrum of Earth Science applications, we argue that new tools describing numerics and parallelism at a high level of abstraction are needed to meet these challenges of increasing complexity. This minisymposium covers a spectrum of code generation approaches. Their over-arching goal is to separate the concerns of domain science, underlying numerical techniques and high-performance computing (i.e., Computer Science), so that specialists can put their expertise to best use. We consider software frameworks, such as Firedrake and FEniCS, for the high-level specification of partial differential equations, which can generate code optimized for parallel CPU and GPU clusters. Such problem-solving frameworks are valuable for the scientist without an HPC background who would like to formulate a numerical solution which then runs optimally on a target architecture. Next, libraries for stencil calculations, such as GridTools, can help the developer isolate computation-intensive kernels, which can be written in a high-level, hardware-oblivious fashion making use of C++ template meta-programming. For pre-existing applications written in other languages, such as Fortran, GridTools offers an interfacing layer to take care of data management. Moreover, a Python environment is presented that can generate GridTools-compliant kernels. Source-to-source compilers, such as Omni, allow for translation of code, and can be used for domain-specific extensions of existing languages, such as in the CLAW project. Finally, several use cases and numerical examples, such as a preconditioned conjugate gradient solver or the discontinuous Galerkin method, are presented to illustrate the usability of these new tools. -
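As background to one of the use cases named in the minisymposium summary above, here is a generic, textbook-style Jacobi-preconditioned conjugate gradient solver in plain Python/NumPy; it is an illustrative sketch only and is not code from any of the frameworks discussed.
    # Textbook Jacobi-preconditioned conjugate gradient for SPD systems A x = b.
    import numpy as np

    def pcg(A, b, tol=1e-10, maxiter=1000):
        M_inv = 1.0 / np.diag(A)          # Jacobi preconditioner (diagonal of A)
        x = np.zeros_like(b)
        r = b - A @ x
        z = M_inv * r
        p = z.copy()
        rz = r @ z
        for _ in range(maxiter):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) < tol:
                break
            z = M_inv * r
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

    # 1D Poisson matrix as a small test case.
    n = 50
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    print(np.allclose(A @ pcg(A, b), b))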
Schaller Matthieu Paper
Wednesday, June 8, 2016
Auditorium C, 13:30-14:00
Paper
SWIFT: Using Task-Based Parallelism, Fully Asynchronous Communication, and Graph Partition-Based Domain Decomposition for Strong Scaling on more than 100 000 Cores, Matthieu Schaller (ICC, Durham University, United Kingdom)
Co-Authors: Pedro Gonnet (Durham University, United Kingdom); Aidan B. G. Chalk (Durham University, United Kingdom); Peter Draper (Durham University, United Kingdom)
We present a new open-source cosmological code, called SWIFT, designed to solve the equations of hydrodynamics using a particle-based approach (Smoothed Particle Hydrodynamics) on hybrid shared/distributed-memory architectures. SWIFT was designed from the bottom up to provide excellent strong scaling on both commodity clusters (Tier-2 systems) and Top100 supercomputers (Tier-0 systems), without relying on architecture-specific features or specialized accelerator hardware. This performance is due to three main computational approaches: (i) task-based parallelism for shared-memory parallelism, which provides fine-grained load balancing and thus strong scaling on large numbers of cores; (ii) graph-based domain decomposition, which uses the task graph to decompose the simulation domain such that the work, rather than just the data (as is the case with most partitioning schemes), is equally distributed across all nodes; (iii) fully dynamic and asynchronous communication, in which communication is modelled as just another task in the task-based scheme, sending data whenever it is ready and deferring tasks that rely on data from other nodes until it arrives.
In order to use these approaches, the code had to be re-written from scratch, and the algorithms therein adapted to the task-based paradigm. As a result, we can show upwards of 60% parallel efficiency for moderate-sized problems when increasing the number of cores 512-fold, on both x86-based and Power8-based architectures. -
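The following conceptual Python sketch illustrates the dependency-driven task execution idea described in the SWIFT abstract above: tasks run as soon as all of their dependencies have completed, using a thread pool. It is not SWIFT's scheduler (which is written in C and adds work stealing, priorities, conflict handling and asynchronous MPI tasks); the task names are invented.
    # Minimal dependency-driven task execution with a thread pool.
    # Assumes an acyclic task graph; purely conceptual illustration.
    from concurrent.futures import ThreadPoolExecutor

    tasks = {                     # task -> list of tasks it depends on
        "density_A": [], "density_B": [],
        "force_A": ["density_A"], "force_B": ["density_B"],
        "kick": ["force_A", "force_B"],
    }

    def run(name):
        print("running", name)

    done = set()
    with ThreadPoolExecutor(max_workers=4) as pool:
        while len(done) < len(tasks):
            ready = [t for t, deps in tasks.items()
                     if t not in done and all(d in done for d in deps)]
            # launch every currently-ready task concurrently, then wait
            list(pool.map(run, ready))
            done.update(ready)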
Scheidegger Simon MS Presentation
Thursday, June 9, 2016
Garden 2A, 15:15-15:30
MS Presentation
Solving High-Dimensional Dynamic Stochastic Economies with Active Subspaces and Gaussian Processes, Simon Scheidegger (University of Zurich/Stanford University, Switzerland)
Co-Authors: Ilias Bilionis (Purdue University, United States of America)
We show how active subspace methods, in conjunction with Gaussian processes and parallel computing, can be used to approximate equilibria in heterogeneous-agent macro-models. Using recent advances in approximation theory, we apply a combination of Gaussian processes and active subspace techniques to approximate policy functions in models with at least 100 continuous states. Moreover, we show that our method is perfectly suited for dynamic programming and time iteration on non-cubic geometries such as simplices.
Thursday, June 9, 2016
Garden 2A, 15:00-15:15
MS Presentation
Computing Equilibria in Dynamic Stochastic Macro-Models with Heterogeneous Agents, Simon Scheidegger (University of Zurich/Stanford University, Switzerland)
Co-Authors: Felix Kubler (University of Zurich, Switzerland); Simon Scheidegger (University of Zurich & Stanford University, Switzerland)
We show how sparse grid interpolation methods, in conjunction with parallel computing, can be used to approximate equilibria in overlapping generations (OLG) models with aggregate uncertainty. In such models, the state of the economy can be characterized by the wealth distribution across generations/cohorts of the population. To approximate the function mapping this state into agents' investment decisions and market prices, we use piecewise multilinear hierarchical basis functions on (adaptive) sparse grids. When solving for the recursive equilibrium function, we combine the adaptive sparse grid with a time iteration procedure, resulting in an algorithm that is massively parallelisable. Our implementation is hybrid-parallel and can solve OLG models with large (depreciation) shocks and with 60 continuous state variables.
Poster
EMD-01 Parallelized Dimensional Decomposition for Dynamic Stochastic Economic Models, Simon Scheidegger (University of Zurich/Stanford University, Switzerland)
Co-Authors: Olaf Schenk (Università della Svizzera italiana, Switzerland); Simon Scheidegger (University of Zurich & Stanford University, Switzerland)
This project explores a technique called High-Dimensional Model Representation (HDMR), which allows for the decomposition of a function into a finite number of lower-dimensional component functions. HDMR leverages the lack of input correlation to effectively reduce dimensionality of the problem in exchange for accuracy. Due to the intrinsic separability and hierarchical construction, HDMR provides the opportunity for both embarrassingly parallel model estimation and adaptive selection of active or significant inputs. An application of HDMR in conjunction with Adaptive Sparse Grids is shown in the context of computational economics, in which we provide an efficient solution method for high-dimensional dynamic stochastic models. Our results show that HDMR can effectively capture model dynamics with relatively low-dimensional component functions, thus mitigating the so-called "curse of dimensionality" and allowing for computability of larger systems.
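To illustrate the decomposition idea described in the EMD-01 abstract above, here is a rough Monte Carlo sketch of a first-order (ANOVA-style) HDMR decomposition of a toy function; it only demonstrates the notion of low-dimensional component functions and is not the project's implementation.
    # First-order HDMR/ANOVA sketch: f(x) ~ f0 + sum_i f_i(x_i), with
    # f0 = E[f] and f_i(x_i) = E[f | x_i] - f0, estimated by Monte Carlo.
    import numpy as np

    def f(x):                              # toy model with weak interactions
        return x[:, 0] ** 2 + 2.0 * x[:, 1] + 0.1 * x[:, 0] * x[:, 2]

    rng = np.random.default_rng(0)
    d, n = 3, 20000
    X = rng.uniform(0.0, 1.0, size=(n, d))
    f0 = f(X).mean()                       # zeroth-order term

    def f_i(i, xi, n_inner=20000):
        # conditional mean over the other inputs, minus f0
        Z = rng.uniform(0.0, 1.0, size=(n_inner, d))
        Z[:, i] = xi
        return f(Z).mean() - f0

    # first-order component of input 0 evaluated on a small grid
    grid = np.linspace(0.0, 1.0, 5)
    print(f0, [round(f_i(0, v), 3) for v in grid])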
MS Summary
MS18 Computational Economics, Simon Scheidegger (University of Zurich/Stanford University, Switzerland)
Co-Authors: Johannes Brumm (University of Zurich, Switzerland)
This minisymposium provides an overview of recent developments in how computational methods are applied to economic problems. These include, for instance, the inference of causality relations from large data-sets. Another example, and the focus of this minisymposium, is the solution, estimation, and uncertainty quantification of dynamic stochastic economic models in fields like optimal taxation, asset pricing, or climate change. Solving such models is particularly challenging because of the feedback from the future that the expectations of the modelled economic agents create. This feature, combined with the substantial heterogeneity that successful models of economic phenomena have to incorporate, often results in dynamical systems with high-dimensional state spaces, confronting economists with the curse of dimensionality. Methods to alleviate this curse include adaptive sparse grids and active subspace methods. Moreover, such problems often require substantial computation time even if an efficient solution method is applied. Fortunately, the generic structure of many of these problems allows for massive parallelization and is thus a perfect application for modern high-performance computing techniques. This minisymposium brings together recent developments along those lines. -
Schenk Olaf Poster
Poster
CSM-01 An Interior-Point Stochastic Approximation Method on Massively Parallel Architectures, Olaf Schenk (Università della Svizzera italiana, Switzerland)
Co-Authors: Olaf Schenk (Università della Svizzera italiana, Switzerland); Drosos Kourounis (Università della Svizzera italiana, Switzerland)
The stochastic approximation method is behind the solution of many actively studied problems in PDE-constrained optimization. Despite its far-reaching applications, there is almost no work on combining stochastic approximation with interior-point optimization, although interior-point methods (IPMs) are particularly efficient in large-scale nonlinear optimization due to their attractive worst-case complexity. We present a massively parallel stochastic interior-point method and apply it to stochastic PDE problems such as boundary control and the optimization of complex electric power grid systems under uncertainty.
Poster
EMD-01 Parallelized Dimensional Decomposition for Dynamic Stochastic Economic Models, Olaf Schenk (Università della Svizzera italiana, Switzerland)
Co-Authors: Olaf Schenk (Università della Svizzera italiana, Switzerland); Simon Scheidegger (University of Zurich & Stanford University, Switzerland)
This project explores a technique called High-Dimensional Model Representation (HDMR), which allows for the decomposition of a function into a finite number of lower-dimensional component functions. HDMR leverages the lack of input correlation to effectively reduce dimensionality of the problem in exchange for accuracy. Due to the intrinsic separability and hierarchical construction, HDMR provides the opportunity for both embarrassingly parallel model estimation and adaptive selection of active or significant inputs. An application of HDMR in conjunction with Adaptive Sparse Grids is shown in the context of computational economics, in which we provide an efficient solution method for high-dimensional dynamic stochastic models. Our results show that HDMR can effectively capture model dynamics with relatively low-dimensional component functions, thus mitigating the so-called "curse of dimensionality" and allowing for computability of larger systems.
Poster
CSM-07 Estimation of Drag and Lift Coefficients for Steady State Incompressible Flow of a Newtonian Fluid on Domains with Periodic Roughness, Olaf Schenk (Università della Svizzera italiana, Switzerland)
Co-Authors: Drosos Kourounis (Università della Svizzera italiana, Switzerland); Olaf Schenk (Università della Svizzera italiana, Switzerland)
Rough boundaries pose several challenges for fluid simulations. The difficulty stems from the fact that resolving the small-scale rough geometry requires significantly refined meshes in the vicinity of the boundaries. Since all physical rough boundaries have a characteristic length, scale corrections to the standard Navier-Stokes equations can be obtained by considering Taylor expansions around the rough surface, leading to modified boundary conditions. Numerical tests are presented to validate the proposed theory, including the calculation of drag and lift coefficients for laminar flow around a cylinder with a rough boundary. Keywords: steady-state Navier-Stokes equations, periodic rough boundaries, drag and lift coefficients, laminar flow.
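For orientation only, drag and lift coefficients can be estimated by integrating the surface pressure over a discretized cylinder boundary, as in the toy sketch below; the pressure field here is the analytical potential-flow distribution, viscous stresses are neglected, and the poster's actual computation is not reproduced.
    # Pressure-only drag/lift coefficient sketch for a 2D cylinder of diameter D
    # in a free stream (rho, U). The surface pressure used here is the ideal
    # potential-flow distribution, not a simulation result.
    import numpy as np

    rho, U, D = 1.0, 1.0, 1.0
    R = D / 2.0
    theta = np.linspace(0.0, 2.0 * np.pi, 721)[:-1]       # surface angles
    p = 0.5 * rho * U**2 * (1.0 - 4.0 * np.sin(theta)**2) # potential-flow pressure

    # outward unit normal and surface element of the cylinder
    nx, ny = np.cos(theta), np.sin(theta)
    ds = R * (theta[1] - theta[0])

    # force per unit span from integrating -p n over the surface
    Fx = np.sum(-p * nx) * ds
    Fy = np.sum(-p * ny) * ds

    Cd = Fx / (0.5 * rho * U**2 * D)
    Cl = Fy / (0.5 * rho * U**2 * D)
    print(Cd, Cl)   # ~0 for ideal potential flow (d'Alembert's paradox)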
Poster
EMD-02 Large Scale Xeon Phi Parallelization of a Deep Learning Language Model, Olaf Schenk (Università della Svizzera italiana, Switzerland)
Co-Authors: Tim Dettmers (Università della Svizzera italiana, Switzerland); Olaf Schenk (Università della Svizzera italiana, Switzerland)
Deep learning is a recent predictive modelling approach which yields near-human performance on a range of tasks. Deep learning language models have gained popularity as they have achieved state-of-the-art results in many language tasks, such as language translation, but they are computationally intensive, requiring computers with accelerators and weeks of computation time. Here we propose a parallel algorithm for running deep learning language models on hundreds of nodes equipped with Xeon Phis, reducing the computation time to mere hours. We use MPI for the parallelization among nodes and use the Xeon Phis to accelerate the matrix multiplications, which make up more than 75% of the total computation. With our algorithm, experimentation can be done much faster, thus enabling rapid progress in the sparsely explored domain of natural language understanding. -
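A hedged sketch of the generic data-parallel pattern mentioned in the EMD-02 abstract above (averaging gradients across MPI ranks with mpi4py); the model, Xeon Phi offload and communication schedule of the actual poster are not reproduced, and the gradient computation below is faked.
    # Data-parallel gradient averaging across MPI ranks (run with mpirun -n 4 ...).
    # Generic illustration only; the local gradient is a random placeholder.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n_params = 1000
    weights = np.zeros(n_params)
    rng = np.random.default_rng(rank)

    for step in range(10):
        # each rank computes a gradient on its own mini-batch (faked here)
        local_grad = rng.standard_normal(n_params)
        global_grad = np.empty_like(local_grad)
        comm.Allreduce(local_grad, global_grad, op=MPI.SUM)
        weights -= 0.01 * (global_grad / size)     # averaged gradient step

    if rank == 0:
        print("finished", step + 1, "synchronous steps on", size, "ranks")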
Schiffmann Florian Poster
Poster
MAT-06 Linear Scaling Ehrenfest Molecular Dynamics, Florian Schiffmann (Victoria University, Australia)
Co-Authors: Florian Schiffmann (Victoria University, Australia); Joost VandeVondele (ETH Zurich, Switzerland)
With the available computational power growing, ever larger systems can be investigated with increasingly advanced methods and new algorithms. For electronic structure calculations on systems containing a few thousand atoms, linear scaling algorithms are essential. For ground state DFT calculations, linear scaling has already been demonstrated for millions of atoms in the condensed phase [J. VandeVondele, U. Bortnik, J. Hutter, 2012]. Here, we extend this work to electronically excited states, for example, to make UV/VIS spectroscopy or investigations of the electron injection process in dye-sensitized solar cells possible. We base our approach on non-adiabatic molecular dynamics, in particular on Ehrenfest molecular dynamics (EMD). The formalism, based on the density matrix, allows for linear scaling based on the sparsity of the density matrix and naturally incorporates density embedding methods such as the Kim-Gordon approach. -
Schmidli Juerg Poster
Poster
CLI-02 Climate Change Simulations at Kilometer-Scale Resolution, Juerg Schmidli (Goethe Universität Frankfurt am Main, Germany)
Co-Authors: Juerg Schmidli (Goethe University of Frankfurt, Germany); Christoph Schär (ETH Zurich, Switzerland)
The recent increase in computational power enables climate simulations at kilometer-scale resolution. This modeling approach is able to explicitly resolve deep convection (i.e., thunderstorms and rain showers), and thus allows some of the key uncertainties in current climate models to be reduced. Here we present analyses of decade-long climate change simulations at a horizontal resolution of 2.2 km across a greater Alpine region. The simulations have been conducted on a Cray XE6 system using a setup with 2000 cores on a computational mesh of 500x500x60 grid points. The results show a substantial improvement in the simulation of summer precipitation, and demonstrate the importance of kilometer-scale resolution in climate change projections. -
Schneider Tobias M. MS Summary
MS Summary
MS05 High-Performance Computing in Fluid Mechanics I, Tobias M. Schneider (EPFL, Switzerland)
Co-Authors: Francois Gallaire (EPFL, Switzerland)
Large-scale computer simulations have become an indispensable tool in fluid dynamics research. Elucidating the fundamental flow physics or designing and optimizing flows for applications, ranging all the way from low-Reynolds-number multiphase flows at small length scales to fully developed turbulence at large scales, requires state-of-the-art simulation capabilities. The need for large-scale HPC simulations in fluid dynamics has therefore been driving the development of novel simulation algorithms and of optimized software infrastructure. This PASC 16 minisymposium will bring together both developers and users of modern simulation tools. The set of talks will showcase a wide range of fields in which HPC computations are essential, highlight recent advances in computational methods, and provide a platform to identify challenges and opportunities for future research.
MS Summary
MS14 High-Performance Computing in Fluid Mechanics II, Tobias M. Schneider (EPFL, Switzerland)
Co-Authors: Francois Gallaire (EPFL, Switzerland)
Large-scale computer simulations have become an indispensable tool in fluid dynamics research. Elucidating the fundamental flow physics or designing and optimizing flows for applications, ranging all the way from low-Reynolds-number multiphase flows at small length scales to fully developed turbulence at large scales, requires state-of-the-art simulation capabilities. The need for large-scale HPC simulations in fluid dynamics has therefore been driving the development of novel simulation algorithms and of optimized software infrastructure. This PASC 16 minisymposium will bring together both developers and users of modern simulation tools. The set of talks will showcase a wide range of fields in which HPC computations are essential, highlight recent advances in computational methods, and provide a platform to identify challenges and opportunities for future research. -
Schuermann Felix MS Presentation
Friday, June 10, 2016
Garden 2BC, 09:40-10:00
MS Presentation
Large-Scale Detailed Neuron Modeling and Simulation, Felix Schuermann (EPFL, Switzerland)
Co-Authors:
This talk will introduce the abstractions and simulation techniques for neural tissue simulations capturing the detailed morphology of nerve cells. Models of this type are characterized by low arithmetic intensity and potentially large memory footprints. The simulator of choice for this type of model is NEURON. Here we will present recent achievements in substantially reducing the memory consumption of the simulator, as well as its scaling to large-scale supercomputers such as the JuQUEEN system at Juelich or the MIRA system at Argonne.
MS Summary
MS28 Level of Detail in Brain Modeling: Common Abstractions and their Scientific Use, Felix Schuermann (EPFL, Switzerland)
Co-Authors: Felix Schuermann (EPFL, Switzerland)
Brain simulation has been a modelling challenge to the same degree as it has been a simulation challenge, in the sense that, depending on the scope of the question, the actual mathematical formalisms vary profoundly. At the same time, the decision for a certain scope and formalism is taken in light of the available data and computational tractability. Traditionally, researchers wanting to understand the function of brains have accordingly chosen a more "top-down" approach, trying to keep complexity to a minimum. Researchers interested in understanding the brain as a system (e.g., as needed for studying diseases) have little choice other than to embrace more "bottom-up" approaches that incorporate the biophysical and even the biochemical diversity found in brain tissue. More recently, the steady increase of computational capabilities described by Moore's law has reached levels at which large scale and fine detail are achievable at the same time. Modern informatics workflows and technologies help us to make complex scientific team efforts more tractable and reproducible. Together with high-quality, brain-wide data sets of increasing resolution and specificity, brain simulation is finally on a journey that should make it possible to overcome this divide. The minisymposium highlights this exciting convergence and the two prominent abstractions in brain modelling and simulation. The first one stops at the resolution of individual nerve cells (point neuron modelling), whereas the second takes the detailed morphology of neurons and their circuitry into account. We present two major simulation tools, NEST and NEURON, which are open source and have been community standards for their respective abstractions for many years. For about a decade, these tools have been capable of running on massively parallel computers. Recently, they have been shown to be ready to exploit the class of petascale HPC machines. The minisymposium presents the computational characteristics and requirements of the codes. For both abstractions, we showcase specific neuroscience applications using the respective tools and representing cutting-edge in silico neuroscientific research. The presenting researchers are members of the European Human Brain Project. The presented simulators are partially supported by that effort in order to integrate them into a novel research infrastructure for brain research. One contributed talk on point neuron modelling and simulation and another on detailed neuron modelling and simulation are presented. -
Schuett Ole Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 11:10-11:30
Contributed Talk
Performance Improvement by Exploiting Sparsity for MPI Communication in Sparse Matrix-Matrix Multiplication, Ole Schuett (ETH, Switzerland)
Co-Authors: Joost VandeVondele (ETH Zurich, Switzerland); Ole Schuett (ETH Zurich, Switzerland)
DBCSR is the sparse matrix library at the heart of the CP2K linear scaling electronic structure theory algorithm. It is MPI and OpenMP parallel, and can exploit accelerators. The multiplication algorithm is based on Cannon's algorithm, whose scalability is limited by the MPI communication time. The current implementation is based on MPI point-to-point communications. We present an improved implementation that takes into account the sparsity of the problem in order to reduce the communication. This implementation makes use of one-sided communications. Performance results for representative CP2K benchmarks will also be presented.
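For readers unfamiliar with Cannon's algorithm, the following serial NumPy sketch simulates its block shifts on a p x p process grid with dense blocks (no MPI, no sparsity); DBCSR's actual distributed, sparsity-aware implementation is far more involved.
    # Serial simulation of Cannon's algorithm on a p x p grid of dense blocks.
    import numpy as np

    p, nb = 3, 4                      # process grid size, block size
    n = p * nb
    rng = np.random.default_rng(0)
    A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))

    blk = lambda M, i, j: M[i*nb:(i+1)*nb, j*nb:(j+1)*nb]

    # initial alignment: row i of A shifted left by i, column j of B shifted up by j
    Aloc = [[blk(A, i, (j + i) % p).copy() for j in range(p)] for i in range(p)]
    Bloc = [[blk(B, (i + j) % p, j).copy() for j in range(p)] for i in range(p)]
    C = [[np.zeros((nb, nb)) for _ in range(p)] for _ in range(p)]

    for _ in range(p):                # p multiply-and-shift steps
        for i in range(p):
            for j in range(p):
                C[i][j] += Aloc[i][j] @ Bloc[i][j]
        Aloc = [[Aloc[i][(j + 1) % p] for j in range(p)] for i in range(p)]  # shift left
        Bloc = [[Bloc[(i + 1) % p][j] for j in range(p)] for i in range(p)]  # shift up

    print(np.allclose(np.block(C), A @ B))   # True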
Poster
MAT-09 Sparse Matrix Multiplication Library for Linear Scaling DFT Calculations in Electronic Structure Codes, Ole Schuett (ETH, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Andreas Glöss (University of Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland)
The key operation for linear scaling DFT implemented in the CP2K quantum chemistry program is sparse matrix-matrix multiplication. For this task, the sparse matrix library DBCSR (Distributed Block Compressed Sparse Row) has been developed. DBCSR takes full advantage of the block-structured sparse nature of the matrices for efficient computation and communication. It is MPI and OpenMP parallelized, and can exploit accelerators. We describe a strategy to improve DBCSR performance. DBCSR is available as a stand-alone library at http://dbcsr.cp2k.org/ to be employed in electronic structure codes. To this end, a streamlined API has been defined and a suite of tools has been developed to generate the full documentation of the library (API-DOC) by extracting the information provided directly in the source code. We give a flavour of the generated API-DOC by showing snapshots of selected HTML documentation pages, and we sketch the design of these tools.
Poster
MAT-04 CP2K within the PASC Materials Network, Ole Schuett (ETH, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Hans Pabst (Intel Semiconductor AG, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland)
One of the goals of the PASC project is to strengthen networking in the Swiss materials science community through the active development of collaborative relationships among university researchers and CSCS staff. This includes assisting researchers in tuning, debugging, optimizing, and enhancing codes and applications for HPC resources, from mid-scale to national and international petascale facilities, with a view to the exascale transition. In addition, the application support specialists provide support for development projects on software porting techniques, parallelization and optimization strategies, deployment on diverse computational platforms, and data management. Here we present selected tools and software developed for CP2K [1]. Furthermore, we show by example how a CP2K application can be tuned to optimally use all available HPC resources. With a view to next-generation HPC hardware, we present first promising performance results for Intel's Broadwell-EP and KNL platforms. [1] The CP2K developers group, CP2K is freely available from: https://www.cp2k.org/, 2016 -
Schulthess Thomas MS Presentation
Thursday, June 9, 2016
Garden 3B, 14:00-14:30
MS Presentation
Path to Exascale Computing: Can We Get Serious About Cloud-Resolving Global Models?, Thomas Schulthess (Institute for Theoretical Physics, ETH Zurich, Switzerland)
Co-Authors:
With the US Department of Energy's Exascale Computing Project ramping up, we now have some visibility of what supercomputing platforms will look like at the beginning of the next decade. We will look at opportunities and challenges for the climate and numerical weather prediction ecosystem to run on these future platforms. Specifically, we will discuss the performance improvement needed in order to run productive simulations with kilometre-scale resolution, and what we can expect from exascale computers.
Wednesday, June 8, 2016
Garden 3B, 13:30-14:00
MS Presentation
Translating Python into GridTools: Prototyping PDE Solvers Using Stencils, Thomas Schulthess (Institute for Theoretical Physics, ETH Zurich, Switzerland)
Co-Authors: Mauro Bianco (ETH Zurich / CSCS, Switzerland); Paolo Crosetto (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Thomas Schulthess (ETH Zurich, Switzerland)
The fast-paced environment of high-performance computing architectures has always been a challenge for complex codes. First, the effort to adapt the code to new processor architectures is significant compared to their typical release cycle. Second, optimisations for one target often incur performance penalties on others. Third, such codes are generally developed by domain scientists, who typically lack expertise in the specific details of the target platform. Successful projects like STELLA have shown that a way out of this situation is to apply the concept of separation of concerns. GridTools is pushing this concept even further: the domain scientist's work is conducted within a prototyping environment using a domain-specific language (DSL), while the computer scientist profiles the automatically generated code over diverse architectures, implemented by different hardware-specific backends. This talk will give an overview of the GridTools ecosystem, highlighting the use of the prototyping environment in combination with the automatic code-generation engine.
Thursday, June 9, 2016
Garden 1BC, 12:10-12:30
Contributed Talk
The GridTools Libraries for the Solution of PDEs Using Stencils, Thomas Schulthess (Institute for Theoretical Physics, ETH Zurich, Switzerland)
Co-Authors: Mauro Bianco (ETH Zurich / CSCS, Switzerland); Paolo Crosetto (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Thomas Schulthess (ETH Zurich, Switzerland)
Numerical weather prediction and climate models like COSMO and ICON explicitly solve a large set of PDEs. The STELLA library was successfully used to port the dynamical core of COSMO, providing performance-portable code across multiple platforms. A significant performance speedup was obtained for NVIDIA GPUs, as reported in doi:10.1145/2807591.2807676. However, its applicability was restricted to Cartesian structured grids and finite difference methods, and it is difficult to use outside the COSMO model. The GridTools project emerged as an effort to provide an ecosystem for developing portable and efficient grid applications for the explicit solution of PDEs. GridTools generalizes STELLA to a wider class of weather and climate models on multiple grids (Cartesian and spherical), and offers facilities for performing communication and setting boundary conditions. Here we present the GridTools API and show performance on NVIDIA GPUs and x86 platforms. -
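To make the notion of a stencil from the GridTools abstracts above concrete, here is the kind of kernel such libraries target, written naively with NumPy slicing (a 5-point Laplacian Jacobi sweep); this only illustrates the computational pattern and is neither GridTools code nor its DSL.
    # A 5-point Laplacian stencil applied with NumPy slicing. Stencil libraries
    # express kernels like this once and generate optimized backends; this
    # snippet only illustrates the computational pattern.
    import numpy as np

    def jacobi_step(u, f, h):
        # one Jacobi sweep of the 5-point Laplacian stencil on the interior
        out = u.copy()
        out[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1] +
                                  u[1:-1, 2:] + u[1:-1, :-2] +
                                  h * h * f[1:-1, 1:-1])
        return out

    n = 64
    h = 1.0 / (n - 1)
    u = np.zeros((n, n))          # zero Dirichlet boundary values
    f = np.ones((n, n))
    for _ in range(200):
        u = jacobi_step(u, f, h)
    print(float(u.max()))         # approaches the Poisson solution's maximum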
Schutt Kristof MS Presentation
Wednesday, June 8, 2016
Garden 1BC, 16:30-17:00
MS Presentation
Machine Learning for Molecules and Materials, Kristof Schutt (TU Berlin, Germany)
Co-Authors:
High-throughput density functional calculations of solids are highly time-consuming. As an alternative, we propose a machine learning approach for the fast prediction of solid-state properties. To achieve this, local spin-density approximation calculations are used as a training set. We focus on predicting the value of the density of electronic states at the Fermi energy. We find that conventional representations of the input data, such as the Coulomb matrix, are not suitable for the training of learning machines in the case of periodic solids. We propose a novel crystal structure representation for which learning and competitive prediction accuracies become possible within an unrestricted class of spd systems of arbitrary unit-cell size. This is joint work of K. T. Schutt, H. Glawe, F. Brockherde, A. Sanna, K. R. Muller, and E. K. U. Gross -
Schwaller Philippe MS Presentation
Thursday, June 9, 2016
Garden 1BC, 15:00-15:20
MS Presentation
High-Throughput Prediction of Novel Two-Dimensional Materials, Philippe Schwaller (EPFL, Switzerland)
Co-Authors: Philippe Schwaller (EPFL, Switzerland); Andrea Cepellotti (EPFL, Switzerland); Andrius Merkys (EPFL, Switzerland); Ivano E. Castelli (EPFL, Switzerland); Marco Gibertini (EPFL, Switzerland); Giovanni Pizzi (EPFL, Switzerland); Nicola Marzari (EPFL, Switzerland)
As a crucial step towards the identification of novel and promising 2D materials, we provide here a large-scale first-principles exploration and characterization of such compounds. Starting from a combination of 480,000 structures harvested from the ICSD and COD databases, three-dimensional crystals are screened systematically by checking for the absence of chemical bonds between adjacent layers, identifying more than 6,000 layered systems. Then DFT calculations of the van der Waals interlayer bonding are performed with automatic workflows, while systematically assessing the metallic, insulating or magnetic character of the materials obtained. Following full atomic and cell relaxations, phonon dispersions are computed as a first step towards the assessment of thermodynamic properties. Thanks to the AiiDA materials' informatics platform [1], and in particular its automatic workflow engine, database structure, sharing capabilities, and pipelines to/from crystallographic repositories, the systematic and reproducible calculation of these properties becomes straightforward, together with seamless accessibility and sharing. [1] http://aiida.net -
Schwarz Angelika MS Presentation
Wednesday, June 8, 2016
Garden 3B, 14:30-14:45
MS Presentation
Using Generated Matrix Kernels for a High-Order ADER-DG Engine, Angelika Schwarz (Technische Universität München, Germany)
Co-Authors: Vasco Varduhn (Technische Universität München, Germany); Michael Bader (Leibniz Supercomputing Centre, Germany)
The ExaHyPE project employs the high-order discontinuous Galerkin finite element method in order to solve hyperbolic PDEs on adaptive Cartesian grids at exascale level. Envisaged applications include grand-challenge simulations in astrophysics and geosciences. Our compute kernels rely on tensor operations - a type of operation scientific computing libraries only support to a limited degree. We demonstrate concepts of how the tensor operations can be reduced to dense matrix-matrix multiplications, which are among the best-optimised operations in linear algebra. We apply reordering and reshaping techniques, which enable our code generator to exploit existing highly optimised libraries as a back end and to produce highly optimised compute kernels. As a result, our tool chain provides a "complete solution" for tensor product-based FEM operations. -
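A small NumPy sketch of the reshaping idea mentioned in the ADER-DG abstract above: applying a 1D operator along one axis of a 3D coefficient tensor is recast as a single dense matrix-matrix product. The ExaHyPE code generator and its optimised back ends are not shown here.
    # Recasting a tensor contraction as a dense matrix-matrix multiplication:
    # apply operator M along the first axis of a 3D coefficient tensor U.
    import numpy as np

    n = 5
    rng = np.random.default_rng(0)
    M = rng.standard_normal((n, n))        # e.g. a 1D differentiation operator
    U = rng.standard_normal((n, n, n))     # e.g. DG coefficients per element

    # reference result via an explicit tensor contraction
    ref = np.einsum("ai,ijk->ajk", M, U)

    # same contraction as one GEMM: flatten the trailing axes, multiply, reshape
    gemm = (M @ U.reshape(n, n * n)).reshape(n, n, n)

    print(np.allclose(ref, gemm))          # True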
Schär Christoph MS Presentation
Thursday, June 9, 2016
Garden 3B, 14:45-15:00
MS Presentation
A Decade-Long European-Scale Convection-Resolving Climate Simulation on GPUs: Computation, Validation and Analyses, Christoph Schär (ETH Zurich, Switzerland)
Co-Authors: Stefan Rüdisühli (ETH Zurich, Switzerland); Nikolina Ban (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Daniel Lüthi (ETH Zurich, Switzerland); Michael Sprenger (ETH Zurich, Switzerland); Heini Wernli (ETH Zurich, Switzerland); Christoph Schär (ETH Zurich, Switzerland)
Climate simulations using a horizontal resolution of O(1 km) allow deep convection to be explicitly resolved. Precipitation processes are then represented much closer to first principles, allowing for an improved representation of the water cycle. Due to the large computational costs, climate simulations at such scales were restricted to rather small domains in the past. Here we present results from a decade-long convection-resolving climate simulation covering Europe, using a computational mesh of 1,536x1,536x60 grid points. We use a COSMO-model prototype enabled for GPUs. The results illustrate how the approach allows the interactions between atmospheric circulations at scales ranging from 1,000 km down to 10 km to be represented. We discuss the performance of the convection-resolving climate modelling approach, focusing specifically on the improved representation of summer convection on the continental scale. Furthermore, we demonstrate the potential of online analyses of these simulations for assembling detailed climatologies of extratropical cyclones, fronts and propagating convective systems.
Poster
CLI-02 Climate Change Simulations at Kilometer-Scale Resolution, Christoph Schär (ETH Zurich, Switzerland)
Co-Authors: Juerg Schmidli (Goethe University of Frankfurt, Germany); Christoph Schär (ETH Zurich, Switzerland)
The recent increase in computational power enables climate simulations at kilometer-scale resolution. This modeling approach is able to explicitly resolve deep convection (i.e., thunderstorms and rain showers), and thus allows some of the key uncertainties in current climate models to be reduced. Here we present analyses of decade-long climate change simulations at a horizontal resolution of 2.2 km across a greater Alpine region. The simulations have been conducted on a Cray XE6 system using a setup with 2000 cores on a computational mesh of 500x500x60 grid points. The results show a substantial improvement in the simulation of summer precipitation, and demonstrate the importance of kilometer-scale resolution in climate change projections.
MS Summary
MS20 Kilometer-Scale Weather and Climate Modeling on Future Supercomputing Platforms, Christoph Schär (ETH Zurich, Switzerland)
Co-Authors: Torsten Hoefler (ETH Zurich, Switzerland)
The development of weather and climate models has made rapid progress in recent years. With the continued advance of high-performance computing (HPC), the computational resolution will continue to be refined in the next decades. This development offers exciting prospects. From a climate science perspective, a further increase in resolution will make it possible to explicitly represent the dynamics of deep convective and thunderstorm clouds without the help of semi-empirical parameterizations. From a computer science perspective, this strategy poses major challenges. First, emerging hardware architectures increasingly involve the use of heterogeneous many-core architectures consisting of both CPUs and accelerators (e.g., GPUs). The efficient exploitation of such architectures requires a paradigm shift and has only just started. Second, with increasing computational resolution, the models' output becomes unbearably voluminous, the delay from I/O unacceptably large, and long-term storage prohibitively expensive. Ultimately, there is no way around conducting the analyses online rather than storing the model output. This approach implies conducting model reruns (i.e., repeat simulations for refined analysis). These developments pose new and challenging computer science questions, which need to be addressed before an efficient exploitation of new hardware systems becomes feasible. The proposed minisymposium is designed as an interdisciplinary workshop between climate and computer scientists, and its overall scope is the further development of high-resolution climate models. Specific aspects to be addressed include the numerical and computational formulation of non-hydrostatic dynamical models on heterogeneous next-generation many-core hardware architectures, the use of novel online analysis methods and model reruns in extended simulations, the virtualization of climate model simulations, as well as the development of bit-reproducible codes across different hardware architectures. Two of the listed presentations (those of Oliver Fuhrer and David Leutwyler) are centered around a recently developed GPU-enabled version of the COSMO model. This limited-area model is probably the first full atmospheric model that runs entirely on GPUs. It is currently evaluated in a pre-operational test suite for numerical weather prediction, and has already been used for European-scale decade-long climate simulations. Further development and exploitation of the model in a climate setting is currently being undertaken within the project crCLIM (http://www.c2sm.ethz.ch/research/crCLIM). The two organizers of this minisymposium are involved in this project. -
Scrofani Roberto MS Presentation
Wednesday, June 8, 2016
Garden 3A, 16:45-17:00
MS Presentation
Computational Study of the Risk of Restenosis in Coronary Bypasses, Roberto Scrofani (Ospedale L. Sacco, Italy)
Co-Authors: Christian Vergara (Politecnico di Milano, Italy); Sonia Ippolito (Ospedale Luigi Sacco Milano, Italy); Roberto Scrofani (Ospedale Luigi Sacco Milano, Italy); Alfio Quarteroni (EPFL, Switzerland)
Coronary artery disease, caused by the build-up of atherosclerotic plaques in coronary vessel walls, is one of the leading causes of death in the world. For high-risk patients, coronary artery bypass grafting is the preferred treatment. Despite overall excellent patency rates, bypasses may fail due to restenosis. In this context, we present a computational study of the fluid dynamics in patient-specific geometries with the aim of investigating a possible relationship between coronary stenosis and graft failure. Firstly, we propose a strategy to prescribe realistic boundary conditions in the absence of measured data, based on an extension of Murray's law to provide the flow division at bifurcations in the case of stenotic vessels and non-Newtonian blood rheology. Then, we show results of numerical simulations in patients treated with grafts, in which the degree of coronary stenosis is virtually varied to compare the fluid dynamics in terms of hemodynamic indices potentially involved in restenosis development. -
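For orientation on the preceding abstract, classical Murray's law divides the parent flow among daughter branches in proportion to the cube of their diameters, as in the minimal sketch below; the extension to stenotic vessels and non-Newtonian rheology developed in the talk is not reproduced here.
    # Classical Murray's law: daughter flow rates scale with the cube of the
    # vessel diameter, Q_i = Q_parent * d_i**3 / sum_j d_j**3.
    def murray_flow_split(q_parent, diameters):
        cubes = [d ** 3 for d in diameters]
        total = sum(cubes)
        return [q_parent * c / total for c in cubes]

    # parent coronary flow of 1.0 ml/s split between two daughter branches
    print(murray_flow_split(1.0, [3.0e-3, 2.0e-3]))  # diameters in metres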
Shahmirzadi Omid MS Presentation
Thursday, June 9, 2016
Garden 3A, 15:15-15:30
MS Presentation
Strategies for Efficient Detection of Positive Selection on Phylogenetic Trees, Omid Shahmirzadi (UNIL / SIB, Switzerland)
Co-Authors:
Detection of positive selection on phylogenetic trees, where new advantageous genetic variants come to dominate in a population, is of great importance for the study of gene evolution. However, the high computational cost of such detection in existing implementations limits its applicability. Taking advantage of modern parallel computing systems and clusters, as well as different algorithmic techniques, we have implemented new software (known as FastCodeML) that substantially improves on existing solutions in terms of performance and scalability. We briefly introduce the underlying mechanisms of FastCodeML and discuss its gains over existing implementations. -
Shinde Prashant Contributed Talk
Friday, June 10, 2016
Garden 1BC, 10:00-10:15
Contributed Talk
DFT Study of Realistic Zigzag Graphene Nanoribbons, Prashant Shinde (Empa, Switzerland)
Co-Authors:
Graphene nanoribbons with zigzag edges (ZGNRs) show great potential for use in spintronics devices [Geim et al., 2007]. We report a synthesis of atomically precise ZGNRs through a bottom-up approach on the Au(111) surface [Pascal et al., 2016]. Scanning tunneling spectroscopy measurements reveal the existence of edge-localized states in pristine and phenyl-functionalized ZGNRs. Using density functional theory calculations, we show that such functionalization preserves the electronic properties of the pristine ZGNR, with a reduction in the energy gap, which renders the edge magnetism unstable to environmental influences. Furthermore, broken translational symmetry along the zigzag edge splits the two edge states into four. Magnetic instability is evident when modelling ZGNRs adsorbed on Au(111). A change in the van der Waals parameterization leads to a decrease in the net magnetization and an inversion of the frontier states at the Fermi level. Our findings show that an appropriate precursor molecule provides a route to engineering the electronic structure of ZGNRs. -
Shragge Jeffrey MS Presentation
Thursday, June 9, 2016
Garden 1A, 15:00-15:15
MS Presentation
Leveraging the Madagascar Framework for Reproducible Large-scale Cluster and Cloud Computing, Jeffrey Shragge (UWA, Australia)
Co-Authors: Jeffrey Shragge (University of Western Australia, Australia)
Over the past decade, the open-source Madagascar framework has been used for reproducible computational seismology research by a growing community of researchers. While Madagascar is commonly used for single-node applications, the increasing number and computational complexity of user-submitted software tools (e.g., 3D seismic modelling, imaging and inversion codes) are pushing the limits of computational tractability at the workstation level. There is growing interest in, and community experience with, using Madagascar on cluster-scale public HPC facilities and cloud-based computing environments. In this presentation we highlight our procedure for interfacing Madagascar with publicly accessible HPC clusters, and provide case studies of using Madagascar for large-scale 3D seismic modelling and imaging activities. We also present our recent efforts in moving toward a reproducible Madagascar framework within a cloud-computing environment, and provide an example of running 3D acoustic modelling on Australia's NECTAR cloud computing grid using a combination of Python, ZeroMQ, and Cython. -
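A minimal pyzmq PUSH/PULL task-farm sketch of the sort one might use to fan modelling tasks out to cloud workers, loosely related to the preceding abstract; the ports, message fields and in-process worker thread are illustrative only and are unrelated to the actual Madagascar/NECTAR setup.
    # Minimal ZeroMQ PUSH/PULL task farm: a master fans out "shot" descriptions,
    # a worker (here a thread, normally a separate cloud node) returns results.
    import threading
    import zmq

    N_TASKS = 5

    def worker():
        ctx = zmq.Context.instance()
        pull = ctx.socket(zmq.PULL); pull.connect("tcp://127.0.0.1:5557")
        push = ctx.socket(zmq.PUSH); push.connect("tcp://127.0.0.1:5558")
        for _ in range(N_TASKS):
            task = pull.recv_json()                      # receive a modelling task
            push.send_json({"shot": task["shot"], "status": "done"})
        pull.close(); push.close()

    ctx = zmq.Context.instance()
    tasks = ctx.socket(zmq.PUSH); tasks.bind("tcp://127.0.0.1:5557")
    sink = ctx.socket(zmq.PULL); sink.bind("tcp://127.0.0.1:5558")

    threading.Thread(target=worker, daemon=True).start()
    for shot in range(N_TASKS):
        tasks.send_json({"shot": shot})                  # distribute work
    for _ in range(N_TASKS):
        print(sink.recv_json())                          # collect results
    tasks.close(); sink.close()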
Sidler Dominik Poster
Poster
LS-05 Replica-Exchange Enveloping Distribution Sampling: A Robust and Accurate Method to Calculate Multiple Free Energy Differences from a Single Simulation, Dominik Sidler (ETH Zurich, Switzerland)
Co-Authors: Sereina Z. Riniker (ETH Zurich, Switzerland)
Enveloping distribution sampling (EDS) presents an attractive alternative to standard methods for the calculation of free-energy differences ∆G from molecular dynamics (MD) simulations, as it allows the estimation of ∆G between multiple states from a single simulation of a reference state R. The challenge of the approach is the determination of optimal parameters for R to ensure equal sampling of all end states. While an automatic selection procedure is available for two end states, the determination of optimal R-parameters for multiple end states is currently an unsolved issue. To address this, we have generalized the replica-exchange EDS (RE-EDS) methodology, previously developed for constant-pH MD. By exchanging configurations between replicas with different R-parameters, major parts of the parameter-choice problem can be circumvented, resulting in a reliable, robust and accurate method. We have evaluated RE-EDS using a test system with five inhibitors of phenylethanolamine N-methyltransferase (PNMT) studied previously. -
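As background to the RE-EDS abstract above, free-energy differences are typically recovered from the reference-state trajectory by exponential averaging, as in this schematic NumPy post-processing snippet; the energy time series are synthetic placeholders (k_B T = 1) and the RE-EDS machinery itself is not shown.
    # Schematic EDS post-processing: free-energy differences from a
    # reference-state trajectory via exponential averaging,
    #   dG_iR = -kT * ln < exp(-(V_i - V_R)/kT) >_R,   dG_BA = dG_BR - dG_AR.
    import numpy as np

    kT = 1.0
    rng = np.random.default_rng(0)
    n_frames = 5000
    V_R = rng.normal(0.0, 1.0, n_frames)              # reference-state energies
    V_A = V_R + rng.normal(1.0, 0.5, n_frames)        # end-state A energies
    V_B = V_R + rng.normal(2.0, 0.5, n_frames)        # end-state B energies

    def dG_from_reference(V_i, V_R, kT=1.0):
        return -kT * np.log(np.mean(np.exp(-(V_i - V_R) / kT)))

    dG_A = dG_from_reference(V_A, V_R, kT)
    dG_B = dG_from_reference(V_B, V_R, kT)
    print("dG(B) - dG(A) =", dG_B - dG_A)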
Siemen Stephan MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:30-15:45
MS Presentation
Approaches to I/O Scalability Challenges in the ECMWF Forecasting System, Stephan Siemen (ECMWF, United Kingdom)
Co-Authors: Tiago Quintino (ECMWF, United Kingdom); Baudouin Raoult (ECMWF, United Kingdom); Simon Smart (ECMWF, United Kingdom); Stephan Siemen (ECMWF, United Kingdom); Peter Bauer (ECMWF, United Kingdom)
As the resolution of the forecasts produced by ECMWF's Integrated Forecast System (IFS) is refined, the amount of data involved continues its geometric growth. Current peak loads already require an otherwise oversized parallel storage filesystem (Lustre). The data volume is expected to grow 6-fold by 2020, reaching 120 TB/day concentrated in short one-hour bursts. Moreover, this data requires post-processing to create the final forecast products sent to end-users, introducing a further I/O bottleneck. Recognizing these challenges, ECMWF's Scalability Programme aims to redesign the data workflow to minimize I/O in the time-critical path, whilst retaining resilience to failures. The authors are investigating multiple solutions to tackle issues of data locality, data volume and overall resilience. Solutions range from a novel NVRAM hardware co-design effort inside the EU-funded NEXTGenIO project, to the use of distributed object storage technologies and a new dynamic worker-broker solution for managing the post-processing workload. -
Simmendinger Christian MS Presentation
Wednesday, June 8, 2016
Garden 2A, 16:00-16:30
MS Presentation
Computational Fluid Dynamics (CFD) with GASPI, Christian Simmendinger (T-Systems SfR, Germany)
Co-Authors:
Computational Fluid Dynamics (CFD) with GASPI: We present results from both a proxy application for CFD and its corresponding base application. We demonstrate much improved strong scaling, down to 100 mesh points per thread (on 24,000 threads) on the Intel Xeon Phi. The threading model we have used is a modified domain decomposition in which reads occur across thread domains, while writes are restricted to the local thread domain. We present a relaxed synchronization model for communication, with a multithreaded gather and scatter of ghost cell regions. We believe that this proxy application is representative of a much broader class of applications which make use of unstructured meshes, and as such will be useful to a wider community.
MS Summary
MS08 Asynchronous Data-Flow Driven Programming with GASPI, Christian Simmendinger (T-Systems SfR, Germany)
Co-Authors:
Asynchronous data-flow driven programming with GASPI: Exascale compute architectures will provide us with systems with a very large number of cores. Correspondingly, we expect major challenges in the strong-scaling capabilities of applications. Due to hardware failures and soft errors, the number of cores used in a single simulation may vary. Systems are expected to be heterogeneous with respect to compute resources, and they are expected to feature a heterogeneous memory architecture. Machine jitter will occur at all scales, with a corresponding impact on relative application performance. Higher resolution and multiphysics simulations will require different parallelization strategies. The number of potential sources of load imbalance will hence increase significantly, and the means of sharing data (access, communication, and synchronization) will have to be reconsidered on all available parallelization levels. Fortunately, with the advancements in network technology and communication libraries, new opportunities to explore advanced programming models and load-balancing runtime systems in large HPC clusters have emerged. In this minisymposium we will present applications and libraries which are based on the GASPI communication library (a PGAS API) and a task-based programming model. The GASPI API increases communication performance by enabling a complete communication/computation overlap, and it stimulates an asynchronous, task-based programming style for better load balancing. It does this using concepts and implementation techniques that are not (yet) available in other models (e.g., MPI). This minisymposium will present four talks from different application domains, which make use of hybrid task models and the extended feature set of the GASPI API in order to deliver high scalability and much improved robustness against jitter. The application domains are machine learning in life sciences, seismic imaging, computational fluid dynamics (CFD), and a work-stealing application in combustion CFD. BPMF with GASPI: We present details of a novel implementation of a highly scalable distributed Bayesian Probabilistic Matrix Factorization, with near-perfect overlap of communication and computation. BPMF is a large-scale machine learning application that is able to make predictions (e.g., of movie ratings). Here we consider the prediction of chemical compound activity on systems with millions of items. Computational Fluid Dynamics (CFD) with GASPI: We present results from both a proxy application for CFD and its corresponding base application. We demonstrate much improved strong scaling, down to 100 mesh points per thread (on 24,000 threads) on the Intel Xeon Phi. Seismic Imaging with GASPI: Reverse time migration (RTM) is the method of first choice in seismic imaging. The scalability achieved with GASPI is almost perfect over three orders of magnitude, up to 1536 nodes (43008 cores), as tested on SuperMUC at LRZ. Distributed work-stealing with GASPI: We present a load-balancing library (based on work stealing) for large-scale runs, where we demonstrate a chemistry computation in a combustion CFD simulation. -
Smart Simon MS Presentation
Thursday, June 9, 2016
Garden 3B, 15:30-15:45
MS Presentation
Approaches to I/O Scalability Challenges in the ECMWF Forecasting System, Simon Smart (ECMWF, United Kingdom)
Co-Authors: Tiago Quintino (ECMWF, United Kingdom); Baudouin Raoult (ECMWF, United Kingdom); Simon Smart (ECMWF, United Kingdom); Stephan Siemen (ECMWF, United Kingdom); Peter Bauer (ECMWF, United Kingdom)
As the resolution of the forecasts produced by ECMWF's Integrated Forecast System (IFS) is refined, the amount of data involved continues its geometric growth. Current peak loads already require an otherwise oversized parallel storage filesystem (Lustre). The data volume is expected to grow 6-fold by 2020, reaching 120 TB/day concentrated in short 1-hour bursts. Moreover, this data requires post-processing to create the final forecast products sent to end-users, introducing a further I/O bottleneck. Recognizing these challenges, ECMWF's Scalability Programme aims to redesign the data workflow to minimize I/O in the time-critical path, whilst retaining resilience to failures. The authors are investigating multiple solutions to tackle issues of data locality, data volume and overall resilience. Solutions range from a novel NVRAM hardware co-design effort inside the EU-funded NEXTGenIO project, to the use of distributed object storage technologies and a new dynamic worker-broker solution for managing the post-processing workload. -
Smith Barry F. Paper
Wednesday, June 8, 2016
Auditorium C, 15:30-16:00
Paper
Extreme-Scale Multigrid Components within PETSc, Barry F. Smith (Argonne National Laboratory, United States of America)
Co-Authors: Dave A. May (ETH Zurich, Switzerland); Karl Rupp (Austria); Matthew G. Knepley (Rice University, United States of America); Barry F. Smith (Argonne National Laboratory, United States of America)
Elliptic partial differential equations (PDEs) frequently arise in continuum descriptions of physical processes relevant to science and engineering. Multilevel preconditioners represent a family of scalable techniques for solving discrete PDEs of this type and thus are the method of choice for high-resolution simulations. The scalability and time-to-solution of massively parallel multilevel preconditioners can be adversely affected by using a coarse-level solver with sub-optimal algorithmic complexity. To maintain scalability, agglomeration techniques applied to the coarse level have been shown to be necessary.
In this work, we present a new software component introduced within the Portable Extensible Toolkit for Scientific computation (PETSc) which permits agglomeration. We provide an overview of the design and implementation of this functionality, together with several use cases highlighting the benefits of agglomeration. Lastly, via numerical experiments employing geometric multigrid with structured meshes, we demonstrate the flexibility and performance gains made possible by our MPI-rank agglomeration implementation. -
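The abstract does not spell out the agglomeration mechanism, so the following is only a generic illustration (not the PETSc component itself) of the underlying idea of MPI-rank agglomeration: the coarse-level solve is confined to a sub-communicator holding a fraction of the ranks, so its cost no longer grows with the full job size. The reduction factor and communicator layout are assumptions:

#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Keep one rank out of every 'reduction' ranks for the coarse-level solve.
  const int reduction = 64;                                   // illustrative agglomeration factor
  const int color = (rank % reduction == 0) ? 0 : MPI_UNDEFINED;

  MPI_Comm coarse_comm;                                       // MPI_COMM_NULL on the excluded ranks
  MPI_Comm_split(MPI_COMM_WORLD, color, rank, &coarse_comm);

  if (coarse_comm != MPI_COMM_NULL) {
    int csize;
    MPI_Comm_size(coarse_comm, &csize);
    // In a real multigrid hierarchy, the coarse operator and right-hand side
    // would be gathered onto these ranks and the coarse solve would run here.
    if (rank == 0) std::printf("coarse solve uses %d of %d ranks\n", csize, size);
    MPI_Comm_free(&coarse_comm);
  }
  MPI_Finalize();
  return 0;
}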
Smith Spencer MS Presentation
Wednesday, June 8, 2016
Garden 2BC, 14:00-14:30
MS Presentation
A Literate Process for Improving the Quality of Scientific Computing Software, Spencer Smith (McMaster University, Canada)
Co-Authors:
Scientific Computing (SC) software sometimes suffers with respect to the qualities of reusability, maintainability, verifiability and reproducibility because of a lack of documentation. A potential solution is to generalize the idea of Literate Programming. Our proposed Literate Process will not only generate program documentation and code from the source files, but also other software artifacts, such as the requirements specification, design documentation, and test reports. Documentation quality will improve because a generator removes the drudgery and errors associated with information duplication and traceability. Using Haskell we have developed a prototype tool, named Drasil, to support this process. The fundamental task for Drasil is managing knowledge through what are termed chunks. A recipe is used to put the chunks together and a generator then interprets the recipes to produce the desired documentation. An example will be shown for software that simulates the temperature in a solar water heating tank. -
Sojka Radim Contributed Talk
Thursday, June 9, 2016
Garden 1BC, 11:30-11:50
Contributed Talk
The Energy Consumption Optimization of the FETI Solver, Radim Sojka (IT4Innovations National Supercomputing Center, VSB-Technical University of Ostra, Czech Republic)
Co-Authors: Lubomir Riha (IT4Innovations National Supercomputing Center, Czech Republic); Radim Sojka (IT4Innovations National Supercomputing Center, Czech Republic); Jakub Kruzik (IT4Innovations National Supercomputing Center, Czech Republic); Martin Beseda (IT4Innovations National Supercomputing Center, Czech Republic)
The presentation deals with the energy consumption evaluation of the FETI method, which blends iterative and direct solvers, within the scope of the READEX project. The characteristics measured on a model cube benchmark illustrate the behaviour of the preprocessing and solve phases, mainly as a function of the CPU frequency, different problem decompositions, the compiler type and the compiler flags. In preprocessing it is necessary to factorize the stiffness and coarse problem matrices, which is among the most time- and energy-consuming operations. The solve phase employs the conjugate gradient algorithm and consists of sparse matrix-vector multiplications, vector dot products and AXPY operations. In each iteration we need to apply the direct solver twice, for the pseudo-inverse action and for the coarse problem solution. Together, these operations cover the basic sparse and dense BLAS Level 1, 2 and 3 routines; we can therefore exploit their different dynamic behaviour, and dynamically switching between various configurations can then provide significant energy savings. -
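To make the kernel structure mentioned in the abstract concrete, here is a plain (unpreconditioned) conjugate gradient loop for a 1D Laplacian, with the sparse matrix-vector product, dot products and AXPY updates marked; this is a generic sketch of where per-kernel tuning such as CPU frequency switching would attach, not the FETI solver itself:

#include <cmath>
#include <cstdio>
#include <vector>

// Matrix-free SpMV for the 1D Laplacian with Dirichlet boundaries: y = A*x.
static void spmv(const std::vector<double>& x, std::vector<double>& y) {
  const std::size_t n = x.size();
  for (std::size_t i = 0; i < n; ++i) {
    const double left  = (i > 0)     ? x[i - 1] : 0.0;
    const double right = (i + 1 < n) ? x[i + 1] : 0.0;
    y[i] = 2.0 * x[i] - left - right;
  }
}

static double dot(const std::vector<double>& a, const std::vector<double>& b) {
  double s = 0.0;                                  // BLAS Level-1 dot product
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

int main() {
  const std::size_t n = 1000;
  std::vector<double> x(n, 0.0), b(n, 1.0), r(b), p(b), Ap(n);

  double rs_old = dot(r, r);
  for (int it = 0; it < 10000; ++it) {
    spmv(p, Ap);                                   // sparse matrix-vector product (memory bound)
    const double alpha = rs_old / dot(p, Ap);      // dot product
    for (std::size_t i = 0; i < n; ++i) x[i] += alpha * p[i];   // AXPY
    for (std::size_t i = 0; i < n; ++i) r[i] -= alpha * Ap[i];  // AXPY
    const double rs_new = dot(r, r);               // dot product
    if (std::sqrt(rs_new) < 1e-10) { std::printf("converged after %d iterations\n", it + 1); break; }
    const double beta = rs_new / rs_old;
    for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
    rs_old = rs_new;
  }
  return 0;
}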
Soleimani Hanieh Poster
Poster
EMD-02 Large Scale Xeon Phi Parallelization of a Deep Learning Language Model, Hanieh Soleimani (Università della Svizzera italiana, Switzerland)
Co-Authors: Tim Dettmers (Università della Svizzera italiana, Switzerland); Olaf Schenk (Università della Svizzera italiana, Switzerland)
Deep learning is a recent predictive modelling approach which yields near-human performance on a range of tasks. Deep learning language models have gained popularity as they have achieved state-of-the-art results in many language tasks, such as language translation, but they are computationally intensive, requiring computers with accelerators and weeks of computation time. Here we propose a parallel algorithm for running deep learning language models on hundreds of nodes equipped with Xeon Phis to reduce the computation time to mere hours. We use MPI for the parallelization among nodes and use the Xeon Phis to accelerate the matrix multiplications, which make up more than 75% of the total computation. With our algorithm, experimentation can be done much faster, thus enabling rapid progress in the sparsely explored domain of natural language understanding. -
Sonnendrücker Eric MS Presentation
Wednesday, June 8, 2016
Garden 3C, 17:00-17:15
MS Presentation
Parallelization Strategies for a Semi-Lagrangian Vlasov Code, Eric Sonnendrücker (MPG, Germany)
Co-Authors: Klaus Reuter (Max Planck Computing and Data Facility, Germany); Eric Sonnendrücker (Max Planck Society, Germany)
Grid-based solvers for the Vlasov equation give accurate results but suffer from the curse of dimensionality. To enable the grid-based solution of the Vlasov equation in 6d phase-space, we need efficient parallelization schemes. In this talk, we consider the 6d Vlasov-Poisson problem discretized by a split-step semi-Lagrangian scheme. This method works with successive 1d interpolations on 1d stripes of the 6d domain. We consider two parallelization strategies: a remapping strategy that works with two different layouts, keeping parts of the dimensions sequential, and a classical partitioning into hyper-rectangles. With the remapping scheme, the 1d interpolations can be performed sequentially on each processor. On the other hand, the remapping requires an all-to-all communication pattern. The partitioning only requires localized communication, but each 1d interpolation needs to be performed on distributed data. We compare both parallelization schemes and discuss how to efficiently handle the domain boundaries in the interpolation for the partitioning approach.
Wednesday, June 8, 2016
Garden 3C, 16:30-16:45
MS Presentation
Particle in Fourier Discretization of Kinetic Equations, Eric Sonnendrücker (MPG, Germany)
Co-Authors: Katharina Kormann (Max Planck Institute for Plasma Physics, Germany); Eric Sonnendrücker (Max Planck Society, Germany)
Particle methods are very popular for the discretization of kinetic equations, since they are embarrassingly parallel. In plasma physics, the high dimensionality (6D) of the problems raises the costs of grid-based codes, favouring mesh-free transport with particles. A standard Particle in Cell (PIC) scheme couples the particle density to a grid-based field solver using finite elements. In this particle-mesh coupling, the stochastic error appears as noise, while the deterministic error leads e.g. to aliasing, inducing unphysical instabilities. Projecting the particles onto a spectral grid yields an energy- and momentum-conserving, almost surely aliasing-free scheme, Particle in Fourier (PIF). For few electrostatic modes, PIF has very little computational overhead, rendering it suitable for a fast implementation. We present 6D Vlasov-Poisson simulations of Landau damping and a bump-on-tail instability and compare the results as well as the computational performance to a grid-based semi-Lagrangian solver. -
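As a schematic of the particle-Fourier coupling described above, written for a 1D electrostatic Vlasov-Poisson setting with a few retained modes (normalisation and sign conventions are assumptions, not taken from the talk):

\[
  \hat\rho_k = \frac{q}{L}\sum_{p=1}^{N_p} w_p\, e^{-\mathrm{i} k x_p}, \qquad
  \hat E_k = -\,\frac{\mathrm{i}\,\hat\rho_k}{\varepsilon_0 k} \quad (k \neq 0), \qquad
  E(x_p) = \sum_{k \neq 0} \hat E_k\, e^{\mathrm{i} k x_p},
\]

so the particles couple directly to a small set of Fourier modes of the field, with no intermediate deposition grid, which is why grid aliasing cannot appear.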
Spandan Vamsi MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:30-14:45
MS Presentation
AFiD-GPU: A Versatile Navier-Stokes Solver for Turbulent Flows, Vamsi Spandan (PoF, University of Twente, Netherlands)
Co-Authors: Richard J.A.M. Stevens (University of Twente, Netherlands); Everett Phillips (NVIDIA, United States of America); Vamsi Spandan (University of Twente, Netherlands); John Donners (SURFsara, Netherlands); Rodolfo Ostilla-Monico (Harvard University, United States of America); Massimiliano Fatica (NVIDIA, United States of America); Yantao Yang (University of Twente, Netherlands); Detlef Lohse (University of Twente, Netherlands); Roberto Verzicco (University of Twente & Università degli Studi di Roma "Tor Vergata", Netherlands, Italy)
The AFiD code, an open-source solver for the Navier-Stokes equations (www.afid.eu), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU port has been carried out in CUDA Fortran with extensive use of kernel loop directives (CUF kernels) in order to keep the source code as close as possible to the original CPU version. Only a few routines have been written by hand. On Piz Daint (CSCS), the current GPU version can solve a 2048x3072x3072 mesh on 640 K20x GPUs in 2.4 s per time step, while with 2048 GPUs we measured 0.89 s per time step. Thanks to the increased performance and efficient use of memory, this tool allows for simulations in a parameter range that is unprecedented in Rayleigh-Benard convection. -
Sprenger Michael MS Presentation
Thursday, June 9, 2016
Garden 3B, 14:45-15:00
MS Presentation
A Decade-Long European-Scale Convection-Resolving Climate Simulation on GPUs: Computation, Validation and Analyses, Michael Sprenger (Atmospheric and Climate Science, ETH Zurich, Switzerland, Switzerland)
Co-Authors: Stefan Rüdisühli (ETH Zurich, Switzerland); Nikolina Ban (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Daniel Lüthi (ETH Zurich, Switzerland); Michael Sprenger (ETH Zurich, Switzerland); Heini Wernli (ETH Zurich, Switzerland); Christoph Schär (ETH Zurich, Switzerland)
Climate simulations using a horizontal resolution of O(1 km) make it possible to explicitly resolve deep convection. Precipitation processes are then represented much closer to first principles, allowing for an improved representation of the water cycle. Due to the large computational costs, climate simulations at such scales were in the past restricted to rather small domains. Here we present results from a decade-long convection-resolving climate simulation covering Europe using a computational mesh of 1,536x1,536x60 grid points. We use a COSMO-model prototype enabled for GPUs. The results illustrate how the approach allows the interactions between atmospheric circulations at scales ranging from 1,000 to 10 km to be represented. We discuss the performance of the convection-resolving climate modelling approach, focusing specifically on the improved representation of summer convection on the continental scale. Furthermore, we demonstrate the potential of online analyses of these simulations for assembling detailed climatologies of extratropical cyclones, fronts and propagating convective systems. -
Srolovitz David MS Presentation
Thursday, June 9, 2016
Garden 2BC, 14:30-15:00
MS Presentation
Deformation and Failure Behavior of Metallic Nanostructures, David Srolovitz (University of Pennsylvania, United States of America)
Co-Authors: Zhaoxuan Wu (Institute of High Performance Computing, A*STAR, Singapore); Mehdi Jafary-Zadeh (Institute of High Performance Computing, A*STAR, Singapore); Mark Jhon (Institute of High Performance Computing, A*STAR, Singapore); Wendy Gu (California Institute of Technology, United States of America); Julia Greer (California Institute of Technology, United States of America); David Srolovitz (University of Pennsylvania, United States of America)
The reliability of metallic nanostructures, such as nanowires and thin films, is often dictated by their mechanical failures. An in-depth understanding of the effects of intrinsic factors, such as grain boundaries and surface roughness, and extrinsic factors, such as size, shape and man-made notches, on their plastic deformation mechanisms and failure patterns is of great importance for fabricating these nanostructures with high reliability. In this talk, we will first briefly review the various deformation mechanisms and failure patterns reported in the literature and highlight some of the critical issues that are currently under active research. We will then report our recent progress in studying the effects of intrinsic factors, such as grain boundaries and dislocations, and extrinsic factors, such as size, shape and man-made notches, on the plasticity and failure of metallic nanostructures using both mechanical testing and large-scale molecular dynamics simulations. -
Staar Peter W.J. MS Presentation
Thursday, June 9, 2016
Garden 3B, 12:00-12:15
MS Presentation
Accelerated Materials Design: Combining HPC and Cognitive Computing, Peter W.J. Staar (IBM Research, Switzerland)
Co-Authors:
The discovery of new materials has an incredible impact on our daily life. Traditionally, the discovery of these new materials is extremely labor-intensive, requiring material scientists to analyse large numbers of documents and to conduct numerous experiments and simulations. As the number of documents published scales exponentially with time, this approach is not practical. Therefore, the field of materials science is ideally suited for a cognitive computing approach. In this talk, we present how a cognitive computing approach can be used to improve our understanding and accelerate the discovery of new materials. We will discuss in detail how cognitive algorithms can analyse large numbers of documents and automatically catalogue the extracted materials and their properties. Furthermore, we will explain how these extracted data can be used in simulations to extend our knowledge. In this way, our cognitive algorithms form an embedding for the traditional materials simulations and extend their reach. -
Stadel Joachim MS Summary
MS Summary
MS22 N-Body Simulations Techniques, Joachim Stadel (University of Zurich, Switzerland)
Co-Authors: Joachim Stadel (University of Zurich, Switzerland)
Many astrophysical systems are modelled by N-body simulations, where the simulated particles either correspond directly to physical bodies (such as in studies of the Solar system and star clusters) or are representative of a much larger number of collisionlessly moving physical objects (such as in studies of galaxies and large-scale structure). N-body systems are Hamiltonian systems, and simulating their long-term evolution is a difficult numerical challenge because of shot noise, errors of the approximated forces, and strong inhomogeneities of the systems in both space and time scales. This latter point is a major difficulty for the development of efficient and scalable algorithms for HPC architectures. -
Stadler Georg MS Presentation
Friday, June 10, 2016
Garden 1A, 09:00-09:30
MS Presentation
HPC Challenges Arising in Forward and Inverse Mantle Flow Simulation, Georg Stadler (NYU, United States of America)
Co-Authors: Johann Rudi (University of Texas at Austin, United States of America); Vishagan Ratnaswamy (California Institute of Technology, United States of America); Dunzhu Li (California Institute of Technology, United States of America); Tobin Isaac (University of Chicago, United States of America); Michael Gurnis (California Institute of Technology, United States of America); Omar Ghattas (University of Texas at Austin, United States of America)
We discuss scalable solvers for the forward and inverse simulation of mantle flow problems. Crucial solver components for the arising nonlinear Stokes problems are parallel multigrid methods for preconditioning the linearized Stokes system, and a Schur complement approximation that is able to cope with extreme viscosity variations. To achieve good parallel scalability, we use, among other techniques, matrix-free operations, and we redistribute coarse multigrid levels to a subset of all available processors. We will discuss the inversion of global rheology parameters and distributed fields from surface data and the present-day temperature distribution in instantaneous and time-dependent problems. -
Stanisic Luka Poster
Poster
PHY-04 Platform Independent Profiling of a QCD Code, Luka Stanisic (Inria Bordeaux - Sud-Ouest, France)
Co-Authors: Luka Stanisic (Centre de Recherche Inria Bordeaux - Sud-Ouest, France)
The supercomputing platforms available for research based on high performance computing evolve at a great rate. However, this rapid development of novel technologies requires adaptations and optimizations of existing codes for each new machine architecture. In this context, minimizing the time needed to efficiently port a code to a new platform is of crucial importance. A possible solution is to use coarse-grain simulations of the application that can assist in detecting performance bottlenecks. We present a procedure for implementing intermediate profiling for the openQCD code [1] that will enable a global reduction of the cost of profiling and optimizing this code, which is commonly used in the lattice QCD community. Our approach is based on the well-known SimGrid simulator [2], which allows for fast and accurate performance predictions of codes on HPC architectures. Additionally, accurate estimations of program behavior on some future machines, not yet accessible to us, are anticipated. [1] http://luscher.web.cern.ch/luscher/openQCD/ [2] http://simgrid.gforge.inria.fr/ -
Stellmach Stephan MS Presentation
Thursday, June 9, 2016
Garden 1A, 10:30-11:00
MS Presentation
Towards a Better Understanding of Rapidly Rotating Convection by Combining Direct Numerical Simulations and Asymptotic Modeling, Stephan Stellmach (Westfälische Wilhelms-Universität Münster, Germany)
Co-Authors: Meredith Plumley (University of Colorado at Boulder, United States of America); Keith Julien (University of Colorado at Boulder, United States of America)
Realistic simulations of planetary dynamos will remain impossible in the near future. In particular, the enormous range of spatial and temporal scales induced in convective flows by rotation plagues direct numerical simulations (DNS). The same scale disparities that hamper DNS can, however, be used to derive reduced equations that are expected to govern convection in the limit of rapid rotation. Simulations based on such formulations represent an interesting alternative to DNS. In this talk, recent efforts to test asymptotic models against DNS are reviewed. Results in plane-layer geometry reveal convergence of both approaches. Surprisingly, Ekman layers have a profound effect in the rapidly rotating regime and explicitly have to be accounted for in the asymptotic models. Upscale kinetic energy transport leads to the formation of large-scale structures, which may play a prominent role in dynamos. The asymptotic models allow an exploration of parameter regimes far beyond the capabilities of DNS. -
Stevens Richard J.A.M. MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:30-14:45
MS Presentation
AFiD-GPU: A Versatile Navier-Stokes Solver for Turbulent Flows, Richard J.A.M. Stevens (PoF, University of Twente, Netherlands)
Co-Authors: Richard J.A.M. Stevens (University of Twente, Netherlands); Everett Phillips (NVIDIA, United States of America); Vamsi Spandan (University of Twente, Netherlands); John Donners (SURFsara, Netherlands); Rodolfo Ostilla-Monico (Harvard University, United States of America); Massimiliano Fatica (NVIDIA, United States of America); Yantao Yang (University of Twente, Netherlands); Detlef Lohse (University of Twente, Netherlands); Roberto Verzicco (University of Twente & Università degli Studi di Roma "Tor Vergata", Netherlands, Italy)
The AFiD code, an open-source solver for the Navier-Stokes equations (www.afid.eu), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU port has been carried out in CUDA Fortran with extensive use of kernel loop directives (CUF kernels) in order to keep the source code as close as possible to the original CPU version. Only a few routines have been written by hand. On Piz Daint (CSCS), the current GPU version can solve a 2048x3072x3072 mesh on 640 K20x GPUs in 2.4 s per time step, while with 2048 GPUs we measured 0.89 s per time step. Thanks to the increased performance and efficient use of memory, this tool allows for simulations in a parameter range that is unprecedented in Rayleigh-Benard convection. -
Stivala Alex Poster
Poster
EMD-03 Parallel MCMC for Estimating Exponential Random Graph Models, Alex Stivala (Melbourne School of Psychological Sciences, The University of Melbourne, Australia)
Co-Authors: Alex Stivala (University of Melbourne, Australia); Antonietta Mira (Università della Svizzera italiana, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Garry Robins (University of Melbourne, Australia); Alessandro Lomi (Università della Svizzera italiana, Switzerland)
As information and communication technologies continue to expand, the need arises to develop analytical strategies capable of accommodating new and larger sets of social network data. Considerable attention has recently been dedicated to the possibility of scaling exponential random graph models (ERGMs) - a well-established family of statistical models - for analyzing large social networks. Efficient computational methods would be highly desirable in order to extend the empirical scope of ERGM for the analysis of large social networks. We report preliminary results of a research project on the development of new sampling methods for ERGMs. We propose a new MCMC sampler and use it with Metropolis coupled Markov chain Monte Carlo, a typical scheme for MCMC parallelization. We show that, using this method, the CPU time for parameter estimation may be considerably reduced. *Generous support from the Swiss National Platform of Advanced Scientific Computing (PASC) is gratefully acknowledged. -
Sudret Bruno MS Presentation
Thursday, June 9, 2016
Garden 2A, 15:30-15:45
MS Presentation
Uncertainty Quantification and Global Sensitivity Analysis for Economic Models, Bruno Sudret (ETH Zurich, Switzerland)
Co-Authors: Viktor Winschel (ETH Zurich, Switzerland); Stefano Marelli (ETH Zurich, Switzerland); Bruno Sudret (ETH Zurich, Switzerland)
We present a method for global sensitivity analysis of the outcomes of an economic model with respect to the model parameters. Traditional sensitivity analyses, like comparative statics, scenario and robustness analysis, are local and depend on the chosen combination of parameter values. Our global approach specifies a distribution for each parameter and approximates the outcomes as a polynomial of the parameters. In contrast to local analyses, the global sensitivity analysis takes into account non-linearities and interactions. Using the polynomial, we compute the distribution of outcomes and a variance decomposition called Sobol' indices. We obtain an importance ranking of the parameters and their interactions, which can guide calibration exercises and model development. We compare the local to the global approach for the mean and variance of production in a canonical real business cycle model. We find an interesting separation result: for mean production, only capital share, leisure substitution rate, and depreciation rate matter. -
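For reference, the variance decomposition and the first-order and total Sobol' indices referred to above take the standard form (generic notation, not taken from the talk):

\[
  \operatorname{Var}(Y) = \sum_{i=1}^{d} V_i + \sum_{i<j} V_{ij} + \dots + V_{12\dots d},
  \qquad V_i = \operatorname{Var}_{X_i}\!\bigl(\mathbb{E}[\,Y \mid X_i\,]\bigr),
\]
\[
  S_i = \frac{V_i}{\operatorname{Var}(Y)}, \qquad
  S_i^{T} = 1 - \frac{\operatorname{Var}_{X_{\sim i}}\!\bigl(\mathbb{E}[\,Y \mid X_{\sim i}\,]\bigr)}{\operatorname{Var}(Y)},
\]

where \(X_{\sim i}\) denotes all input parameters except \(X_i\); if the polynomial surrogate is a polynomial chaos expansion, these quantities follow analytically from its coefficients.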
Surville Clement MS Presentation
Thursday, June 9, 2016
Garden 3C, 14:20-14:50
MS Presentation
A New Reverse Tree-Method for Self Gravity Calculation Applied to Fixed Grid FV Methods, Clement Surville (Institute for Computational Science, UZH, Switzerland)
Co-Authors:
We present a new hybrid approach bringing the tree algorithm for gravity calculation into the framework of fixed-grid finite volume methods. The goal is to obtain N log(N) complexity in cylindrical or spherical coordinate systems. The main property of our method is that it builds an approximate tree of the gravity field rather than a tree of the mass distribution. The algorithm also has the property of producing the exact gravity at certain positions in space (the largest nodes of the tree). Finally, we show how fast and accurate the method can be on modern supercomputers. -
Szostek Pawel Contributed Talk
Wednesday, June 8, 2016
Garden 3C, 14:45-15:00
Contributed Talk
A Thread-Parallel Implementation of High-Energy Physics Particle Tracking on Many-Core Hardware Platforms, Pawel Szostek (CERN, Switzerland)
Co-Authors: Pawel Szostek (CERN, Switzerland)
Tracking and identification of particles are amongst the most time-critical tasks in the computing farms of high-energy physics experiments. Many of the underlying algorithms were designed before the advent of multi-core processors and are therefore strictly serial. Since the introduction of multi-core platforms these workloads have been parallelized by executing multiple copies of the application. This approach does not optimally utilize modern hardware. We present two different thread-parallel implementations of a straight-line particle track finding algorithm. We compare our implementations, based on TBB and OpenMP, to the current production version in LHCb's Gaudi software framework as well as a state-of-the-art implementation for general purpose GPUs. Our initial analysis shows a speedup of more than 50% over multi-process Gaudi runs and competitive performance to the GPGPU version. This study allows us to better understand the impact of many-core hardware platforms supplementing traditional CPUs in the context of the upcoming LHCb upgrade.
T
-
Tabuchi Akihiro MS Presentation
Wednesday, June 8, 2016
Garden 3B, 14:00-14:15
MS Presentation
Omni Compiler and XcodeML: An Infrastructure for Source-to-Source Transformation, Akihiro Tabuchi (University of Tsukuba, Japan)
Co-Authors: Hitoshi Murai (AICS, RIKEN, Japan); Masahiro Nakao (AICS, RIKEN, Japan); Hidetoshi Iwashita (AICS, RIKEN, Japan); Jinpil Lee (AICS, RIKEN, Japan); Akihiro Tabuchi (University of Tsukuba, Japan)
We have been developing a compiler for the PGAS programming language XcalableMP for post-petascale computing. XcalableMP is a directive-based language extension of Fortran 95 and C for scientific programming on high-performance distributed-memory parallel systems. Omni Compiler is an infrastructure for source-to-source transformation used to build source-to-source compilers such as the Omni XcalableMP compiler. It includes C and Fortran 95 front-ends which translate source code into an XML-based intermediate representation called XcodeML, a Java-based code-transformation library operating on XcodeML, and de-compilers which translate the XcodeML intermediate code back into transformed source code. Currently, the Omni compiler also supports code transformation for OpenMP and OpenACC. In this talk, we will present the internals of the Omni compiler, taking the Omni XcalableMP compiler as a case study, and outline our future plans. -
Terai Masaaki Paper
Wednesday, June 8, 2016
Auditorium C, 14:00-14:30
Paper
Performance Analysis and Optimization of Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on the K Computer and TSUBAME2.5, Masaaki Terai (RIKEN AICS, Japan)
Co-Authors: Masaaki Terai (RIKEN / Advanced Institute for Computational Science, Japan); Ryuji Yoshida (RIKEN / Advanced Institute for Computational Science, Japan); Shin-ichi Iga (RIKEN / Advanced Institute for Computational Science, Japan); Kazuo Minami (RIKEN / Advanced Institute for Computational Science, Japan); Hirofumi Tomita (RIKEN / Advanced Institute for Computational Science, Japan)
We summarize the optimization and performance evaluation of the Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on two different types of supercomputers: the K computer and TSUBAME2.5. First, we evaluated and improved several kernels extracted from the model code on the K computer. We did not significantly change the loop and data ordering, in order to make sufficient use of features of the K computer such as the hardware-aided thread barrier mechanism and the relatively high memory bandwidth, i.e., a 0.5 Byte/FLOP ratio. Loop optimizations and code cleaning to reduce memory transfers contributed to a speed-up of the model execution time. The sustained performance of the main loop of NICAM reached 0.87 PFLOPS with 81,920 nodes on the K computer. For GPU-based calculations, we applied OpenACC to the dynamical core of NICAM. The performance and scalability were evaluated using the TSUBAME2.5 supercomputer. We achieved good performance results, which showed efficient use of the memory throughput of the GPU as well as good weak scalability. A dry dynamical core experiment was carried out using 2560 GPUs, which achieved 60 TFLOPS of sustained performance. -
Teyssier Romain Poster
Poster
CSM-09 Hash Tables on GPUs Using Lock-Free Linked Lists, Romain Teyssier (University of Zurich, Switzerland)
Co-Authors: Andreas Bleuler (University of Zurich, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Romain Teyssier (University of Zurich, Switzerland)
Hash table implementations which resolve collisions by chaining with linked lists are very flexible with respect to the insertion of additional keys into an existing table and the deletion of a subset of the keys from it. For our implementation on GPUs, we use non-blocking linked lists based on atomic "compare and swap" operations. List entries are deleted by marking them as invalid and then removing them; typically, after a couple of deletion operations, our local heap is compacted. Using this approach, the initial build of the hash table and hash lookups perform comparably to the CUDPP library implementation. However, small modifications of the table are performed much faster in our implementation than with the complete rebuild required by other implementations. We intend to use this novel hash table implementation for astrophysical GPU simulations with adaptive-mesh particle-in-cell schemes, which would benefit greatly from these new features. -
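A minimal host-side C++ analogue of the insertion step described above, using compare-and-swap on the bucket head (on the GPU the same pattern is typically expressed with atomicCAS); the structure and names are illustrative rather than the authors' implementation, and the invalidation and compaction of deleted entries are omitted:

#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

struct Node { int key; int value; Node* next; };

struct HashTable {
  std::vector<std::atomic<Node*>> buckets;
  explicit HashTable(std::size_t n) : buckets(n) {
    for (auto& b : buckets) b.store(nullptr, std::memory_order_relaxed);
  }
  void insert(int key, int value) {                        // lock-free push onto the bucket's list
    auto& head = buckets[static_cast<std::size_t>(key) % buckets.size()];
    Node* node = new Node{key, value, nullptr};            // nodes are intentionally leaked in this sketch
    Node* expected = head.load(std::memory_order_relaxed);
    do {
      node->next = expected;                               // chain to the current head
    } while (!head.compare_exchange_weak(expected, node)); // retry if another thread won the race
  }
  const Node* find(int key) const {
    const auto& head = buckets[static_cast<std::size_t>(key) % buckets.size()];
    for (const Node* n = head.load(); n != nullptr; n = n->next)
      if (n->key == key) return n;
    return nullptr;
  }
};

int main() {
  HashTable table(1024);
  std::thread t1([&] { for (int k = 0; k < 1000; k += 2) table.insert(k, k); });
  std::thread t2([&] { for (int k = 1; k < 1000; k += 2) table.insert(k, k); });
  t1.join(); t2.join();
  std::printf("found key 42: %s\n", table.find(42) ? "yes" : "no");
  return 0;
}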
Tiana Davide MS Presentation
Friday, June 10, 2016
Garden 1BC, 10:30-10:45
MS Presentation
Understanding the Magnetic and Conductive Properties of Hybrid Materials by High-Throughput Screening, Davide Tiana (EPFL, Switzerland)
Co-Authors:
The demand for more energy, combined with the requirement to reduce greenhouse emissions, has made the development of new electronic devices challenging. Improvements in the efficiency of applications such as solar cells, light-emitting devices or semiconductors are therefore required. In this context, the possibility of tuning their electronic properties makes metal-organic frameworks (MOFs), which are crystalline materials that combine inorganic metals with organic ligands, interesting candidates for the next generations of electronic devices. At present, the main limitation of electro-active MOFs is that the valence and conduction bands are localised either on the inorganic or on the organic part, restricting their potential use for magnetism and/or conductivity, in which electron mobility is required. In this work I will show how, through a combinatorial analysis, the delocalisation between the organic and inorganic parts has been rationalised, allowing the design of hybrid materials with high magnetic and conductive properties. -
Tkatchenko Alexandre MS Summary
MS Summary
MS10 From Materials' Data to Materials' Insight by Machine Learning, Alexandre Tkatchenko (Fritz Haber Institute Berlin, Germany)
Co-Authors: Alexandre Tkatchenko (Fritz Haber Institute Berlin, Germany), James Kermode (Warwick, United Kingdom)
The rise of high-throughput computational materials design promises to revolutionize the process of discovery of new materials, and tailoring of their properties. At the same time, by generating the structures of hundreds of thousands of hypothetical compounds, the issue of automated processing of large amounts of materials' data has been made very urgent - to identify structure-property relations, rationalize intuitively the behaviour of materials of increasing complexity, and re-use existing information to accelerate the prediction of properties and accelerate the search of materials' space. To address this challenge, a strongly interdisciplinary effort has developed, uniting forces among researchers in applied mathematics, computer science, chemistry and materials science, that aims at adapting machine-learning techniques to the specific problems that are encountered when working with materials. This minisymposium will showcase the most recent developments in this field, and provide a forum for some of the leading figures to discuss the most pressing challenges and the most promising directions. The participants will be selected to represent the many disciplines that are contributing to this endeavour and will cover the following topics: the representation of materials' structures and properties in a synthetic form that is best suited for automated processing, learning of the structure-property relations and circumventing the large computational cost of high-end electronic structure calculations, the identification of outliers and the automatic assessment of the reliability of input data, demonstrative applications to important materials science problems. -
Tomita Hirofumi Paper
Wednesday, June 8, 2016
Auditorium C, 14:00-14:30
Paper
Performance Analysis and Optimization of Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on the K Computer and TSUBAME2.5, Hirofumi Tomita (RIKEN AICS, Japan)
Co-Authors: Masaaki Terai (RIKEN / Advanced Institute for Computational Science, Japan); Ryuji Yoshida (RIKEN / Advanced Institute for Computational Science, Japan); Shin-ichi Iga (RIKEN / Advanced Institute for Computational Science, Japan); Kazuo Minami (RIKEN / Advanced Institute for Computational Science, Japan); Hirofumi Tomita (RIKEN / Advanced Institute for Computational Science, Japan)
We summarize the optimization and performance evaluation of the Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on two different types of supercomputers: the K computer and TSUBAME2.5. First, we evaluated and improved several kernels extracted from the model code on the K computer. We did not significantly change the loop and data ordering, in order to make sufficient use of features of the K computer such as the hardware-aided thread barrier mechanism and the relatively high memory bandwidth, i.e., a 0.5 Byte/FLOP ratio. Loop optimizations and code cleaning to reduce memory transfers contributed to a speed-up of the model execution time. The sustained performance of the main loop of NICAM reached 0.87 PFLOPS with 81,920 nodes on the K computer. For GPU-based calculations, we applied OpenACC to the dynamical core of NICAM. The performance and scalability were evaluated using the TSUBAME2.5 supercomputer. We achieved good performance results, which showed efficient use of the memory throughput of the GPU as well as good weak scalability. A dry dynamical core experiment was carried out using 2560 GPUs, which achieved 60 TFLOPS of sustained performance. -
Toro Eleuterio F. MS Presentation
Wednesday, June 8, 2016
Garden 3A, 13:00-13:30
MS Presentation
Junction-Generalized Riemann Problem for Stiff Hyperbolic Balance Laws in Networks of Blood Vessels, Eleuterio F. Toro (University of Trento, Italy)
Co-Authors: Eleuterio F. Toro (University of Trento, Italy); Gino I. Montecinos (Universidad de Chile, Chile); Raul Borsche (Technische Universität Kaiserslautern, Germany); Jochen Kall (Technische Universität Kaiserslautern, Germany)
We design a new implicit solver for the Junction-Generalized Riemann Problem (J-GRP), which is based on a recently proposed implicit method for solving the Generalized Riemann Problem (GRP) for systems of hyperbolic balance laws. We use the new J-GRP solver to construct an ADER scheme that is globally explicit, locally implicit and with no theoretical accuracy barrier, in both space and time. The resulting ADER scheme is able to deal with stiff source terms and can be applied to non-linear systems of hyperbolic balance laws in domains consisting of networks of one-dimensional sub-domains. Here we specifically apply the numerical techniques to networks of blood vessels. An application to a physical test problem, consisting of a network of 37 compliant silicon tubes (arteries) and 21 junctions, reveals that it is imperative to use high-order methods at junctions in order to preserve the desired high order of accuracy in the full computational domain. -
Torrent Marc MS Summary
MS Summary
MS04 First-Principles Simulations on Modern and Novel Architectures, Marc Torrent (CEA Bruyères-le-Châtel, France)
Co-Authors: Matteo Giantomassi (Université catholique de Louvain, Belgium)
The predictive power of the so-called ab initio methods, based on the fundamental quantum-mechanical models of matter at the atomic level, together with the growing computational power of high-end High Performance Computing (HPC) systems, have led to exciting scientific and technological results in Materials Science. The increase of computational power coupled with better numerical techniques open up the possibility to simulate and predict the behaviour of larger and larger atomic systems with a higher degree of accuracy, shortening the path from theoretical results to technological applications, and opening up the possibility to design new materials from scratch. Despite the elegant simplicity of the formulation of the basic quantum mechanical principles, a practical implementation of a many-particle simulation has to use some approximations and models to be feasible. As there are several options for these approximations, different ab initio simulation codes have been developed, with different trade-offs between precision and computational effort. Each of these codes has its specific strengths and weaknesses, but all together have contributed to making computational materials science one of the domains where supercomputers raise the efficiency of producing scientific know-how and technological innovation. Indeed, a large fraction of the available workload in supercomputers around the world is spent to perform Computational Materials Science simulations. These codes have mostly kept pace with hardware improvements over the years, by relying on proven libraries and paradigms, such as MPI, that could abstract the developers from low-level considerations while the architectures evolved within a nearly homogeneous model. In the past few years, however, the emergence of heterogeneous computing elements associated with the transition from peta- to exascale has started to evidence the fragility of this model of development. The aim of the present minisymposium is to gather expert developers from different codes to discuss the challenges of porting, scaling, and optimizing material science application codes for modern and novel platforms. The presentations will focus on advanced programming paradigms, novel algorithms, domain specific libraries, in-memory data management, and software/hardware co-design. Exascale related challenges (such as sustained performance, energy awareness, code fault tolerance, task concurrency and load balancing, numerical noise and stability, big data I/O) will also be discussed. -
Toth Gabor MS Presentation
Wednesday, June 8, 2016
Garden 3C, 16:00-16:30
MS Presentation
Decoupling and Coupling in iPIC3D, a Particle-in-Cell Code for Exascale, Gabor Toth (University of Michigan, United States of America)
Co-Authors: Stefano Markidis (Royal Institute of Technology, Sweden); Erwin Laure (Royal Institute of Technology, Sweden); Yuxi Chen (University of Michigan, United States of America); Gabor Toth (University of Michigan, United States of America); Tamas Gombosi (University of Michigan, United States of America)
iPIC3D is a massively parallel three-dimensional implicit particle-in-cell code used for the study of the interactions between the solar wind and Earth's magnetosphere. Because of the scalability potential and scientific impact demonstrated by iPIC3D, it has been selected in many European HPC projects to prepare for future exascale machines. In this talk, we present new algorithmic changes to iPIC3D in preparation for the coming exascale era. In particular, we will present decoupled computation, communication and I/O operations in iPIC3D to address the challenges of irregular operations on large numbers of processes. Our evaluation results show that the performance benefit from this model increases as the scale of the simulation increases. We also present a two-way coupled kinetic-fluid model with multiple implicit PIC domains (handled by the iPIC3D code) embedded in MHD (handled by the BATS-R-US code) under the Space Weather Modeling Framework (SWMF). -
Trampert Jeannot Poster
Poster
EAR-03 Imaging Subsurface Fluid via Poroelastic Theory and Adjoint Tomography, Jeannot Trampert (Utrecht University, Netherlands)
Co-Authors: Jeannot Trampert (Utrecht University, Netherlands)
Poroelastic theory is essential in many geophysical applications, such as imaging fluid flow in oil reservoirs, monitoring the storage of CO2, and most topics in hydrogeology. Biot formulated the poroelastic wave equation in fully saturated media. Based on Biot's theory, we aim to image the fluid directly in a regional seismic exploration setting. We simulate wave propagation in poroelastic media using a spectral element method. We define several misfit functionals and use adjoint methods to calculate the corresponding sensitivity kernels. The adjoint method is an efficient way of computing the gradient of a misfit functional with respect to the model parameters, and is based on the interaction between the time-reversed regular and adjoint fields. Using those kernels, we perform gradient-based iterative inversions. We investigate to what extent poroelastic theory is effective for imaging the fluids, and study the influence of bulk properties, porosity, permeability and fluid-solid interaction on the results. -
Tran Trach-Minh Contributed Talk
Wednesday, June 8, 2016
Garden 3C, 14:15-14:30
Contributed Talk
A Portable Platform for Accelerated PIC Codes and its Application to Multi- and Many Integrated Core Architectures Using Hybrid MPI/OpenMP, Trach-Minh Tran (Swiss Plasma Center, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Noé Ohana (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Laurent Villard (EPFL, Switzerland)
With the aim of porting Particle-In-Cell (PIC) codes to modern parallel computers equipped with coprocessors, we have designed a testbed called PIC_ENGINE retaining the key elements of the PIC algorithm as applied to plasma physics simulations. A hybrid OpenMP/MPI implementation is used to explore the potential gain in performance on multi-core CPUs and Many Integrated Core (MIC) coprocessors. A bucket sort is added to increase data locality, and a vectorization algorithm is implemented, showing an improvement in the overall performance. With the PIC_ENGINE, we show that the hybrid OpenMP/MPI approach allows a performance gain of approximately 60% compared to pure MPI. Furthermore, the sorting and vectorization increase the performance of the most time-consuming methods by up to a factor of 3.2. Finally, using the same code, hybrid runs are performed on the MIC and lead to similar conclusions. However, due to inefficient vectorization, the overall performance is poor compared to CPU runs.
Wednesday, June 8, 2016
Garden 3C, 14:30-14:45
Contributed Talk
Towards Optimization of a Gyrokinetic Particle-in-Cell (PIC) Code on Large Scale Hybrid Architectures, Trach-Minh Tran (Swiss Plasma Center, Switzerland)
Co-Authors: Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Emmanuel Lanti (EPFL / Swiss Plasma Center, Switzerland); Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Laurent Villard (EPFL, Switzerland)
Refactoring large legacy codes to exploit the power of new multithreaded devices is not an easy task. For this purpose, we designed a platform embedding simplified basic features of PIC codes. It solves the drift-kinetic equations (a first step towards gyrokinetics) in a sheared plasma slab using B-spline finite elements up to fourth order. Multiple levels of parallelism have been implemented using MPI+OpenMP and MPI+OpenACC. It has been shown that sorting particles can lead to performance improvements by increasing data locality and vectorizing the grid memory access. This paper focuses on the GPU implementation, which outperforms the CPU version by up to a factor of 4 while not requiring a major code-rewriting effort. This gain increases with the spline order. Weak and strong scalability tests have been successfully run on the GPU-equipped Cray XC30 Piz Daint (CSCS) up to 4,096 nodes. This performance will enable advanced studies of turbulent transport in magnetic fusion devices.
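Both abstracts above mention sorting particles by the cell they currently occupy to improve data locality; the following is a compact single-threaded counting-sort sketch of that idea (the data layout and names are illustrative, not taken from the codes):

#include <cstddef>
#include <cstdio>
#include <vector>

struct Particle { double x; double v; };

// Reorder particles so that all particles of cell 0 come first, then cell 1, ...
// This improves locality for the subsequent charge deposition and field gather.
std::vector<Particle> sort_by_cell(const std::vector<Particle>& p,
                                   double xmin, double dx, std::size_t ncells) {
  std::vector<std::size_t> cell(p.size());
  std::vector<std::size_t> offset(ncells + 1, 0);
  for (std::size_t i = 0; i < p.size(); ++i) {
    std::size_t c = static_cast<std::size_t>((p[i].x - xmin) / dx);
    if (c >= ncells) c = ncells - 1;                 // clamp particles sitting on the upper boundary
    cell[i] = c;
    ++offset[c + 1];
  }
  for (std::size_t c = 0; c < ncells; ++c) offset[c + 1] += offset[c];   // prefix sum -> bucket starts
  std::vector<Particle> sorted(p.size());
  for (std::size_t i = 0; i < p.size(); ++i) sorted[offset[cell[i]]++] = p[i];
  return sorted;
}

int main() {
  std::vector<Particle> p = {{0.9, 1.0}, {0.1, -1.0}, {0.5, 0.5}, {0.15, 2.0}};
  std::vector<Particle> s = sort_by_cell(p, 0.0, 0.25, 4);               // 4 cells of width 0.25
  for (const Particle& q : s) std::printf("x = %.2f\n", q.x);
  return 0;
}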
Poster
PHY-03 Parallelization on a Hybrid Architecture of GBS, a Simulation Code for Plasma Turbulence at the Edge of Fusion Devices, Trach-Minh Tran (Swiss Plasma Center, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Patrick Emonts (EPFL / Swiss Plasma Center, Switzerland); Federico David Halpern (EPFL / Swiss Plasma Center, Switzerland); Rogério Jorge (EPFL / Swiss Plasma Center, Switzerland); Jorge Morales (EPFL / Swiss Plasma Center, Switzerland); Paola Paruta (EPFL / Swiss Plasma Center, Switzerland); Paolo Ricci (EPFL / Swiss Plasma Center, Switzerland); Fabio Riva (EPFL / Swiss Plasma Center, Switzerland)
We present recent developments of GBS, a simulation code used to evolve plasma turbulence in the edge of fusion devices. GBS solves a set of 3D fluid equations, the Poisson and the Ampere equation, and a kinetic equation for the neutral atoms. Investigations carried out with GBS have significantly advanced our understanding of the plasma dynamics at the edge of fusion devices. For example, GBS simulations allowed the identification of the turbulent regimes and the saturation mechanisms of the linearly unstable modes. In GBS, a 3D Cartesian MPI communicator is employed, leading to excellent parallel scalability up to 8192 cores. To efficiently exploit many-core and hybrid architectures, new schemes using MPI+OpenMP and MPI+OpenACC have been recently implemented. We show the implementation of the new parallelization schemes, their scalability, and their efficiency. The new parallelization allows the efficient use of advanced hybrid supercomputers, such as Piz Daint at CSCS. -
Trémolet Yannick MS Presentation
Wednesday, June 8, 2016
Garden 3B, 15:30-16:00
MS Presentation
Improving the Scalability of 4D-Var with a Weak Constraint Formulation, Yannick Trémolet (ECMWF, United Kingdom)
Co-Authors: Mike Fisher (ECMWF, United Kingdom)
In 4D-Var, the forecast model is used to propagate the initial state to the time of the observations and is assumed to be perfect. As most aspects of the data assimilation system have improved over the years, this assumption has become less realistic. There are theoretical benefits in using a weak-constraint formulation and a long assimilation window in 4D-Var, and recent experiments have shown benefits in using overlapping assimilation windows even with strong-constraint 4D-Var. The weak-constraint formulation writes the optimisation problem as a function of the four-dimensional state over the length of the assimilation window. In addition to its theoretical advantages, it increases the potential for parallelisation and better scalability. Using a saddle-point method makes it possible to take full advantage of this potential for additional parallelism. We will show how it can benefit future operational systems and reduce the time to solution in the critical path.
MS Summary
MS09 Efficient Data Assimilation for Weather Forecasting on Future Supercomputer Architectures, Yannick Trémolet (ECMWF, United Kingdom)
Co-Authors:
Data assimilation is the process by which the initial condition for weather forecasts is determined. It combines information from a previous forecast (background) and recent observations of the Earth system, together with estimates of their respective uncertainties, to produce the best estimate of the current state of the system to be used as the initial condition for the forecast. Uncertainty estimates around that best estimate are now also being produced. Weather forecasting models are well known for having always been at the forefront of high-performance computing (HPC). With more and more accurate data assimilation algorithms and more and more observations becoming available, the computational cost of data assimilation has become as high as that of running the forecast. However, this aspect has not been given as much attention in the HPC community. As for other applications, part of the challenge lies in the efficient use of increasingly complex supercomputer architectures. Some challenges, such as increasing resolution and increasing I/O volumes, are shared with the forecasting problem. The fact that forecasting increasingly relies on coupled atmosphere-ocean-waves-land surface models has only just started to be properly accounted for in data assimilation. It opens new perspectives, as observations in one part of the system could help improve the estimation of the state in another. However, it will also increase the overall cost and complexity. Data assimilation also poses its own specific challenges, due for example to the volume and very uneven distribution of observations over the globe and their heterogeneous nature. The minisymposium aims at bringing together experts in data assimilation to expose the challenges posed by data assimilation and some of the current directions of research for addressing them, with a focus on scalability and efficiency aspects. Among others, methods such as weak-constraint 4D-Var or ensemble variational methods will be presented, as they offer more parallelism in the time dimension or explore several directions in the space of solutions in parallel. More efficient methods for modelling background error statistics will also be discussed. -
Tromp Jeroen MS Presentation
Thursday, June 9, 2016
Garden 1A, 15:15-15:30
MS Presentation
Towards Exascale Seismic Imaging & Inversion, Jeroen Tromp (Princeton University, United States of America)
Co-Authors:
Post-petascale supercomputers are now available to solve scientific problems that were thought unreachable a decade ago. They also bring a host of concerns tied to obtaining optimum performance. These include energy consumption, fault resilience, scalability of current parallel paradigms, workflow management, I/O performance and feature extraction from large datasets. We focus on the last three issues. The overarching goal is to reduce the time to solution. Experiments show that the imaging workflow suffers from severe I/O bottlenecks. Such limitations occur both for computational data and for seismic time series. The latter are dealt with by the introduction of a new Adaptable Seismic Data Format (ASDF). HDF5 and ADIOS are used to reduce the cost of disk access. Because large parts of the workflow are embarrassingly parallel, we are investigating the possibility of automating the process with the integration of scientific workflow management tools, specifically Pegasus.
V
-
van Albada Sacha J. MS Presentation
Friday, June 10, 2016
Garden 2BC, 09:20-09:40
MS Presentation
Multi-Scale Modeling of Cortex at Cellular Resolution, Sacha J. van Albada (Forschungszentrum Jülich GmbH, Germany)
Co-Authors:
Cortical modelling has many unknowns, but the numbers of neurons and synapses can be accurately estimated. Downscaling, commonly used for feasibility, distorts network dynamics. Fortunately, full-scale modelling is increasingly possible. We develop a full-density multi-area model of macaque vision-related cortex at cellular resolution. A layered microcircuit model with spiking point neurons, customized with area-specific neuron densities, represents each of 32 areas. The inter-area connectivity combines the CoCoMac database, quantitative tracing data, and statistical regularities. The connectivity is refined using mean-field theory. Indegrees increase from V1 to higher areas, which therefore exhibit bursty firing and long intrinsic time scales. Inter-area propagation occurs predominantly in the feedback direction, as in visual imagery. At intermediate inter-area connection strengths, functional connectivity between areas corresponds well with macaque resting-state fMRI. The model achieves consistency of structure and activity at multiple scales and provides a platform for future research. -
van Dinther Ylona MS Summary
MS Summary
MS26 Bridging Scales in Geosciences, Ylona van Dinther (ETH Zurich, Switzerland)
Co-Authors: Dave A. May (ETH Zurich, Switzerland), Michael Bader (Leibniz Supercomputing Centre, Technische Universitaet Muenchen, Germany)
Complex but relevant processes within the Solid Earth domain cover a wide range of space and time scales, spanning up to 17 and 26 orders of magnitude, respectively. Earthquake propagation, for instance, depends on dynamic processes at the rupture tip over 10^-9 seconds, while the plate-tectonic faults on which earthquakes occur evolve over time scales of up to hundreds of millions of years. Similarly, problems in imaging and modelling of mantle processes at the Earth's scale of tens of thousands of kilometres can be affected by physico-chemical compositions that vary on a metre scale and are determined at the molecular level. Beyond these examples, many other physical processes in geophysics cross the largest imaginable scales. At each of the characteristic scales different physical processes are relevant, which requires us to couple the relevant physics across the scales. Simulating the physics at each of these scales is a tremendous task, which hence often requires High Performance Computing. Computational challenges include, but are not limited to, a large number of degrees of freedom and overcoming the two-scale problem on which most computational tools are founded. To discuss and start to tackle these challenges, we aim to bring together computer scientists and geoscientists and examine them from different perspectives. Applications within the geosciences include, but are not limited to, geodynamics, seismology, fluid dynamics, tectonics, geomagnetism, and exploration geophysics. -
van Zelst Iris Poster
Poster
EAR-01 Coupling Geodynamic Seismic Cycle and Dynamic Rupture Models, Iris van Zelst (Institute of Geophysics, ETH Zürich, Switzerland)
Co-Authors: Ylona van Dinther (ETH Zurich, Switzerland); Alice-Agnes Gabriel (Ludwig Maximilian University of Munich, Germany)
Diverse modelling techniques that span large spatial and temporal scales are required to study the seismicity in subduction zones. Our seismo-thermo-mechanical (STM) seismic cycle models solve million year scale subduction dynamics and multiple earthquake events self-consistently, but fail to resolve the finer seismic time scale at which dynamic rupture models excel. By using the self-consistent stresses and strengths of our STM model as input for dynamic rupture scenarios conducted with SeisSol, the otherwise hard-to-constrain assumptions on these fields are resolved and advantages of both methods are exploited. The results show that a dynamic rupture can be triggered spontaneously and that the propagating rupture is qualitatively comparable to its quasi-static equivalent. The importance of both self-consistent initial conditions and dynamic feedback on fault strength is illustrated by a quantitative comparison of surface displacements and stresses. -
Vander Aa Tom MS Presentation
Wednesday, June 8, 2016
Garden 2A, 16:30-17:00
MS Presentation
Distributed Matrix Factorization in C++ Using TBB, OpenMP, GASPI, Tom Vander Aa (imec, Belgium)
Co-Authors:
BPMF is a matrix factorisation technique used in recommender systems such as the ones from Netflix and Amazon. In this talk we present how to make a parallel high-performance BPMF implementation using node-level shared memory parallelism with TBB and OpenMP and distributed parallelism using GASPI and MPI. We show trade-offs and performance implications of using the different technologies and explain what works and what does not. -
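As a hedged illustration of the node-level parallelism discussed in this talk (and not of the BPMF Gibbs sampler itself), the sketch below updates each user's latent factors independently inside an OpenMP parallel loop; the data layout, the simple gradient update and all names are assumptions made for the example, chosen only because the per-user updates are independent and therefore map naturally onto TBB or OpenMP.

    #include <omp.h>
    #include <vector>

    struct Rating { int item; double value; };

    // Update every user's latent vector in parallel; each iteration touches
    // only that user's row of U, so no synchronisation is needed.
    void update_users(std::vector<std::vector<double>>& U,             // user factors
                      const std::vector<std::vector<double>>& V,       // item factors
                      const std::vector<std::vector<Rating>>& ratings, // ratings per user
                      double lr, double reg)
    {
        const int K = static_cast<int>(V[0].size());
        #pragma omp parallel for schedule(dynamic)
        for (int u = 0; u < static_cast<int>(U.size()); ++u) {
            for (const Rating& r : ratings[u]) {
                double pred = 0.0;
                for (int k = 0; k < K; ++k) pred += U[u][k] * V[r.item][k];
                const double err = r.value - pred;
                for (int k = 0; k < K; ++k)
                    U[u][k] += lr * (err * V[r.item][k] - reg * U[u][k]);
            }
        }
    }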
VandeVondele Joost Contributed Talk
Friday, June 10, 2016
Garden 1BC, 09:15-09:30
Contributed Talk
Ab-Initio Quantum Transport Simulation of Nano-Devices, Joost VandeVondele (ETH Zurich, Switzerland)
Co-Authors: Mauro Calderara (ETH Zurich, Switzerland); Mohammad Hossein Bani-Hashemian (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland); Mathieu Luisier (ETH Zurich, Switzerland)
To simulate advanced electronic devices such as nanoscale transistors or memory cells, whose functionality may depend on the position of single atoms only, a quantum transport solver is needed that is not only capable of atomic-scale resolution but can also deal with systems consisting of thousands to hundreds of thousands of atoms. The device simulator OMEN and the electronic structure code CP2K have been united to perform ab initio quantum transport calculations at the level of density functional theory. To take full advantage of modern hybrid supercomputer architectures, new algorithms have been developed and implemented. They allow for the simultaneous computation of open boundary conditions in parallel on the available CPUs and the solution of the Schrödinger equation in a scalable way on the GPUs. The main concepts behind the algorithms will be presented and results for realistic nanostructures will be shown.
Thursday, June 9, 2016
Garden 1BC, 11:10-11:30
Contributed Talk
Performance Improvement by Exploiting Sparsity for MPI Communication in Sparse Matrix-Matrix Multiplication, Joost VandeVondele (ETH Zurich, Switzerland)
Co-Authors: Joost VandeVondele (ETH Zurich, Switzerland); Ole Schuett (ETH Zurich, Switzerland)
DBCSR is the sparse matrix library at the heart of the CP2K linear scaling electronic structure theory algorithm. It is MPI and OpenMP parallel, and can exploit accelerators. The multiplication algorithm is based on Cannon's algorithm, whose scalability is limited by the MPI communication time. The implementation is based on MPI point-to-point communications. We present an improved implementation that takes into account the sparsity of the problem in order to reduce the communication. This implementation makes use of one-sided communications. Performance results for representative CP2K benchmarks will also be presented.
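The following sketch illustrates the general MPI one-sided pattern referred to above (window creation, MPI_Put, fence synchronisation); it is not taken from DBCSR, and the buffer size and ring-neighbour target are placeholders chosen only to keep the example self-contained.

    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int nblk = 1024;                      // hypothetical block buffer size
        std::vector<double> recv_buf(nblk, 0.0);
        std::vector<double> my_block(nblk, static_cast<double>(rank));

        // Expose the receive buffer as a window for remote puts.
        MPI_Win win;
        MPI_Win_create(recv_buf.data(), nblk * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);
        int target = (rank + 1) % size;             // ring neighbour, for illustration
        // In a sparse setting one would push only the non-empty blocks the
        // neighbour actually needs; here the block is always sent to keep it short.
        MPI_Put(my_block.data(), nblk, MPI_DOUBLE, target, 0, nblk, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }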
Poster
MAT-09 Sparse Matrix Multiplication Library for Linear Scaling DFT Calculations in Electronic Structure Codes, Joost VandeVondele (ETH Zurich, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Andreas Glöss (University of Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland)
The key operation for linear scaling DFT implemented in the CP2K quantum chemistry program is sparse matrix-matrix multiplication. For such a task, the sparse matrix library DBCSR (Distributed Block Compressed Sparse Row) has been developed. DBCSR takes full advantage of the block-structured sparse nature of the matrices for efficient computation and communication. It is MPI and OpenMP parallelized, and can exploit accelerators. We describe a strategy to improve DBCSR performance. DBCSR is available as a stand-alone library at http://dbcsr.cp2k.org/ to be employed in electronic structure codes. To this end, a streamlined API has been defined and a suite of tools has been developed to generate the full documentation of the library (API-DOC) by extracting the information provided directly in the source code. We give a flavour of the generated API-DOC by showing snapshots of selected HTML documentation pages and we sketch the design of these tools.
Poster
MAT-06 Linear Scaling Ehrenfest Molecular Dynamics, Joost VandeVondele (ETH Zurich, Switzerland)
Co-Authors: Florian Schiffmann (Victoria University, Australia); Joost VandeVondele (ETH Zurich, Switzerland)
With the available computational power growing, ever larger systems can be investigated with increasingly advanced methods and new algorithms. For electronic structure calculations on systems containing a few thousand atoms, linear scaling algorithms are essential. For ground state DFT calculations, linear scaling has already been demonstrated for millions of atoms in the condensed phase [J. VandeVondele, U. Bortnik, J. Hutter, 2012]. Here, we extend this work to electronically excited states, for example, to make UV/VIS spectroscopy or investigations of the electron injection process in dye-sensitized solar cells possible. We base our approach on non-adiabatic molecular dynamics, in particular on Ehrenfest molecular dynamics (EMD). The formalism, based on the density matrix, allows for linear scaling based on the sparsity of the density matrix and naturally incorporates density embedding methods such as the Kim-Gordon approach.
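Schematically, and written here for an orthonormal basis as an illustration only (the formulation in a non-orthogonal atomic-orbital basis additionally involves the overlap matrix), the density-matrix propagation underlying Ehrenfest dynamics follows the Liouville-von Neumann equation; the sparsity of P is what enables the linear-scaling formulation described above:

    \[ i\hbar\,\frac{\partial \mathbf{P}(t)}{\partial t} = \big[\mathbf{H}(t),\,\mathbf{P}(t)\big] = \mathbf{H}(t)\mathbf{P}(t) - \mathbf{P}(t)\mathbf{H}(t). \]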
Poster
MAT-01 A Generalized Poisson Solver for First-Principles Device Simulations, Joost VandeVondele (ETH Zurich, Switzerland)
Co-Authors: Sascha Brück (ETH Zurich, Switzerland); Mathieu Luisier (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland)
We present a Poisson solver with main applications in ab-initio simulations of nanoelectronic devices. The solver employs a plane-wave (Fourier) based pseudospectral approach and is capable of solving the generalized Poisson equation with a position-dependent dielectric constant subject to periodic or homogeneous Neumann conditions on the boundaries of the simulation cell and Dirichlet type conditions imposed at arbitrary subdomains. Any sufficiently smooth function modelling the dielectric constant, including density dependent dielectric continuum models can be utilized. Furthermore, for all the boundary conditions, consistent derivatives are available allowing for energy conserving molecular dynamics simulations.
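For reference, the generalized Poisson equation solved here can be written, with the position-dependent permittivity absorbing the vacuum constant, as

    \[ \nabla \cdot \big( \varepsilon(\mathbf{r})\, \nabla \phi(\mathbf{r}) \big) = -\rho(\mathbf{r}), \]

subject to the periodic, homogeneous Neumann or Dirichlet conditions described above; the sign and unit conventions shown are one common choice, not necessarily those of the implementation.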
Poster
MAT-04 CP2K within the PASC Materials Network, Joost VandeVondele (ETH Zurich, Switzerland)
Co-Authors: Alfio Lazzaro (ETH Zurich, Switzerland); Hans Pabst (Intel Semiconductor AG, Switzerland); Ole Schuett (ETH Zurich, Switzerland); Joost VandeVondele (ETH Zurich, Switzerland); Juerg Hutter (University of Zurich, Switzerland)
One of the goals of the PASC project is to strengthen networking in the Swiss materials science community through active development of collaborative relationships among university researchers and CSCS staff. This includes assisting researchers in tuning, debugging, optimizing, and enhancing codes and applications for HPC resources, from mid-scale to national and international petascale facilities, with a view to the exascale transition. In addition, the application support specialists provide support for development projects on software porting techniques, parallelization and optimization strategies, deployment on diverse computational platforms, and data management. Here we present selected tools and software developed for CP2K [1]. Furthermore, we show by example how a CP2K application can be tuned to make optimal use of all available HPC resources. With a view to next-generation HPC hardware, we present first promising performance results for Intel's Broadwell-EP and KNL platforms. [1] The CP2K developers group, CP2K is freely available from: https://www.cp2k.org/, 2016 -
Vanherpe Liesbeth MS Presentation
Friday, June 10, 2016
Garden 2BC, 10:40-11:00
MS Presentation
In Silico Synthesis of Spatially-Embedded Neuronal Morphologies, Liesbeth Vanherpe (EPFL, Switzerland)
Co-Authors:
Brain functionality depends critically on the connectivity of neurons, which in turn depends on their morphologies. Therefore, to reproduce brain function through simulation of large-scale neuronal networks, it is important to use realistic neuronal morphologies. The classic approach where morphologies are reconstructed from experiments provides invaluable information, but is time consuming and does not scale to the number of cells needed for whole brain simulations. In order to increase the morphological variability available for such simulations, we are developing a framework for synthesizing neurons. We propose to create morphologies based on spatially embedded stochastic models: we combine biological morphometrics obtained from reconstructions with a virtual representation of the brain environment, which confines and guides neurites. We discuss the data structures and algorithms used to provide environmental information to the growing neurons, and demonstrate the current status of the synthesis framework.
Poster
LS-04 How to Synthesize Neurons? A Case Study on Best Practices in Scientific Software Development, Liesbeth Vanherpe (EPFL, Switzerland)
Co-Authors: Juan Palacios (EPFL, Switzerland)
The Blue Brain Project (BBP) and the Human Brain Project aim to improve our understanding of the human brain through data-driven modelling and whole brain simulation. Advanced computing technologies enable such projects to study problems that were unmanageable until recently. Care must be taken in the development process to ensure that the software developed is usable and maintainable over the project lifetime. Best practices range from documentation, code review, continuous integration, test coverage, to functional and integration testing. While common in industrial environments, these best practices are not yet common in more academic scientific projects. Within the BBP, we are developing scientific software to synthesize biologically realistic neuronal morphologies for large-scale brain simulations. We discuss how best practices are applied in different stages of our scientific software development process. We show how validation drives the development of increasingly complex models and demonstrate how modular design benefits future use through examples. -
vanKeulen Siri Camee MS Presentation
Friday, June 10, 2016
Garden 3A, 09:00-09:30
MS Presentation
Effect of Lipidation for G Protein Mediated Signalling, Siri Camee vanKeulen (EPFL, Switzerland)
Co-Authors: Siri Camee vanKeulen (EPFL, Switzerland)
G-protein-coupled-receptor (GPCR) pathways are of high interest since their signal-transduction cascades play an important role in several diseases such as hypertension and obesity. The first proteins that transfer an extracellular signal received by a GPCR to other regions in the cell are the G protein heterotrimers. The specificity with which G protein subunits interact with receptors and effectors defines the variety of responses that a cell can provide in response to an extracellular signal. Interestingly, many G proteins have distinct lipidation profiles, but little is known about how this influences their function. Here, we investigate the effect of myristoylation on the structure and dynamics of Gαi1 and the possible implications for signal transduction. A 2 µs molecular dynamics simulation suggests conformational changes of the switch II and alpha-helical domains, emphasizing the importance of permanent lipid attachment in tuning the function of signaling proteins. -
Varduhn Vasco MS Presentation
Wednesday, June 8, 2016
Garden 3B, 14:30-14:45
MS Presentation
Using Generated Matrix Kernels for a High-Order ADER-DG Engine, Vasco Varduhn (Technische Universität München, Germany)
Co-Authors: Vasco Varduhn (Technische Universität München, Germany); Michael Bader (Leibniz Supercomputing Centre, Germany)
The ExaHyPE project employs the high-order discontinuous Galerkin finite element method in order to solve hyperbolic PDEs on adaptive Cartesian grids at exascale level. Envisaged applications include grand-challenge simulations in astrophysics and geosciences. Our compute kernels rely on tensor operations - a type of operation scientific computing libraries only support to a limited degree. We demonstrate concepts of how the tensor operations can be reduced to dense matrix-matrix multiplications, which is undoubtedly one of the best optimised operations in linear algebra. We apply reordering and reshaping techniques, which enables our code generator to exploit existing highly optimised libraries as back end and produce highly optimised compute kernels. As a result, our tool chain provides a "complete solution" for tensor product-based FEM 'operations'. -
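A hedged sketch of the reshaping idea described in the ExaHyPE abstract above: a contraction over one tensor index becomes an ordinary matrix-matrix product once the remaining indices are fused, so an optimised GEMM back end can be substituted for the naive nested loops. The layout and names below are assumptions for illustration, not the generated kernels.

    #include <vector>

    // C[i][j][k] = sum_q A[i][q] * B[q][j][k] is a plain GEMM once B is viewed
    // as a (Q x J*K) matrix and C as an (I x J*K) matrix, both row-major.
    void contract_as_gemm(const std::vector<double>& A,   // I x Q, row-major
                          const std::vector<double>& B,   // Q x (J*K), row-major
                          std::vector<double>& C,         // I x (J*K), row-major
                          int I, int Q, int JK)
    {
        for (int i = 0; i < I; ++i)
            for (int q = 0; q < Q; ++q) {
                const double a = A[i * Q + q];
                for (int jk = 0; jk < JK; ++jk)
                    C[i * JK + jk] += a * B[q * JK + jk];  // dgemm-like inner update
            }
    }

In practice the triple loop would be replaced by a call to an optimised dense GEMM, which is the point of the reshaping.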
Vasmel Marlies MS Presentation
Friday, June 10, 2016
Garden 1A, 10:30-10:45
MS Presentation
Dynamically Linking Seismic Wave Propagation at Different Scales, Marlies Vasmel (Institute of Geophysics, ETH Zurich, Switzerland)
Co-Authors: Marlies Vasmel (ETH Zurich, Switzerland); Dirk-Jan van Manen (ETH Zurich, Switzerland); Johan Robertsson (ETH Zurich, Switzerland)
Numerical modelling of seismic wave propagation can be of great value at many scales, ranging from shallow applications in engineering geophysics to global scale seismology. Accurate modelling of the physics of wave propagation at different scales requires different spatial and temporal discretization and potentially also different numerical methods. We present a new method to dynamically link the waves propagating at these different scales. A finite-difference solver is used on a local grid, whereas the (much) larger background domain is represented by its (precomputed) Green's functions. At each time step of the simulation, the interaction between the events leaving the local domain and the medium outside is calculated using a Kirchhoff-type integral extrapolation and the extrapolated wavefield is applied as a boundary condition to the local domain. This results in a numerically exact hybrid modelling scheme, also after local updates of the model parameters. -
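Schematically, the Kirchhoff-type extrapolation referred to above follows the classical representation integral (written here in its acoustic, frequency-domain form rather than the elastic time-domain form actually used):

    \[ u(\mathbf{x}) = \oint_{S} \left[ G(\mathbf{x},\mathbf{x}')\,\frac{\partial u(\mathbf{x}')}{\partial n'} - u(\mathbf{x}')\,\frac{\partial G(\mathbf{x},\mathbf{x}')}{\partial n'} \right] \mathrm{d}S', \]

where S is the boundary of the local finite-difference domain and G are the precomputed Green's functions of the background medium.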
Vela-martin Alberto MS Presentation
Thursday, June 9, 2016
Garden 2A, 12:15-12:30
MS Presentation
A High Resolution Hybrid CUDA-MPI Turbulent Channel Code, Alberto Vela-martin (Technical University Madrid, Spain)
Co-Authors: Javier Jiménez (Technical University of Madrid, Spain)
A new high order, high resolution hybrid MPI-CUDA code for the simulation of turbulent channel flow on many distributed GPUs is presented. The code benefits from the use of powerful and efficient heterogeneous architectures with GPUs accelerators. Optimization strategies involving the joint use of GPU and CPU lead to excellent performance. Asynchronous GPU-CPU execution achieves almost complete overlap of computations, memory transfer from/to device/host and MPI communications. A considerable speedup is gained with respect to similar synchronous codes. Test cases and performance results show the code is suitable for the next generation of large direct numerical simulations of turbulence. -
Vergara Christian MS Presentation
Wednesday, June 8, 2016
Garden 3A, 16:45-17:00
MS Presentation
Computational Study of the Risk of Restenosis in Coronary Bypasses, Christian Vergara (Politecnico di Milano, Italy)
Co-Authors: Christian Vergara (Politecnico di Milano, Italy); Sonia Ippolito (Ospedale Luigi Sacco Milano, Italy); Roberto Scrofani (Ospedale Luigi Sacco Milano, Italy); Alfio Quarteroni (EPFL, Switzerland)
Coronary artery disease, caused by the build-up of atherosclerotic plaques in coronary vessel walls, is one of the leading causes of death in the world. For high-risk patients, coronary artery bypass grafting is the preferred treatment. Despite overall excellent patency rates, bypasses may fail due to restenosis. In this context, we present a computational study of the fluid dynamics in patient-specific geometries with the aim of investigating a possible relationship between coronary stenosis and graft failure. First, we propose a strategy to prescribe realistic boundary conditions in the absence of measured data, based on an extension of Murray's law to provide the flow division at bifurcations in the case of stenotic vessels and non-Newtonian blood rheology. Then, we show some results of numerical simulations in patients treated with grafts, in which the degree of coronary stenosis is virtually varied to compare the fluid dynamics in terms of hemodynamic indices potentially involved in restenosis development.
Wednesday, June 8, 2016
Garden 3A, 14:15-14:30
MS Presentation
Simulation of Fluid-Structure Interaction with a Thick Structure via an Extended Finite Element Approach, Christian Vergara (Politecnico di Milano, Italy)
Co-Authors: Luca Formaggia (Politecnico di Milano, Italy); Christian Vergara (Politecnico di Milano, Italy)
In this talk, we present an eXtended Finite Element Method (XFEM) to simulate the fluid-structure interaction arising from a 3D flexible thick structure immersed in a fluid. Both the fluid and solid domains are discretized independently by generating two overlapping unstructured meshes. Due to the unfitted nature of the considered meshes, this method avoids the technical problems related to an ALE approach while maintaining an accurate description of the fluid-structure interface. The coupling between the fluid and the solid is taken into account by means of a Discontinuous Galerkin approach, which allows the interface conditions to be imposed. A possible application is the study of the interaction between blood and aortic valve leaflets, which is important for understanding the valve's functional behaviour, for developing prosthetic valve devices and for post-surgery feedback.
MS Summary
MS01 Advanced Computational Methods for Applications to the Cardiovascular System I, Christian Vergara (Politecnico di Milano, Italy)
Co-Authors: Dominik Obrist (University of Bern, Switzerland), Christian Vergara (Politecnico di Milano, Italy)
Cardiac and Cardiovascular Mathematics represents nowadays a challenging topic in view of the emerging and growing collaborations between clinicians and mathematicians. In this respect, the numerical solution of problems arising in modelling cardiac and systemic phenomena opens new and interesting perspectives which need to be properly addressed. From the cardiac side, a fully integrated heart model represents a complex multiphysics problem, which is in turn composed of several submodels describing cardiac electrophysiology, mechanics, and fluid dynamics. On the system circulation side, although it has been studied for a longer time, several mathematical and numerical aspects still need to be addressed (e.g., tissue remodelling, atherosclerotic plaque formation, aneurysms development, transitional and turbulence phenomena in blood flows). This minisymposium aims at gathering researchers and experts in computational and numerical modelling of the heart and the systemic circulation.
MS Summary
MS07 Advanced Computational Methods for Applications to the Cardiovascular System II, Christian Vergara (Politecnico di Milano, Italy)
Co-Authors: Dominik Obrist (University of Bern, Switzerland), Christian Vergara (Politecnico di Milano, Italy)
Cardiac and Cardiovascular Mathematics represents nowadays a challenging topic in view of the emerging and growing collaborations between clinicians and mathematicians. In this respect, the numerical solution of problems arising in modelling cardiac and systemic phenomena opens new and interesting perspectives which need to be properly addressed. From the cardiac side, a fully integrated heart model represents a complex multiphysics problem, which is in turn composed of several submodels describing cardiac electrophysiology, mechanics, and fluid dynamics. On the system circulation side, although it has been studied for a longer time, several mathematical and numerical aspects still need to be addressed, as e.g. tissue remodelling, atherosclerotic plaque formation, aneurysms development, transitional and turbulence phenomena in blood flows. This minisymposium aims at gathering researchers and experts in computational and numerical modelling of the heart and the systemic circulation. -
Verma Siddhartha Contributed Talk
Thursday, June 9, 2016
Garden 3A, 10:50-11:10
Contributed Talk
Propulsive Advantage of Swimming in Unsteady Flows, Siddhartha Verma (ETH Zurich, Switzerland)
Co-Authors: Siddhartha Verma (ETH Zurich, Switzerland); Petros Koumoutsakos (ETH Zurich, Switzerland)
Individual fish swimming in a school encounter vortices generated by the propulsion of upstream members. Experimental and theoretical studies suggest that these hydrodynamic interactions may increase thrust without additional energy expenditure. However, difficulties associated with experimental studies have prevented a systematic quantification of this phenomenon. Using simulations of self-propelled swimmers, we investigate some of the mechanisms by which fish may exploit each others' wake to reduce energy expenditure. We quantify the relative importance of two mechanisms for increasing swimming efficiency: the decrease in relative velocity induced by proximity to wake vortices; and wall/"channelling" effects. Additionally, we conduct simulations of fish swimming in the Karman vortex street behind a static cylinder. This configuration helps us clarify the role of the bow pressure wave, entrainment, and "vortex-surfing" in enhancing propulsive efficiency of trout swimming near obstacles. -
Verzicco Roberto MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:30-14:45
MS Presentation
AFiD-GPU: A Versatile Navier-Stokes Solver for Turbulent Flows, Roberto Verzicco (PoF, University of Twente & Uniroma2, Netherlands, Italy)
Co-Authors: Richard J.A.M. Stevens (University of Twente, Netherlands); Everett Phillips (NVIDIA, United States of America); Vamsi Spandan (University of Twente, Netherlands); John Donners (SURFsara, Netherlands); Rodolfo Ostilla-Monico (Harvard University, United States of America); Massimiliano Fatica (NVIDIA, United States of America); Yantao Yang (University of Twente, Netherlands); Detlef Lohse (University of Twente, Netherlands); Roberto Verzicco (University of Twente & Università degli Studi di Roma "Tor Vergata", Netherlands, Italy)
The AFiD code, an open source solver for the Navier-Stokes equations (www.afid.eu), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU port has been carried out in CUDA Fortran with extensive use of kernel loop directives (CUF kernels) in order to keep the source code as close as possible to the original CPU version. Only a few routines have been written manually. On Piz Daint (CSCS), the current GPU version can solve a 2048x3072x3072 mesh on 640 K20x GPUs in 2.4s per time step, while with 2048 GPUs we measured 0.89s per time step. Thanks to the increased performance and efficient use of memory, this tool allows for simulations in a parameter range that is unprecedented in Rayleigh-Benard convection. -
Vessaz Christian MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:45-15:00
MS Presentation
GPU-Accelerated Hydrodynamic Simulation of Hydraulic Turbines Using the Finite Volume Particle Method, Christian Vessaz (EPFL-LMH, Switzerland)
Co-Authors: Ebrahim Jahanbakhsh (Università della Svizzera italiana, Switzerland); Audrey Maertens (EPFL / LMH, Switzerland); Christian Vessaz (EPFL / LMH, Switzerland); Sebastian Leguizamon (EPFL / LMH, Switzerland); François Avellan (EPFL / LMH, Switzerland)
Performance prediction based on numerical simulations can be very helpful in the design process of hydraulic turbines. The Finite Volume Particle Method (FVPM) is a consistent and conservative particle-based method which inherits interesting features of both Smoothed Particle Hydrodynamics and the grid-based Finite Volume Method. This method is particularly well-suited for such simulations thanks to its versatility. SPHEROS is a parallel FVPM solver which has been developed at the EPFL Laboratory for Hydraulic Machines for simulating Pelton turbines and silt erosion. In order to allow the simulation of industrial-size setups, a GPU version of SPHEROS (GPU-SPHEROS) is being developed in CUDA and uses the Thrust library to handle complicated structures such as octrees. Besides, some highly optimised kernels are also implemented for both compute-bound and memory-bound algorithms. Comparing the performance of different parts of GPU-SPHEROS and SPHEROS, we achieve a speed-up factor of at least eight.
Friday, June 10, 2016
Garden 1BC, 10:15-10:30
Contributed Talk
Thermomechanical Modeling of Impacting Particles on a Metallic Surface for the Erosion Prediction in Hydraulic Turbines, Christian Vessaz (EPFL-LMH, Switzerland)
Co-Authors: Ebrahim Jahanbakhsh (Università della Svizzera italiana, Switzerland); Audrey Maertens (EPFL, Switzerland); Christian Vessaz (EPFL, Switzerland); François Avellan (EPFL, Switzerland)
Erosion damage in hydraulic turbines is a common problem caused by the high-velocity impact of small particles entrained in the fluid. Numerical simulations can be useful to investigate the effect of each governing parameter in this complex phenomenon. The Finite Volume Particle Method is used to simulate the three-dimensional impact of dozens of rigid spherical particles on a metallic surface. The very fine discretization and the overall number of time steps needed to achieve the steady state erosion rate render the problem very expensive, implying the need for high performance computing. In this talk, a comparison of constitutive models is presented, with the aim of assessing the complexity of the thermomechanical modelling required to accurately simulate the impact and subsequent erosion of metals. The importance of strain rate, triaxiality, friction model and thermal effects is discussed. -
Villard Laurent Contributed Talk
Wednesday, June 8, 2016
Garden 3C, 14:15-14:30
Contributed Talk
A Portable Platform for Accelerated PIC Codes and its Application to Multi- and Many Integrated Core Architectures Using Hybrid MPI/OpenMP, Laurent Villard (Ecole Polytechnique Fédérale de Lausanne, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Noé Ohana (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Laurent Villard (EPFL, Switzerland)
With the aim of porting Particle-In-Cell (PIC) codes to modern parallel computers equipped with coprocessors, we have designed a testbed called PIC_ENGINE retaining the key elements of the PIC algorithm as applied to plasma physics simulations. A hybrid OpenMP/MPI implementation is used to explore the potential gain in performance on multi-core CPUs and Many Integrated Core (MIC) coprocessors. A bucket sort is added to increase data locality, and a vectorization algorithm is implemented, showing an improvement in the overall performance. With the PIC_ENGINE, we show that the hybrid OpenMP/MPI approach allows a performance gain of approximately 60% compared to pure MPI. Furthermore, the sorting and vectorization increase the performance of the most time-consuming methods by up to a factor of 3.2. Finally, using the same code, hybrid runs are performed on MIC and lead to similar conclusions. However, due to inefficient vectorization, the overall performance is poor compared to CPU runs.
Wednesday, June 8, 2016
Garden 3C, 14:30-14:45
Contributed Talk
Towards Optimization of a Gyrokinetic Particle-in-Cell (PIC) Code on Large Scale Hybrid Architectures, Laurent Villard (Ecole Polytechnique Fédérale de Lausanne, Switzerland)
Co-Authors: Andreas Jocksch (ETH Zurich / CSCS, Switzerland); Emmanuel Lanti (EPFL / Swiss Plasma Center, Switzerland); Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Stephan Brunner (EPFL, Switzerland); Claudio Gheller (ETH Zurich / CSCS, Switzerland); Farah Hariri (European Organization for Nuclear Research, Switzerland); Laurent Villard (EPFL, Switzerland)
Refactoring large legacy codes to exploit the power of new multithreaded devices is not an easy task. For this purpose, we designed a platform embedding simplified basic features of PIC codes. It solves the drift-kinetic equations (a first step towards gyrokinetics) in a sheared plasma slab using B-spline finite elements up to fourth order. Multiple levels of parallelism have been implemented using MPI+OpenMP and MPI+OpenACC. It has been shown that sorting particles can lead to performance improvement by increasing data locality and vectorizing the grid memory access. This paper focuses on the GPU implementation, which outperforms the CPU timing by up to a factor of 4 while not requiring a major code-rewriting effort. This gain increases with the spline order. Weak and strong scalability tests have been successfully run on the GPU-equipped Cray XC30 Piz Daint (CSCS) up to 4,096 nodes. This performance will enable advanced studies of turbulent transport in magnetic fusion devices.
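A hedged sketch of the particle-sorting step mentioned in the two abstracts above (not the PIC_ENGINE code itself): a counting/bucket sort by cell index makes particles belonging to the same cell contiguous in memory, which improves data locality and helps vectorise the grid accesses. The particle layout is an assumption for the example.

    #include <vector>

    struct Particle { double x, v; int cell; };

    // Counting/bucket sort of particles by cell index, stable and O(N + ncells).
    void bucket_sort(std::vector<Particle>& p, int ncells)
    {
        std::vector<int> count(ncells + 1, 0);
        for (const Particle& q : p) ++count[q.cell + 1];
        for (int c = 0; c < ncells; ++c) count[c + 1] += count[c];   // prefix sum -> offsets

        std::vector<Particle> sorted(p.size());
        std::vector<int> offset(count.begin(), count.end() - 1);     // start of each bucket
        for (const Particle& q : p) sorted[offset[q.cell]++] = q;
        p.swap(sorted);
    }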
W
-
Wall Wolfgang A. MS Presentation
Wednesday, June 8, 2016
Garden 3A, 15:30-16:00
MS Presentation
Hybridizable Discontinuous Galerkin Approximation of Cardiac Electrophysiology, Wolfgang A. Wall (Institute for Computational Mechanics, Technical University of Munich, Germany)
Co-Authors: Cristóbal Bertoglio (Center for Mathematical Modeling, University of Chile, Chile); Martin Kronbichler (Technical University of Munich, Germany); Wolfgang A. Wall (Technical University of Munich, Germany)
Cardiac electrophysiology simulations are numerically extremely challenging, due to the propagation of the very steep electrochemical wave front during depolarization. Hence, in classical continuous Galerkin (CG) approaches, very small temporal and spatial discretisations are necessary to obtain physiological propagation. Until now, spatial discretisations based on discontinuous methods have received little attention for cardiac electrophysiology simulations. In particular, local discontinuous Galerkin (LDG) and hybridizable discontinuous Galerkin (HDG) methods have not been explored yet. Application of such methods, when taking advantage of their parallelism, would allow a speed-up of the computations. In this work we provide a detailed comparison among CG, LDG and HDG methods for electrophysiology equations based on the mono-domain model. We also study the effect of the numerical integration of the non-linear ionic current term. Furthermore, we plan to show the difference between classic CG methods and HDG methods on large three-dimensional simulations with patient-specific cardiac geometries. -
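For reference, the mono-domain model mentioned above is commonly written as the reaction-diffusion system

    \[ \chi \left( C_m\,\frac{\partial V}{\partial t} + I_{\mathrm{ion}}(V,\mathbf{w}) \right) = \nabla \cdot \big( \boldsymbol{\sigma}\,\nabla V \big) + I_{\mathrm{stim}}, \qquad \frac{\partial \mathbf{w}}{\partial t} = \mathbf{g}(V,\mathbf{w}), \]

with transmembrane potential V, gating and ionic state variables w, conductivity tensor σ, membrane capacitance C_m and surface-to-volume ratio χ; the steep depolarization front arises from the nonlinear ionic current I_ion.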
Wayne Brett MS Presentation
Thursday, June 9, 2016
Garden 2BC, 14:00-14:30
MS Presentation
The Development of ParaDiS for HCP Crystals, Brett Wayne (Lawrence Livermore National Laboratory, United States of America)
Co-Authors: Sylvie Aubry (Lawrence Livermore National Laboratory, United States of America); Moono Rhee (Lawrence Livermore National Laboratory, United States of America); Brett Wayne (Lawrence Livermore National Laboratory, United States of America); Gregg Hommes (Lawrence Berkeley National Laboratory, United States of America)
The ParaDiS project at LLNL was created to build a scalable, massively parallel code for predicting the evolution of strength and strain hardening in crystalline materials under dynamic loading conditions by directly integrating the elements of dislocation physics. The code has been used by researchers at LLNL and around the world to simulate the behaviour of dislocation networks in a wide variety of applications, from high-temperature structural materials, to nuclear materials, to armor materials, to photovoltaic systems. ParaDiS has recently been extended to include a fast analytical algorithm for the computation of forces in anisotropic elastic media, and an augmented set of topological operations to treat the complex core physics of dislocations and other dislocations that routinely appear in HCP metals. The importance and implications of these developments for the engineering properties of HCP metals will be demonstrated in large-scale simulations of strain hardening. -
Weaver Anthony MS Presentation
Wednesday, June 8, 2016
Garden 3B, 16:30-17:00
MS Presentation
Scalability and Performance of the NEMOVAR Variational Ocean Data Assimilation Software, Anthony Weaver (CERFACS, France)
Co-Authors: Anthony Weaver (CERFACS, France); Magdalena Balmaseda (ECMWF, United Kingdom); Kristian Mogensen (ECMWF, United Kingdom)
Scalability and performance of the variational data assimilation software NEMOVAR for the NEMO ocean model is presented. NEMOVAR is a key component of the ECMWF operational Ocean analysis System 4 (Ocean S4) and the future System 5 (Ocean S5). It is designed as a four-dimensional variational assimilation (4D-Var) algorithm, which can also support three-dimensional (3D-Var) assimilation using the First-Guess at Appropriate Time (FGAT) approach. Central to the code's performance is the implementation of the correlation operator used for modelling the background-error covariance matrix. In NEMOVAR this is achieved using a diffusion operator. A new implicit formulation of the diffusion operator has been introduced recently, which solves the underlying linear system using the Chebyshev iteration. The technique is more flexible and better suited for massively parallel machines than the method currently used operationally at ECMWF, but further improvements will be necessary for future high-resolution applications. -
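Schematically, and only as a one-dimensional caricature rather than the NEMOVAR formulation itself, an implicit diffusion-based correlation operator with length scale L applies M implicit steps of the form

    \[ \left( 1 - \frac{L^2}{2M}\,\frac{\partial^2}{\partial x^2} \right) v^{(m)} = v^{(m-1)}, \qquad m = 1,\dots,M, \]

so each application amounts to solving a symmetric positive-definite linear system, which is where the Chebyshev iteration mentioned above enters.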
Weber Bruno MS Presentation
Wednesday, June 8, 2016
Garden 3A, 17:00-17:15
MS Presentation
An Overset Grid Method for Oxygen Transport from Red Blood Cells in Capillary Networks, Bruno Weber (University of Zurich, Switzerland)
Co-Authors: Bruno Weber (University of Zurich, Switzerland); Patrick Jenny (ETH Zurich, Switzerland)
Most oxygen in the blood circulation is carried bound to hemoglobin in red blood cells (RBCs). In capillaries, the oxygen partial pressure (PO2) is affected by the individual RBCs that flow in a single file. We have developed a novel overset grid method for oxygen transport from capillaries to tissue. This approach uses moving grids for RBCs and a fixed one for the blood vessels and the tissue. This combination enables accurate modelling of the intravascular PO2 field and the unloading of oxygen from RBCs. Additionally, our model can account for fluctuations in hematocrit and hemoglobin saturation. Its parallel implementation in OpenFOAM supports three-dimensional tortuous capillary networks. Simulations of oxygen transport in the rodent cerebral cortex have been performed and are used to study the cerebral energy metabolism. Other applications include the investigation of hemoglobin saturation heterogeneity in capillary networks. -
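A commonly used closure for the coupling between PO2 and hemoglobin saturation, given here only as an illustration (the saturation model in the presented solver may differ), is the Hill equation

    \[ S(P_{\mathrm{O}_2}) = \frac{P_{\mathrm{O}_2}^{\,n}}{P_{\mathrm{O}_2}^{\,n} + P_{50}^{\,n}}, \]

where P50 is the oxygen partial pressure at half saturation and n is the Hill exponent.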
Wedi Nils MS Presentation
Thursday, June 9, 2016
Garden 3B, 14:30-14:45
MS Presentation
Towards Exascale Computing with the ECMWF Model, Nils Wedi (ECMWF, United Kingdom)
Co-Authors: Nils Wedi (ECMWF, United Kingdom); George Mozdzynski (ECMWF, United Kingdom); Sami Saarinen (ECMWF, United Kingdom)
The European Centre for Medium-Range Weather Forecasts (ECMWF) is currently investing in a scalability programme that addresses computing and data handling challenges for realizing those scientific advances on future high-performance computing environments that will enhance predictive skill from medium to monthly time scales. A key component of this programme is the European Commission funded project Energy efficient SCalable Algorithms for weather Prediction at Exascale (ESCAPE) that develops numerical building blocks and compute intensive algorithms of the forecast model, applies compute/energy efficiency diagnostics, designs implementations on novel architectures, and performs testing in operational configurations. The talk will report on the progress of the scalability programme with a special focus on ESCAPE. -
Wellein Gerhard MS Presentation
Thursday, June 9, 2016
Auditorium C, 15:00-15:30
MS Presentation
GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems, Gerhard Wellein (Friedrich-Alexander University of Erlangen-Nuremberg, Germany)
Co-Authors: Georg Hager (University of Erlangen-Nuremberg, Germany); Gerhard Wellein (University of Erlangen-Nuremberg, Germany)
A significant amount of future exascale-class high performance computer systems are projected to be of heterogeneous nature, featuring "standard" as well as "accelerated" resources. A software infrastructure that claims applicability for such systems must be able to meet their inherent challenges: multiple levels of parallelism, complex topologies, asynchronicity, and abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is an open-source library of building blocks for sparse linear algebra algorithms on current and future large-scale systems. Being built on the "MPI+X" paradigm, it provides truly heterogeneous data parallelism and a light-weight and affinity-aware tasking mechanism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. Important design decisions are described with respect to the challenges posed by modern heterogeneous supercomputers and recent algorithmic developments. Implementation details which are indispensable for achieving high efficiency are pointed out and their necessity is justified by performance measurements or predictions based on performance models. -
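As a hedged illustration (not GHOST's implementation), the kind of data-parallel building block such a toolkit provides is the sparse matrix-vector product; below, a CSR SpMV parallelised with OpenMP stands in for the "X" in the MPI+X paradigm, with the distributed-memory row decomposition omitted.

    #include <omp.h>
    #include <vector>

    // y = A*x for a CSR matrix A given by (row_ptr, col_idx, val).
    void spmv_csr(const std::vector<int>& row_ptr,
                  const std::vector<int>& col_idx,
                  const std::vector<double>& val,
                  const std::vector<double>& x,
                  std::vector<double>& y)
    {
        const int nrows = static_cast<int>(row_ptr.size()) - 1;
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < nrows; ++i) {
            double sum = 0.0;
            for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
                sum += val[j] * x[col_idx[j]];
            y[i] = sum;
        }
    }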
Wermelinger Fabian Paper
Wednesday, June 8, 2016
Auditorium C, 17:00-17:30
Paper
An Efficient Compressible Multicomponent Flow Solver for Heterogeneous CPU/GPU Architectures, Fabian Wermelinger (ETH Zurich, Switzerland)
Co-Authors: Babak Hejazialhosseini (Cascade Technologies Inc., United States of America); Panagiotis Hadjidoukas (ETH Zurich, Switzerland); Diego Rossinelli (ETH Zurich, Switzerland); Petros Koumoutsakos (ETH Zurich, Switzerland)
We present a solver for three-dimensional compressible multicomponent flow based on the compressible Euler equations. The solver is based on a finite volume scheme for structured grids and advances the solution using an explicit Runge-Kutta time stepper. The numerical scheme requires the computation of the flux divergence based on an approximate Riemann problem. The computation of the divergence quantity is the most expensive task in the algorithm. Our implementation takes advantage of the compute capabilities of heterogeneous CPU/GPU architectures. The computational problem is organized in subdomains small enough to be placed into the GPU memory. The compute intensive stencil scheme is offloaded to the GPU accelerator while advancing the solution in time on the CPU. Our method to implement the stencil scheme on the GPU is not limited to applications in fluid dynamics. The performance of our solver was assessed on Piz Daint, a XC30 supercomputer at CSCS. The GPU code is memory-bound and achieves a per-node performance of 462 Gflop/s, outperforming by 3.2x the multicore-based Gordon Bell winning CUBISM-MPCF solver for the offloaded computation on the same platform. The focus of this work is on the per-node performance of the heterogeneous solver. In addition, we examine the performance of the solver across 4096 compute nodes. We present simulations for the shock-induced collapse of an aligned row of air bubbles submerged in water using 4 billion cells. Results show a final pressure amplification that is 100x stronger than the strength of the initial shock. -
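For reference, the single-fluid compressible Euler equations underlying the solver read

    \[ \frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\mathbf{u}) = 0, \qquad \frac{\partial (\rho\mathbf{u})}{\partial t} + \nabla\cdot\big(\rho\,\mathbf{u}\otimes\mathbf{u} + p\,\mathbf{I}\big) = 0, \qquad \frac{\partial E}{\partial t} + \nabla\cdot\big((E+p)\,\mathbf{u}\big) = 0, \]

closed by an equation of state; the multicomponent extension additionally advects quantities identifying the two fluids, and it is the flux divergence of this system that is offloaded to the GPU.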
Wernli Heini MS Presentation
Thursday, June 9, 2016
Garden 3B, 14:45-15:00
MS Presentation
A Decade-Long European-Scale Convection-Resolving Climate Simulation on GPUs: Computation, Validation and Analyses, Heini Wernli (ETH Zürich, Switzerland)
Co-Authors: Stefan Rüdisühli (ETH Zurich, Switzerland); Nikolina Ban (ETH Zurich, Switzerland); Oliver Fuhrer (MeteoSwiss, Switzerland); Daniel Lüthi (ETH Zurich, Switzerland); Michael Sprenger (ETH Zurich, Switzerland); Heini Wernli (ETH Zurich, Switzerland); Christoph Schär (ETH Zurich, Switzerland)
Climate simulations at horizontal resolutions of O(1 km) allow deep convection to be resolved explicitly. Precipitation processes are then represented much closer to first principles, allowing for an improved representation of the water cycle. Due to the large computational cost, climate simulations at such scales were restricted to rather small domains in the past. Here we present results from a decade-long convection-resolving climate simulation covering Europe using a computational mesh of 1,536x1,536x60 grid points. We use a COSMO-model prototype enabled for GPUs. The results illustrate how the approach allows the interactions between atmospheric circulations at scales ranging from 1,000 to 10 km to be represented. We discuss the performance of the convection-resolving climate modelling approach, focusing specifically on the improved representation of summer convection on the continental scale. Furthermore, we demonstrate the potential of online analyses of these simulations for assembling detailed climatologies of extratropical cyclones, fronts and propagating convective systems. -
Wersal Christoph Poster
Poster
PHY-03 Parallelization on a Hybrid Architecture of GBS, a Simulation Code for Plasma Turbulence at the Edge of Fusion Devices, Christoph Wersal (École polytechnique fédérale de Lausanne, Switzerland)
Co-Authors: Trach-Minh Tran (EPFL / Swiss Plasma Center, Switzerland); Patrick Emonts (EPFL / Swiss Plasma Center, Switzerland); Federico David Halpern (EPFL / Swiss Plasma Center, Switzerland); Rogério Jorge (EPFL / Swiss Plasma Center, Switzerland); Jorge Morales (EPFL / Swiss Plasma Center, Switzerland); Paola Paruta (EPFL / Swiss Plasma Center, Switzerland); Paolo Ricci (EPFL / Swiss Plasma Center, Switzerland); Fabio Riva (EPFL / Swiss Plasma Center, Switzerland)
We present recent developments of GBS, a simulation code used to evolve plasma turbulence in the edge of fusion devices. GBS solves a set of 3D fluid equations, the Poisson and the Ampere equation, and a kinetic equation for the neutral atoms. Investigations carried out with GBS have significantly advanced our understanding of the plasma dynamics at the edge of fusion devices. For example, GBS simulations allowed the identification of the turbulent regimes and the saturation mechanisms of the linearly unstable modes. In GBS, a 3D Cartesian MPI communicator is employed, leading to excellent parallel scalability up to 8192 cores. To efficiently exploit many-core and hybrid architectures, new schemes using MPI+OpenMP and MPI+OpenACC have been recently implemented. We show the implementation of the new parallelization schemes, their scalability, and their efficiency. The new parallelization allows the efficient use of advanced hybrid supercomputers, such as Piz Daint at CSCS. -
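A hedged sketch of the 3-D Cartesian domain decomposition pattern mentioned above (not the GBS source): MPI_Cart_create builds the communicator and MPI_Cart_shift yields the neighbours used for halo exchange in each direction; the periodicity choice below is an assumption for the example.

    #include <mpi.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int dims[3] = {0, 0, 0};
        MPI_Dims_create(size, 3, dims);          // factor the ranks over three directions
        int periods[3] = {0, 0, 1};              // e.g. periodic in the third direction only
        MPI_Comm cart;
        MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, /*reorder=*/1, &cart);

        int left, right;
        MPI_Cart_shift(cart, /*direction=*/0, /*disp=*/1, &left, &right);
        (void)left; (void)right;                 // neighbours would be used for halo exchange

        MPI_Comm_free(&cart);
        MPI_Finalize();
        return 0;
    }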
Widera Rene MS Presentation
Wednesday, June 8, 2016
Garden 3C, 14:00-14:15
MS Presentation
Interactive Plasma Simulations on Next Generation Supercomputers for Everybody, Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
Co-Authors: Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Benjamin Worpitz (Citrix Systems GmbH, Germany); Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
We present the open-source ecosystem around the reportedly fastest particle-in-cell code in the world (sustained Flop/s), PIConGPU. PIConGPU is designed for modern clusters powered by manycore hardware, and we argue that HPC plasma simulations should be able to estimate their systematic and random errors (e.g., by varying solvers and initial conditions). Our approach starts with an open-source, anytime fork-able development cycle as the basis for scrutable and reproducible simulations. To promote interoperability, we develop and propagate an open, self-describing, file-format agnostic data-markup (openPMD) that is suitable for extreme I/O load and in-situ processing, demonstrated in a live simulation. PIConGPU is built on top of C++ meta-programming libraries, providing single-source kernel acceleration (alpaka) to work asynchronously on distributed data containers (PMacc). Using compile-time optimisation techniques, we show that particle-mesh methods can be implemented for arbitrary high-performance hardware (GPGPUs, CPUs, OpenPOWER, Xeon Phi) featuring solver agility without negative implications on maintenance (rewrite) or runtime performance. -
Wild Martin Poster
Poster
CLI-03 From Code to Climate: Adjusting Free Parameters in a Global Climate Model, Martin Wild (ETH Zurich, Institute for Atmospheric and Climate Science, Switzerland)
Co-Authors: Martin Wild (ETH Zurich, Switzerland)
The discretization of global climate models (GCMs) is too coarse to resolve a number of climate relevant processes. For example, the deep convection associated with tropical thunderstorms is of key relevance for the global atmospheric circulation, yet it enters GCMs only via sub-grid-scale parameterization of thunderstorms. Typically, such parameterizations come with some free parameters that need adjusting in order to obtain a 'physically meaningful climate', a process referred to as 'model tuning'. We illustrate this process at the example of MPI-ESM-HAM, the Max Planck Earth System Model (MPI-ESM) coupled to the Hamburg Aerosol Module (HAM) and discuss how we cope with three associated computational challenges: the high dimensionality of the parameter space, the substantial year-to-year variability of the model climate as compared to the long term mean climate, and response time scales to changes in tuning parameters that range from under a year to several centuries or longer. -
Winschel Viktor MS Presentation
Thursday, June 9, 2016
Garden 2A, 15:30-15:45
MS Presentation
Uncertainty Quantification and Global Sensitivity Analysis for Economic Models, Viktor Winschel (ETH Zurich, Switzerland)
Co-Authors: Viktor Winschel (ETH Zurich, Switzerland); Stefano Marelli (ETH Zurich, Switzerland); Bruno Sudret (ETH Zurich, Switzerland)
We present a method for global sensitivity analysis of the outcomes of an economic model with respect to their parameters. Traditional sensitivity analyses, like comparative statics, scenario and robustness analysis are local and depend on the chosen combination of parameter values. Our global approach specifies a distribution for each parameter and approximates the outcomes as a polynomial of parameters. In contrast to local analyses, the global sensitivity analysis takes into account non-linearities and interactions. Using the polynomial, we compute the distribution of outcomes and a variance decomposition called Sobol' indices. We obtain an importance ranking of the parameters and their interactions, which can guide calibration exercises and model development. We compare the local to the global approach for the mean and variance of production in a canonical real business cycle model. We find an interesting separation result: for mean production, only capital share, leisure substitution rate, and depreciation rate matter. -
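For reference, the variance decomposition behind the Sobol' indices mentioned above reads

    \[ \mathrm{Var}(Y) = \sum_i V_i + \sum_{i<j} V_{ij} + \cdots, \qquad S_i = \frac{\mathrm{Var}\big(\mathbb{E}[\,Y \mid X_i\,]\big)}{\mathrm{Var}(Y)}, \]

so the first-order index S_i measures the share of output variance explained by parameter X_i alone, while the higher-order terms capture interactions between parameters.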
Witek Jagna Poster
Poster
LS-06 Structural and Dynamic Properties of Cyclosporin A: Molecular Dynamics and Markov State Modelling, Jagna Witek (Laboratory of Physical Chemistry, ETH Zürich, Switzerland)
Co-Authors: Bettina Keller (Free University of Berlin, Germany); Sereina Z. Riniker (ETH Zurich, Switzerland)
The membrane permeability of cyclic peptides is likely influenced by the conformational behavior of these compounds in polar and apolar environments. The size and complexity of peptides often limit their bioavailability, but there are known examples of peptide natural products, such as cyclosporin A (CsA), that can cross cell membranes by passive diffusion. The crystal structure of CsA shows a "closed" conformation with four intramolecular hydrogen bonds. When binding to its target cyclophilin, CsA adopts an "open" conformation without intramolecular hydrogen bonds. In this study, we attempted to sample exhaustively the conformational space of CsA in chloroform and in water by molecular dynamics simulations in order to rationalize the good membrane permeability of CsA observed experimentally. From 10 μs molecular dynamics simulations in each solvent, Markov state models were constructed to characterize the metastable conformational states. The conformational landscapes in both solvents show significant overlap, but also clearly distinct features. -
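Schematically, a Markov state model of the kind used above is estimated from transition counts between the discretized conformational states at a lag time τ and then propagated as

    \[ \hat{T}_{ij}(\tau) = \frac{c_{ij}(\tau)}{\sum_k c_{ik}(\tau)}, \qquad \mathbf{p}^{\top}(t+\tau) = \mathbf{p}^{\top}(t)\,\hat{T}(\tau), \]

where c_ij counts observed transitions from state i to state j; practical estimators usually enforce detailed balance in addition to this simple row normalization.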
Wittmann Markus Poster
Poster
CSM-02 A Novel Approach for Efficient Stencil Assembly in Curved Geometries, Markus Wittmann (Erlangen Regional Computing Center (RRZE), FAU Erlangen-Nürnberg, Germany)
Co-Authors: Marcus Mohr (Ludwig Maximilian University of Munich, Germany); Ulrich Rüde (University of Erlangen-Nuremberg, Germany); Markus Wittmann (FAU Erlangen-Nürnberg / Erlangen Regional Computing Center (RRZE), Germany); Barbara Wohlmuth (Technical University of Munich, Germany)
In many scientific and engineering applications one has to deal with curved geometries. Such domains can accurately be approximated e.g., by unstructured grids and iso-parametric finite elements. We present a novel approach here that is well-suited to our concept of hierarchical hybrid grids (HHG). The latter was shown to achieve excellent performance and scalability even for extreme numbers of DOFs by a matrix-free implementation and exploiting regularity of access patterns. In our approach FE stencils are not assembled exactly, but approximated by low order polynomials and evaluated with an efficient incremental algorithm. We demonstrate the accuracy achieved as well as the computational efficiency using our prototypical HHG-based mantle convection solver which operates on non-nested triangulations of a thick spherical shell. The implementation of our scheme is based on a systematic node-level performance analysis and maintains the high efficiency of the original HHG. -
Wohlmuth Barbara Poster
Poster
CSM-02 A Novel Approach for Efficient Stencil Assembly in Curved Geometries, Barbara Wohlmuth (Institute for Numerical Mathematics (M2), Technische Universität München, Germany)
Co-Authors: Marcus Mohr (Ludwig Maximilian University of Munich, Germany); Ulrich Rüde (University of Erlangen-Nuremberg, Germany); Markus Wittmann (FAU Erlangen-Nürnberg / Erlangen Regional Computing Center (RRZE), Germany); Barbara Wohlmuth (Technical University of Munich, Germany)
In many scientific and engineering applications one has to deal with curved geometries. Such domains can accurately be approximated e.g., by unstructured grids and iso-parametric finite elements. We present a novel approach here that is well-suited to our concept of hierarchical hybrid grids (HHG). The latter was shown to achieve excellent performance and scalability even for extreme numbers of DOFs by a matrix-free implementation and exploiting regularity of access patterns. In our approach FE stencils are not assembled exactly, but approximated by low order polynomials and evaluated with an efficient incremental algorithm. We demonstrate the accuracy achieved as well as the computational efficiency using our prototypical HHG-based mantle convection solver which operates on non-nested triangulations of a thick spherical shell. The implementation of our scheme is based on a systematic node-level performance analysis and maintains the high efficiency of the original HHG. -
Worpitz Benjamin MS Presentation
Wednesday, June 8, 2016
Garden 3C, 14:00-14:15
MS Presentation
Interactive Plasma Simulations on Next Generation Supercomputers for Everybody, Benjamin Worpitz (Citrix Systems GmbH, Germany)
Co-Authors: Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Benjamin Worpitz (Citrix Systems GmbH, Germany); Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
We present the open-source ecosystem around the reportedly fastest particle-in-cell code in the world (sustained Flop/s), PIConGPU. PIConGPU is designed for modern clusters powered by manycore hardware, and we argue that HPC plasma simulations should be able to estimate their systematic and random errors (e.g., by varying solvers and initial conditions). Our approach starts with an open-source, anytime fork-able development cycle as the basis for scrutable and reproducible simulations. To promote interoperability, we develop and propagate an open, self-describing, file-format agnostic data-markup (openPMD) that is suitable for extreme I/O load and in-situ processing, demonstrated in a live simulation. PIConGPU is built on top of C++ meta-programming libraries, providing single-source kernel acceleration (alpaka) to work asynchronously on distributed data containers (PMacc). Using compile-time optimisation techniques, we show that particle-mesh methods can be implemented for arbitrary high-performance hardware (GPGPUs, CPUs, OpenPOWER, Xeon Phi) featuring solver agility without negative implications on maintenance (rewrite) or runtime performance. -
Wu Zhaoxuan MS Presentation
Thursday, June 9, 2016
Garden 2BC, 15:40-16:00
MS Presentation
Atomistic Modelings of Dislocation Cross-Slips in HCP Metals, Zhaoxuan Wu (Institute of Mechanical Engineering, Ecole Polytechnique Federale de Lausanne, Switzerland)
Co-Authors: William Curtin (EPFL, Switzerland)
HCP metals such as Mg, Ti and Zr are a class of lightweight and/or highly durable metals with critical structural applications in the automotive, aerospace and nuclear industries. However, the fundamental mechanisms of deformation, strengthening and ductility in these metals are not well understood, resulting in significant challenges for their plasticity models at all scales. We present dislocation cross-slip in Mg using a DFT-validated interatomic potential and very large scale NEB calculations on HPC systems. We reveal a unique dislocation cross-slip mechanism and quantify the cross-slip energy barrier and its stress dependence, which leads to tension-compression asymmetry and a switch in the absolute stability of slip planes. All of these features are generic to HCP metals but very different from those well established for cubic metals. Our results provide mechanistic insights into the cross-slip behaviour, rationalize the pyramidal I/II slip stability and enable the prediction of slip trends across the family of HCP metals.
Thursday, June 9, 2016
Garden 2BC, 14:30-15:00
MS Presentation
Deformation and Failure Behavior of Metallic Nanostructures, Zhaoxuan Wu (Institute of Mechanical Engineering, Ecole Polytechnique Federale de Lausanne, Switzerland)
Co-Authors: Zhaoxuan Wu (Institute of High Performance Computing, A*STAR, Singapore); Mehdi Jafary-Zadeh (Institute of High Performance Computing, A*STAR, Singapore); Mark Jhon (Institute of High Performance Computing, A*STAR, Singapore); Wendy Gu (California Institute of Technology, United States of America); Julia Greer (California Institute of Technology, United States of America); David Srolovitz (University of Pennsylvania, United States of America)
The reliability of metallic nanostructures, such as nanowires and thin films, is often dictated by their mechanical failure. An in-depth understanding of the effects of intrinsic factors, such as grain boundaries and surface roughness, and extrinsic factors, such as size, shape and man-made notches, on their plastic deformation mechanisms and failure patterns is of great importance for fabricating these nanostructures with high reliability. In this talk, we will first briefly review the deformation mechanisms and failure patterns reported in the literature and highlight some of the critical issues currently under active research. We will then report our recent progress on the effects of intrinsic factors, such as grain boundaries and dislocations, and extrinsic factors, such as size, shape and man-made notches, on the plasticity and failure of metallic nanostructures, using both mechanical testing and large-scale molecular dynamics simulations. -
Wüst Thomas MS Presentation
Thursday, June 9, 2016
Garden 3A, 15:00-15:15
MS Presentation
Enhancing the Computational Capabilities for Biologists: Genomic Data Analysis Services at ETH Zurich, Thomas Wüst (ID Scientific IT Services, ETH Zurich, Switzerland)
Co-Authors: Thomas Wüst (ETH Zurich, Switzerland); Bernd Rinn (ETH Zurich, Switzerland)
Genomics-based biological research (e.g., next-generation sequencing) generates increasing amounts of data, which need dedicated high-performance computing (HPC) resources to be analysed efficiently. However, the growing specialization in both areas (genomics and HPC) makes it increasingly challenging to bring the two fields together and to enable biologists to make effective use of the available computational resources. The mission of the Scientific IT Services (SIS) of ETH Zurich is to bridge this gap and to provide client-tailored solutions for big-data genomics. In this presentation, we illustrate this need and our approach with selected examples, ranging from the design of automated, high-throughput NGS analysis workflows, through addressing the biology "software stack jungle", to scientific IT education for biologists. Throughout the talk, we emphasize the importance of scientific data management, consulting and community building for using HPC in biological research.
X
-
Xenarios Ioannis Poster
Poster
LS-08 The UniProt SPARQL Endpoint: 21 Billion Triples in Production, Ioannis Xenarios (SIB Swiss Institute of Bioinformatics, Switzerland)
Co-Authors: Sebastien Gehant (Swiss Institute of Bioinformatics, Switzerland); Thierry Lombardot (Swiss Institute of Bioinformatics, Switzerland); Lydie Bougueleret (Swiss Institute of Bioinformatics, Switzerland); Ioannis Xenarios (Swiss Institute of Bioinformatics, Switzerland); Nicole Redaschi (Swiss Institute of Bioinformatics, Switzerland)
The UniProt knowledgebase is a leading resource of protein sequences and functional information whose centerpiece is the expert-curated Swiss-Prot section. UniProt data is accessible at www.uniprot.org (via a user-friendly interface and a REST API) and at sparql.uniprot.org, a public SPARQL endpoint hosted and maintained by the Vital-IT and Swiss-Prot groups of SIB. With 21 billion RDF triples, it is the largest free-to-use graph database in the sciences. SPARQL allows scientists to perform complex queries within UniProt and across datasets located on remote SPARQL endpoints. This provides a free data-integration solution for users who cannot afford to build custom data warehouses, while the cost of operating the service falls on the providers. Here we discuss the challenges in maintaining the UniProt SPARQL endpoint, which is updated monthly in sync with the UniProt data releases.
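A minimal Python sketch of querying the public endpoint with SPARQLWrapper, illustrating the kind of query described above; the predicates (up:mnemonic, up:reviewed) follow the UniProt core ontology, and the query is an illustrative example rather than part of the presentation:

```python
# Minimal sketch: fetch a few reviewed (Swiss-Prot) entries from the public
# UniProt SPARQL endpoint and print their URIs and mnemonics.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://sparql.uniprot.org/sparql")
sparql.setQuery("""
PREFIX up: <http://purl.uniprot.org/core/>
SELECT ?protein ?mnemonic
WHERE {
  ?protein a up:Protein ;
           up:mnemonic ?mnemonic ;
           up:reviewed true .
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["protein"]["value"], row["mnemonic"]["value"])
```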
Y
-
Yang Yantao MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:30-14:45
MS Presentation
AFiD-GPU: A Versatile Navier-Stokes Solver for Turbulent Flows, Yantao Yang (PoF, University of Twente, Netherlands)
Co-Authors: Richard J.A.M. Stevens (University of Twente, Netherlands); Everett Phillips (NVIDIA, United States of America); Vamsi Spandan (University of Twente, Netherlands); John Donners (SURFsara, Netherlands); Rodolfo Ostilla-Monico (Harvard University, United States of America); Massimiliano Fatica (NVIDIA, United States of America); Yantao Yang (University of Twente, Netherlands); Detlef Lohse (University of Twente, Netherlands); Roberto Verzicco (University of Twente, Netherlands & Università degli Studi di Roma "Tor Vergata", Italy)
The AFiD code, an open-source solver for the Navier-Stokes equations (www.afid.eu), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU port has been carried out in CUDA Fortran with extensive use of kernel loop directives (CUF kernels) in order to keep the source code as close as possible to the original CPU version; only a few routines have been rewritten by hand. On Piz Daint (CSCS), the current GPU version can solve a 2048x3072x3072 mesh on 640 K20x GPUs in 2.4 s per time step, while with 2048 GPUs we measured 0.89 s per time step. Thanks to the increased performance and efficient use of memory, this tool allows for simulations in a parameter range that is unprecedented in Rayleigh-Bénard convection. -
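A quick back-of-the-envelope strong-scaling check based on the timings quoted above (a reader's arithmetic, not a result reported by the authors):

```python
# Strong-scaling check using the timings quoted above: 2.4 s/step on 640 GPUs
# versus 0.89 s/step on 2048 GPUs for the same 2048x3072x3072 mesh.
gpus_small, t_small = 640, 2.4     # seconds per time step
gpus_large, t_large = 2048, 0.89   # seconds per time step

speedup = t_small / t_large             # ~2.7x faster
gpu_ratio = gpus_large / gpus_small     # 3.2x more GPUs
efficiency = speedup / gpu_ratio        # ~84% strong-scaling efficiency
print(f"{speedup:.2f}x speedup on {gpu_ratio:.1f}x GPUs -> {efficiency:.0%} efficiency")
```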
Yashiro Hisashi Paper
Wednesday, June 8, 2016
Auditorium C, 14:00-14:30
Paper
Performance Analysis and Optimization of Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on the K Computer and TSUBAME2.5, Hisashi Yashiro (RIKEN AICS, Japan)
Co-Authors: Masaaki Terai (RIKEN / Advanced Institute for Computational Science, Japan); Ryuji Yoshida (RIKEN / Advanced Institute for Computational Science, Japan); Shin-ichi Iga (RIKEN / Advanced Institute for Computational Science, Japan); Kazuo Minami (RIKEN / Advanced Institute for Computational Science, Japan); Hirofumi Tomita (RIKEN / Advanced Institute for Computational Science, Japan)
We summarize the optimization and performance evaluation of the Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on two different types of supercomputers: the K computer and TSUBAME2.5. First, we evaluated and improved several kernels extracted from the model code on the K computer. We did not significantly change the loop and data ordering, relying instead on features of the K computer such as the hardware-aided thread barrier mechanism and the relatively high memory bandwidth (a 0.5 Byte/FLOP ratio). Loop optimizations and code cleaning that reduced memory transfers contributed to a speed-up of the model execution time. The sustained performance of the main loop of NICAM reached 0.87 PFLOPS with 81,920 nodes on the K computer. For GPU-based calculations, we applied OpenACC to the dynamical core of NICAM and evaluated performance and scalability on the TSUBAME2.5 supercomputer. We achieved good results, showing efficient use of the GPU memory bandwidth as well as good weak scalability. A dry dynamical-core experiment carried out on 2560 GPUs achieved 60 TFLOPS of sustained performance. -
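The sustained per-device rates implied by the figures quoted above can be worked out directly (a reader's arithmetic, not a result reported in the paper):

```python
# Sustained per-device rates implied by the numbers quoted above.
k_total_flops, k_nodes = 0.87e15, 81_920      # K computer, main loop
gpu_total_flops, gpus = 60e12, 2_560          # TSUBAME2.5, dry dynamical core

print(f"K computer: {k_total_flops / k_nodes / 1e9:.1f} GFLOPS per node")   # ~10.6
print(f"TSUBAME2.5: {gpu_total_flops / gpus / 1e9:.1f} GFLOPS per GPU")     # ~23.4
```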
Yoshida Ryuji Paper
Wednesday, June 8, 2016
Auditorium C, 14:00-14:30
Paper
Performance Analysis and Optimization of Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on the K Computer and TSUBAME2.5, Ryuji Yoshida (RIKEN AICS, Japan)
Co-Authors: Masaaki Terai (RIKEN / Advanced Institute for Computational Science, Japan); Ryuji Yoshida (RIKEN / Advanced Institute for Computational Science, Japan); Shin-ichi Iga (RIKEN / Advanced Institute for Computational Science, Japan); Kazuo Minami (RIKEN / Advanced Institute for Computational Science, Japan); Hirofumi Tomita (RIKEN / Advanced Institute for Computational Science, Japan)
We summarize the optimization and performance evaluation of the Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on two different types of supercomputers: the K computer and TSUBAME2.5. First, we evaluated and improved several kernels extracted from the model code on the K computer. We did not significantly change the loop and data ordering, relying instead on features of the K computer such as the hardware-aided thread barrier mechanism and the relatively high memory bandwidth (a 0.5 Byte/FLOP ratio). Loop optimizations and code cleaning that reduced memory transfers contributed to a speed-up of the model execution time. The sustained performance of the main loop of NICAM reached 0.87 PFLOPS with 81,920 nodes on the K computer. For GPU-based calculations, we applied OpenACC to the dynamical core of NICAM and evaluated performance and scalability on the TSUBAME2.5 supercomputer. We achieved good results, showing efficient use of the GPU memory bandwidth as well as good weak scalability. A dry dynamical-core experiment carried out on 2560 GPUs achieved 60 TFLOPS of sustained performance.
Z
-
Zaleski Stephane MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:00-14:30
MS Presentation
Numerical Simulation of Flows with Sharp Interfaces by the Volume-Of-Fluid Method, Stephane Zaleski (UPMC - d'Alembert, France)
Co-Authors:
We discuss recent developments in Volume-Of-Fluid (VOF) methods, such as the height-function method for approximating the geometry of the interface, the balanced-force surface tension method, and methods that conserve mass and momentum to machine accuracy. Applications at high Reynolds numbers, such as high-speed liquid-gas flows, and at low Reynolds and capillary numbers are discussed. Problems of engineering and physical interest, such as jet atomisation and flow in porous media, are investigated with these methods, as will be shown. -
Zalewski Marcin Paper
Thursday, June 9, 2016
Auditorium C, 12:00-12:30
Paper
Context Matters: Distributed Graph Algorithms and Runtime Systems, Marcin Zalewski (Indiana University, United States of America)
Co-Authors: Jesun Sahariar Firoz (Indiana University, United States of America); Thejaka Amila Kanewala (Indiana University, United States of America); Marcin Zalewski (Indiana University, United States of America); Martina Barnas (Indiana University, United States of America)
The increasing complexity of the software/hardware stack of modern supercomputers makes understanding the performance of modern massive-scale codes difficult. Distributed graph algorithms (DGAs) are at the forefront of that complexity, pushing the envelope with their massive irregularity and data dependency. We analyse the existing body of research on DGAs to assess how technical contributions are linked to experimental performance results in the field. We distinguish algorithm-level contributions related to graph problems from "runtime-level" concerns related to communication, scheduling, and other low-level features necessary to make distributed algorithms work. We show that the runtime is an integral part of DGAs' experimental results, but that it is often ignored by authors in favor of algorithm-level contributions. We argue that a DGA can only be fully understood as a combination of these two aspects and that detailed reporting of runtime details must become part of the scientific standard in the field if results are to be truly understandable and interpretable. Based on our analysis of the field, we provide a template for reporting the runtime details of DGA results, and we further motivate the importance of these details by showing how seemingly minor runtime changes can make or break a DGA. -
Zampini Stefano Paper
Wednesday, June 8, 2016
Auditorium C, 16:00-16:30
Paper
On the Robustness and Prospects of Adaptive BDDC Methods for Finite Element Discretizations of Elliptic PDEs with High-Contrast Coefficients, Stefano Zampini (KAUST, Saudi Arabia)
Co-Authors: David E. Keyes (King Abdullah University of Science and Technology, Saudi Arabia)
Balancing Domain Decomposition by Constraints (BDDC) methods have proven to be powerful preconditioners for large and sparse linear systems arising from the finite element discretization of elliptic PDEs. Condition number bounds can be theoretically established that are independent of the number of subdomains of the decomposition.
The core of the methods resides in the design of a larger and partially discontinuous finite element space that allows for fast application of the preconditioner, where Cholesky factorizations of the subdomain finite element problems are additively combined with a coarse, global solver. Multilevel and highly-scalable algorithms can be obtained by replacing the coarse Cholesky solver with a coarse BDDC preconditioner.
BDDC methods have the remarkable ability to control the condition number, since the coarse space of the preconditioner can be adaptively enriched at the cost of solving local eigenproblems. The proper identification of these eigenproblems extends the robustness of the methods to arbitrary heterogeneity in the distribution of the PDE coefficients, not only to cases where the coefficient jumps align with the subdomain boundaries or where the high-contrast regions are confined to the interior of the subdomains. The specific adaptive technique considered in this paper does not depend on any interaction between discretization and partition; it relies purely on algebraic operations.
Coarse-space adaptation in BDDC methods has attractive algorithmic properties: the technique enhances the concurrency and the arithmetic intensity of the preconditioning step of the sparse implicit solver while controlling the number of iterations of the Krylov method in a black-box fashion, thus reducing the number of global synchronization steps needed by the iterative solver. Data movement and memory-bound kernels in the solve phase can thus be limited at the expense of extra local flops during the setup of the preconditioner.
This paper presents an exposition of the BDDC algorithm that identifies the current computational bottlenecks that could prevent it from being competitive with other solvers, and proposes solutions in anticipation of exascale architectures. Furthermore, the discussion aims to give interested practitioners sufficient insights to decide whether or not to pursue BDDC in their applications.
In addition, the work presents novel numerical results using the distributed-memory implementation of BDDC in the PETSc library for vector-field problems arising in the context of porous media flows and electromagnetic modeling; the results provide evidence of the robustness of these methods for highly heterogeneous problems and non-conforming discretizations. -
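A minimal petsc4py sketch of selecting the BDDC preconditioner discussed above, assuming the operator has already been assembled in PETSc's unassembled MATIS format (one local matrix per subdomain); this illustrates typical usage rather than the authors' code:

```python
# Minimal sketch: configure a Krylov solver with the BDDC preconditioner in petsc4py.
# Assumes A is a PETSc Mat of type MATIS and b a compatible Vec; building such a
# matrix from subdomain contributions is not shown here.
from petsc4py import PETSc

def solve_with_bddc(A, b):
    ksp = PETSc.KSP().create()
    ksp.setOperators(A)
    ksp.setType(PETSc.KSP.Type.CG)

    pc = ksp.getPC()
    pc.setType("bddc")        # Balancing Domain Decomposition by Constraints
    ksp.setFromOptions()      # allow -pc_bddc_* options to tune the adaptive coarse space

    x = A.createVecRight()
    ksp.solve(b, x)
    return x
```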
Zenker Erik MS Presentation
Wednesday, June 8, 2016
Garden 3C, 14:00-14:15
MS Presentation
Interactive Plasma Simulations on Next Generation Supercomputers for Everybody, Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, TU Dresden, Germany)
Co-Authors: Rene Widera (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Erik Zenker (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Benjamin Worpitz (Citrix Systems GmbH, Germany); Heiko Burau (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Grund (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Marco Garten (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Carlchristian Eckert (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Alexander Debus (Helmholtz-Zentrum Dresden-Rossendorf, Germany); Michael Bussmann (Helmholtz-Zentrum Dresden-Rossendorf, Germany)
We present the open-source ecosystem around PIConGPU, reportedly the fastest particle-in-cell code in the world in terms of sustained Flop/s. PIConGPU is designed for modern clusters powered by manycore hardware, and we argue that HPC plasma simulations should be able to estimate their systematic and random errors (e.g., by varying solvers and initial conditions). Our approach starts with an open-source, anytime fork-able development cycle as the basis for scrutable and reproducible simulations. To promote interoperability, we develop and propagate an open, self-describing, file-format-agnostic data markup (openPMD) that is suitable for extreme I/O loads and in-situ processing, demonstrated in a live simulation. PIConGPU is built on top of C++ meta-programming libraries that provide single-source kernel acceleration (alpaka) and asynchronous work on distributed data containers (PMacc). Using compile-time optimisation techniques, we show that particle-mesh methods can be implemented for a wide range of high-performance hardware (GPGPUs, CPUs, OpenPOWER, Xeon Phi), offering solver agility without penalties in maintenance (rewrites) or runtime performance. -
Zhang Yong-Wei MS Presentation
Thursday, June 9, 2016
Garden 2BC, 14:30-15:00
MS Presentation
Deformation and Failure Behavior of Metallic Nanostructures, Yong-Wei Zhang (Institute of High Performance Computing, A*STAR, Singapore, Singapore)
Co-Authors: Zhaoxuan Wu (Institute of High Performance Computing, A*STAR, Singapore); Mehdi Jafary-Zadeh (Institute of High Performance Computing, A*STAR, Singapore); Mark Jhon (Institute of High Performance Computing, A*STAR, Singapore); Wendy Gu (California Institute of Technology, United States of America); Julia Greer (California Institute of Technology, United States of America); David Srolovitz (University of Pennsylvania, United States of America)
The reliability of metallic nanostructures, such as nanowires and thin films, is often dictated by their mechanical failure. An in-depth understanding of the effects of intrinsic factors, such as grain boundaries and surface roughness, and extrinsic factors, such as size, shape and man-made notches, on their plastic deformation mechanisms and failure patterns is of great importance for fabricating these nanostructures with high reliability. In this talk, we will first briefly review the deformation mechanisms and failure patterns reported in the literature and highlight some of the critical issues currently under active research. We will then report our recent progress on the effects of intrinsic factors, such as grain boundaries and dislocations, and extrinsic factors, such as size, shape and man-made notches, on the plasticity and failure of metallic nanostructures, using both mechanical testing and large-scale molecular dynamics simulations. -
Zheng Zebang MS Presentation
Thursday, June 9, 2016
Garden 2BC, 15:20-15:40
MS Presentation
Multiscale Modelling of Dwell Fatigue in Polycrystalline Titanium Alloys, Zebang Zheng (Imperial College London, United Kingdom)
Co-Authors: Daniel Balint (Imperial College London, United Kingdom); Fionn Dunne (Imperial College London, United Kingdom)
Titanium alloys are used to manufacture highly stressed components of gas turbine engines, such as discs and blades, due to their low density, excellent corrosion resistance and high fatigue strength. However, it has been reported that these alloys exhibit a significant fatigue-life reduction, called dwell debit, under cyclic loading that includes a hold at the peak stress. In this study, a rate-dependent crystal plasticity framework was first used to reproduce the experimentally observed macroscopic response of Ti624x (x = 2 and 6) alloys under low-cycle fatigue and low-cycle dwell fatigue loading, which enabled the relevant material constants for the two alloys to be determined. These were then used in a discrete dislocation plasticity model with the same thermally activated rate-controlling mechanism to examine the dwell behaviour of the alloys. -
Zhu Xiaojue MS Presentation
Wednesday, June 8, 2016
Garden 2A, 14:30-14:45
MS Presentation
AFiD-GPU: A Versatile Navier-Stokes Solver for Turbulent Flows, Xiaojue Zhu (University of Twente, Netherlands)
Co-Authors: Richard J.A.M. Stevens (University of Twente, Netherlands); Everett Phillips (NVIDIA, United States of America); Vamsi Spandan (University of Twente, Netherlands); John Donners (SURFsara, Netherlands); Rodolfo Ostilla-Monico (Harvard University, United States of America); Massimiliano Fatica (NVIDIA, United States of America); Yantao Yang (University of Twente, Netherlands); Detlef Lohse (University of Twente, Netherlands); Roberto Verzicco (University of Twente, Netherlands & Università degli Studi di Roma "Tor Vergata", Italy)
The AFiD code, an open-source solver for the Navier-Stokes equations (www.afid.eu), has been ported to GPU clusters to tackle large-scale wall-bounded turbulent flow simulations. The GPU port has been carried out in CUDA Fortran with extensive use of kernel loop directives (CUF kernels) in order to keep the source code as close as possible to the original CPU version; only a few routines have been rewritten by hand. On Piz Daint (CSCS), the current GPU version can solve a 2048x3072x3072 mesh on 640 K20x GPUs in 2.4 s per time step, while with 2048 GPUs we measured 0.89 s per time step. Thanks to the increased performance and efficient use of memory, this tool allows for simulations in a parameter range that is unprecedented in Rayleigh-Bénard convection. -
Zic Mario MS Presentation
Thursday, June 9, 2016
Garden 1BC, 15:40-16:00
MS Presentation
The Long Way to the Discovery of New Magnets Made it Short, Mario Zic (Trinity College, Ireland)
Co-Authors:
Magnetic materials underpin many modern technologies, but their development is a long and often unpredictable process, and only about two dozen feature in mainstream applications. Here we describe a systematic pathway to the discovery of novel magnetic materials that offers unprecedented throughput and discovery speed. An extensive electronic-structure library of Heusler alloys is filtered for alloys displaying magnetic order and thermodynamic stability. A full stability analysis of the intermetallic Heuslers made only of transition metals (36,540 prototypes) identifies 248 thermodynamically stable compounds, of which only 20 are magnetic. The magnetic ordering temperature, Tc, is then estimated by a regression calibrated on the experimental values of about 60 known compounds. As a final validation we have grown two new magnets: Co2MnTi, which displays a remarkably high Tc in perfect agreement with the predictions, and Mn2PtPd, which is a complex antiferromagnet. Our work paves the way for large-scale, high-speed design of novel magnetic materials. -
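A minimal scikit-learn sketch of the screening-plus-regression idea described above; the field names ('stable', 'magnetic', 'features', 'Tc') and the choice of ridge regression are illustrative assumptions, not the authors' pipeline:

```python
# Minimal sketch: filter a library of candidate compounds for stability and
# magnetic order, then rank the survivors by a Tc regression calibrated on
# compounds with experimentally known ordering temperatures.
from sklearn.linear_model import Ridge

def screen_and_predict(candidates, known):
    """candidates/known: lists of dicts with hypothetical keys
    'features' (descriptor vector), 'stable', 'magnetic' and 'Tc'."""
    # Keep only thermodynamically stable, magnetically ordered candidates.
    shortlist = [c for c in candidates if c["stable"] and c["magnetic"]]

    # Calibrate the regression on compounds with measured Tc values.
    X_train = [k["features"] for k in known]
    y_train = [k["Tc"] for k in known]
    model = Ridge().fit(X_train, y_train)

    # Rank the shortlist by predicted Tc; top entries are candidates for growth.
    predicted = model.predict([c["features"] for c in shortlist])
    return sorted(zip(predicted, shortlist), key=lambda pair: -pair[0])
```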
Zolfaghari Hadi Poster
Poster
LS-03 GPU-Accelerated Immersed Boundary Method with CUDA for the Efficient Simulation of Biomedical Fluid-Structure Interaction, Hadi Zolfaghari (University of Bern, Switzerland)
Co-Authors: Barna Errol Mario Becsek (University of Bern, Switzerland); Maria Giuseppina Chiara Nestola (Università della Svizzera italiana, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Dominik Obrist (University of Bern, Switzerland)
Immersed boundary methods have become the most usable and useful tools for the simulation of biomedical fluid-structure interaction, e.g., in the aortic valve of the human heart. In such problems, the complex geometry and motion of the soft tissue impose a significant computational cost on body-fitted mesh methods. Resorting to a fixed Eulerian grid for the flow simulation, together with the immersed boundary method to model the interaction with the soft tissue, eliminates the expensive mesh generation and updating costs. Nevertheless, the computational cost of the geometry operations, including adaptive search algorithms, is still significant. Here, we implemented the immersed boundary kernels in CUDA so that they are transferred to and executed on thousands of parallel threads of a general-purpose GPU. Host-device memory optimisation, along with optimal usage of the GPU multiprocessors, results in boosted performance in fluid-structure interaction simulations.
Wednesday, June 8, 2016
Garden 3A, 14:00-14:15
MS Presentation
FD/FEM Coupling with the Immersed Boundary Method for the Simulation of Aortic Heart Valves, Hadi Zolfaghari (University of Bern, Switzerland)
Co-Authors: Maria Giuseppina Chiara Nestola (Università della Svizzera italiana, Switzerland); Hadi Zolfaghari (University of Bern, Switzerland); Rolf Krause (Università della Svizzera italiana, Switzerland); Dominik Obrist (University of Bern, Switzerland)
The ever-increasing available computational power allows for solving more complex physical problems spanning multiple physical domains. We present a numerical tool for simulating the fluid-structure interaction between blood flow and the soft tissue of heart valves. Using the basic concept of the Immersed Boundary Method, handling the interaction between the two physical domains (flow and structure) does not require mesh manipulation. We solve the governing equations of the fluid and the structure with domain-specific finite difference and finite element discretisations, respectively. We use a massively parallel algorithmic framework to handle the L2-projection transfer between the loosely coupled, highly parallel solvers for fluid and solid. Our tool builds on a well-established and proven Navier-Stokes solver and a novel method for solving non-linear continuum solid mechanics. -
Zonca Stefano MS Presentation
Wednesday, June 8, 2016
Garden 3A, 14:15-14:30
MS Presentation
Simulation of Fluid-Structure Interaction with a Thick Structure via an Extended Finite Element Approach, Stefano Zonca (Politecnico di Milano, Italy)
Co-Authors: Luca Formaggia (Politecnico di Milano, Italy); Christian Vergara (Politecnico di Milano, Italy)
In this talk, we present an eXtended Finite Element Method (XFEM) for simulating the fluid-structure interaction arising from a 3D flexible thick structure immersed in a fluid. The fluid and solid domains are discretized independently by generating two overlapping unstructured meshes. Owing to the unfitted nature of these meshes, the method avoids the technical problems related to an ALE approach while maintaining an accurate description of the fluid-structure interface. The coupling between fluid and solid is handled by means of a Discontinuous Galerkin approach, which allows the interface conditions to be imposed. A possible application is the study of the interaction between blood and aortic valve leaflets, which is important for understanding valve function, for developing prosthetic valve devices and for post-surgery feedback.