
Available online at www.sciencedirect.com

Procedia Computer Science 18 (2013) 10–19

2013 International Conference on Computational Science

High performance computing in biomedical applications


S. Bastrakov (a), I. Meyerov (a,*), V. Gergel (a), A. Gonoskov (b), A. Gorshkov (a,b), E. Efimenko (b), M. Ivanchenko (a), M. Kirillin (b), A. Malova (a), G. Osipov (a), V. Petrov (a), I. Surmin (a), A. Vildemanov (a)

(a) Lobachevsky State University of Nizhni Novgorod, 23 Prospekt Gagarina, 603950, Nizhni Novgorod, Russia
(b) Institute of Applied Physics of the Russian Academy of Sciences, 46 Ul'yanov Street, 603950, Nizhny Novgorod, Russia

Abstract

This paper presents the results of the development of a biomedical software system at the Nizhni Novgorod State University High Performance Computing Competence Center. We consider four main fields: plasma simulation, heart activity simulation, brain sensing simulation, and molecular dynamics simulation. The software system is aimed at large-scale simulation on cluster systems with high efficiency and scalability. We demonstrate current results of the numerical simulation, analyze performance, and propose ways to improve efficiency.

© 2013 The Authors. Published by Elsevier B.V. Selection and peer-review under responsibility of the organizers of the 2013 International Conference on Computational Science.
Keywords: scientific computing; large-scale problems; highly scalable software; software for heterogeneous clusters; GPU computing; plasma simulation; heart activity simulation; brain sensing simulation; molecular dynamics simulation

1. Introduction

One of the research areas of the Nizhni Novgorod State University High Performance Computing Competence Center [1] is the development of a unified software infrastructure for solving applied biomedical problems on modern high performance systems. Biomedicine is one of the fastest growing fields in terms of utilizing high performance computing. A key limiting factor for HPC in this field is the lack of software solutions for modeling real-world problems using new approaches in biophysics, pharmaceutics, and the physics of medical instrument design. The main goal of the project is to establish collaboration between life sciences researchers and software engineers with an HPC background. This allows elaboration of a common methodology of data processing (e.g. visualization) and shared usage of mathematical kernel implementations, thus resulting in rapid development of software capable of utilizing modern high performance systems.

* Corresponding author. Tel.: +7-831-462-33-56; fax: +7-831-462-30-85. E-mail address: meerov@vmk.unn.ru

1877-0509 © 2013 The Authors. Published by Elsevier B.V. Selection and peer-review under responsibility of the organizers of the 2013 International Conference on Computational Science. doi:10.1016/j.procs.2013.05.164


The present paper includes four sections related to research in simulation of plasma, heart activity, brain sensing, and molecular dynamics, respectively. Each section describes the particular problem together with the solution method and provides experimental justification. Common trends are discussed in the conclusions section.

2. Plasma simulation

Simulation of plasma dynamics, and of the interaction of high intensity laser pulses with various targets in particular, is one of the currently high-demand areas of computational physics. It has various biomedical applications: creation of compact sources for hadron therapy for treating oncological diseases, creation of short-lived isotope factories for bioimaging, and development of tools for research on intramolecular and intraatomic processes. Due to the high degree of nonlinearity and geometric complexity of the problem, plasma dynamics research is often based on simulation with the Particle-in-Cell (PIC) method [2].

The Particle-in-Cell method is based on a self-consistent mathematical model representing plasma as an ensemble of negatively charged electrons and positively charged ions. The entire particle ensemble is represented by a smaller number of so-called super-particles that have the same charge-to-mass ratio as the real particles, which results in equivalent plasma dynamics. The method implies numerically solving the equations of particle motion together with Maxwell's equations on a discrete grid. The Particle-in-Cell method operates on two sets of data: each particle of a particular type is characterized by its position and velocity (the type determines the mass and charge), while field and current density values are defined on a uniform spatial grid. Solving some practical problems requires simulation of 10^9 and more particles and 10^8 grid cells, thus demanding hundreds or thousands of nodes of a cluster system. As more and more supercomputers derive their performance from a mix of CPUs and GPUs, there is a growing interest in high-performance implementations of the Particle-in-Cell algorithm for heterogeneous systems along with traditional cluster systems.

PICADOR [3] is a tool for large-scale three-dimensional Particle-in-Cell plasma simulation on heterogeneous cluster systems. It is aimed at research of high intensity laser applications, including laser-driven particle acceleration and generation of radiation with tailored properties. Each time step comprises four stages: field update, field interpolation, particle push, and current deposition. We use the finite-difference time-domain method [4] to update the grid values of the electric and magnetic field. During the particle push stage the field values are interpolated to compute the force acting on each particle; for each particle only the values in the eight nearest grid nodes are used, and the Boris method is used to integrate the equations of motion. The current deposition stage implies computing the current density based on the positions and velocities of the particles; each particle contributes to the current density values in the eight nearest grid nodes. The final current density values are determined by summing the contributions of all particles and are used during the next field update, thus closing the loop. PICADOR runs on cluster systems with one or several CPUs and GPUs per node, using MPI for data exchanges between processes and OpenMP / OpenCL within each process.
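To make the particle push stage concrete, the following is a minimal non-relativistic sketch of the Boris scheme in C++. It is an illustration only: a production code such as PICADOR uses a relativistic momentum update and its own data layout, and the Vec3/Particle types here are assumptions introduced for the example.

```cpp
struct Vec3 { double x, y, z; };

static Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

struct Particle {
    Vec3 position;
    Vec3 velocity;   // non-relativistic for brevity
    double charge;   // q
    double mass;     // m
};

// One Boris step: half electric kick, magnetic rotation, second half electric kick,
// then a position update. E and B are the fields already interpolated to the particle.
void borisPush(Particle& p, const Vec3& E, const Vec3& B, double dt) {
    const double qmdt2 = 0.5 * p.charge * dt / p.mass;

    // Half acceleration by the electric field.
    Vec3 vMinus = { p.velocity.x + qmdt2 * E.x,
                    p.velocity.y + qmdt2 * E.y,
                    p.velocity.z + qmdt2 * E.z };

    // Rotation around the magnetic field direction.
    Vec3 t = { qmdt2 * B.x, qmdt2 * B.y, qmdt2 * B.z };
    double tSq = t.x * t.x + t.y * t.y + t.z * t.z;
    Vec3 s = { 2.0 * t.x / (1.0 + tSq), 2.0 * t.y / (1.0 + tSq), 2.0 * t.z / (1.0 + tSq) };
    Vec3 c1 = cross(vMinus, t);
    Vec3 vPrime = { vMinus.x + c1.x, vMinus.y + c1.y, vMinus.z + c1.z };
    Vec3 c2 = cross(vPrime, s);
    Vec3 vPlus = { vMinus.x + c2.x, vMinus.y + c2.y, vMinus.z + c2.z };

    // Second half acceleration by the electric field.
    p.velocity = { vPlus.x + qmdt2 * E.x, vPlus.y + qmdt2 * E.y, vPlus.z + qmdt2 * E.z };

    // Position update with the new velocity.
    p.position = { p.position.x + p.velocity.x * dt,
                   p.position.y + p.velocity.y * dt,
                   p.position.z + p.velocity.z * dt };
}
```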
The problem is decomposed according to a territorial principle: each MPI process handles a subarea keeping track of the field values and particles located in the corresponding part of the simulation area. To keep all domain information relevant, the data referring to the currents, fields and particles are exchanged. All MPI exchanges occur only between processes handling neighboring domains. Implementation details are given in [3]. To analyze the accuracy of the implementation we performed simulation of laser wakefield acceleration with the parameters given in [5] using 1024 CPU cores. The resulting plots of the electric field in direction of laser polarization (left) and in the direction of laser propagation (right) are depicted in Fig. 1. Our results are in good agreement with the corresponding results of several widely used Particle-in-Cell codes compared in [5] (Fig. 1 (c) and (d)). The plots are done with MathGL library [6] for scientific data visualization.
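The neighbor-only exchange pattern such a spatial decomposition requires might look roughly as in the sketch below for a single axis of the decomposition. This is schematic, not PICADOR's actual communication code; the flattened ghost-layer buffers and neighbor ranks are assumptions of the example.

```cpp
#include <mpi.h>
#include <vector>

// Exchange ghost layers with the left and right neighbors of a 1D decomposition.
// sendLeft/sendRight hold this subdomain's boundary values; recvLeft/recvRight
// receive the neighbors' boundary values. Passing MPI_PROC_NULL as a rank turns
// the corresponding transfer into a no-op for edge subdomains.
void exchangeGhostLayers(std::vector<double>& sendLeft, std::vector<double>& recvLeft,
                         std::vector<double>& sendRight, std::vector<double>& recvRight,
                         int leftRank, int rightRank, MPI_Comm comm) {
    MPI_Sendrecv(sendLeft.data(),  static_cast<int>(sendLeft.size()),  MPI_DOUBLE, leftRank,  0,
                 recvRight.data(), static_cast<int>(recvRight.size()), MPI_DOUBLE, rightRank, 0,
                 comm, MPI_STATUS_IGNORE);
    MPI_Sendrecv(sendRight.data(), static_cast<int>(sendRight.size()), MPI_DOUBLE, rightRank, 1,
                 recvLeft.data(),  static_cast<int>(recvLeft.size()),  MPI_DOUBLE, leftRank,  1,
                 comm, MPI_STATUS_IGNORE);
}
```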


Fig. 1. Results of simulation of laser wakefield acceleration with parameters given in [5] using 1024 CPU cores. Left: electric field in direction of laser polarization. Right: electric field in the direction of laser propagation.

Fig. 2. Computational time for each process in a simulation of laser compression using 512 CPU cores. Left: data for uniform decomposition. Right: data for static rectilinear decomposition. Use of rectilinear decomposition results in a 2.5 times speedup.

As reported in [3], PICADOR strong scaling efficiency is 85% on 2048 CPU cores relative to 512 cores. The weak scaling efficiency is 90% on 2048 CPU cores relative to 512 cores and 64% on 512 NVIDIA Tesla X2070 GPUs (229 376 CUDA cores) relative to 16 GPUs. Execution on one NVIDIA Tesla X2070 GPU achieves a 3–4 times speedup in comparison to 2x Intel Xeon L5630 (8 CPU cores in total).

Some problems require simulation with a highly nonuniform particle distribution. In this case uniform decomposition of the computational area between processes results in overall poor performance. A way to address this obstacle is the use of different decomposition schemes. We use a static rectilinear partitioning scheme [7] with the sum of the particle count and grid node count as a workload estimate. As an example of simulation with nonuniform particle density we consider a simulation of laser compression using a 512 x 512 x 512 grid, 21 million particles, and 10000 time steps on 512 MPI processes. Fig. 2 presents a comparison of computational time for uniform decomposition (left) and rectilinear decomposition (right), showing the distribution of computational time among the main stages of the algorithm for each process. According to these results, the use of rectilinear decomposition improves efficiency by about 2.5 times. It is worth noting that some problems require not just static decomposition but dynamic load balancing, which can also be implemented using the rectilinear partitioning scheme.
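A minimal sketch of the underlying idea, assuming the workload along one axis has already been summed into per-slice estimates (particles plus grid nodes, as described above): cuts are placed greedily at roughly equal cumulative workload. This illustrates the principle rather than the actual algorithm of [7].

```cpp
#include <vector>
#include <cstddef>

// Choose cut positions along one axis so that each of numParts slabs gets a
// roughly equal share of the workload. workloadPerSlice[i] is the estimated
// workload of grid slice i (e.g. particle count plus grid node count).
std::vector<std::size_t> partitionAxis(const std::vector<double>& workloadPerSlice, int numParts) {
    std::vector<double> prefix(workloadPerSlice.size() + 1, 0.0);
    for (std::size_t i = 0; i < workloadPerSlice.size(); ++i)
        prefix[i + 1] = prefix[i] + workloadPerSlice[i];

    std::vector<std::size_t> cuts;          // exclusive end slice of each subdomain
    std::size_t slice = 0;
    for (int p = 1; p <= numParts; ++p) {
        const double target = prefix.back() * p / numParts;
        while (slice < workloadPerSlice.size() && prefix[slice + 1] < target)
            ++slice;                         // skip slices entirely below the target
        if (slice < workloadPerSlice.size())
            ++slice;                         // include the slice that crosses the target
        cuts.push_back(slice);
    }
    cuts.back() = workloadPerSlice.size();   // last subdomain ends at the domain boundary
    return cuts;
}
```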


3. Heart activity simulation

The study of the mechanisms of arrhythmia development, methods of diagnostics, and ways of prevention is extremely important because in economically developed countries cardiovascular diseases are the world's largest killer, and among them the leading one is so-called sudden cardiac death. Research on heart arrhythmia is performed by physicians, biologists, physicists, and experts in mathematical simulation. The heart is a dynamical system, and the processes taking place in it can be described as an evolution of state variables (electric membrane potential, ion channel conductivities, ion currents), so a mathematical model of the heart can be obtained by analyzing the corresponding mathematical models. Cardiac tissue can be simplistically considered as a medium consisting of self-oscillatory and excitatory cells. The mathematical model of the heart is then a system of a massive number (up to 10^10) of ordinary differential equations describing the electric activity of the heart. Adequate analysis also requires accounting for mechanical movement of the cardiac tissue, resulting in a large number of additional equations. Therefore computer simulation of the heart using such models is only possible on cluster systems.

The research of heart function is entering a new stage due to two factors: sufficiently realistic cardiac tissue models based on the newest electrophysiology data, and access to high performance systems capable of efficient simulation of complicated spatio-temporal cardiac processes. Several groups carry out research on heart function [8-16] and create software for its simulation. In this paper we describe the Virtual Heart software aimed at simulation of cardiac activity. This software can be used in research for large-scale simulation of spatio-temporal heart activity. It is capable of computing the current state of the heart, its evolution, and integral characteristics such as a virtual electrocardiogram. Another important aspect is a database containing virtual electrocardiograms, snapshots of spatio-temporal activity, and temporal realizations of stresses in the normal regime and for various arrhythmias. This database can be used to assist diagnostics of cardiovascular conditions.

We use the multidomain heart model (Sachse F.B. et al. [17]) and highly realistic, biologically relevant models for heart segments (Hodgkin-Huxley [18], Grandi-Bers [9], Fink-Noble [19], Courtemanche [20], Nygren [21], Stewart [22], Hoper [23]) and account for electromechanical interactions [16]. The input data is preprocessed MRI data. Based on these data we build a grid and perform heart segmentation into ventricles, atria, sinoatrial and atrioventricular nodes, the bundle of His and Purkinje fibers, fibroblasts, etc.

Simulation consists of several consecutive stages, each using the results of the previous one. We load the heart geometry as a tetrahedral finite element mesh. After mesh normalization the FEM assembler is initialized to assemble the mass matrix and stiffness matrix represented in the CSR sparse matrix format. Our software supports all models from the open CellML repository [24] and has several preset models. The simulation can be done in sequential and parallel modes. Each iteration corresponds to an ODE integration step using the explicit Euler method with adaptive step size. Based on the potential at mesh nodes we compute the Laplacian of the field, which is then used to compute external currents for cardiac cells at mesh nodes. The current values are used as the right-hand side while solving the Poisson equation, resulting in the electric potential in the intercellular space. Then we integrate the differential equations describing the electric physiology.
These equations contain an additional term accounting for intercellular interactions, which depends on the previously computed Laplacian and electric potential. The results are saved in the vtk format supported by ParaView [25].

Fig. 3 presents 2D simulation results obtained on the Nizhni Novgorod State University cluster. The left side shows initiation and propagation of electric (top) and mechanical (bottom) waves of activity. Electric activity results in delayed mechanical activity, which fits the experimental data. The right side shows close to linear scalability on the cluster system for grid sizes from 90 x 90 to 600 x 600 (data for different grid sizes are highlighted with color).

The presented software uses state-of-the-art models based on biologically relevant systems of ODEs and is capable of performing large-scale simulation of cardiac dynamics with high precision. A high degree of heart segmentation allows obtaining reliable spatio-temporal effects, integral heart characteristics (e.g. a virtual electrocardiogram), and data concerning the influence of various electric, mechanical, and drug-induced impacts. The multidomain model allows for efficient research of dynamic process control using global and localized external electric stimulation, thus enabling the development of new ways to handle arrhythmias. The software also allows adjusting models to fit specific personal parameters and testing treatment methods.
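A minimal sketch of one iteration of the pipeline just described, assuming the FEM Laplacian product, the Poisson solve, and the cell-model right-hand side are supplied as callables; the names and the exact coupling form are illustrative placeholders, not the actual Virtual Heart interfaces.

```cpp
#include <vector>
#include <functional>
#include <cmath>
#include <cstddef>

// One simulation step: Laplacian of the nodal potential, Poisson solve for the
// intercellular potential, then explicit Euler integration of the cell ODEs with
// adaptive step size. The callables stand in for the CSR FEM matrix product,
// the sparse SLAE solver, and the chosen electrophysiology cell model.
double advanceStep(std::vector<double>& v,
                   const std::function<std::vector<double>(const std::vector<double>&)>& applyLaplacian,
                   const std::function<std::vector<double>(const std::vector<double>&)>& solvePoisson,
                   const std::function<double(std::size_t, double, double)>& cellModelRhs,
                   double dt, double maxDeltaV) {
    std::vector<double> lapV = applyLaplacian(v);   // Laplacian of the potential field
    std::vector<double> phi  = solvePoisson(lapV);  // intercellular potential

    for (std::size_t i = 0; i < v.size(); ++i) {
        double dv = cellModelRhs(i, v[i], lapV[i] + phi[i]);   // illustrative coupling term
        // Adaptive explicit Euler: halve the step while the update would be too large.
        while (std::fabs(dv) * dt > maxDeltaV && dt > 1e-6)
            dt *= 0.5;
        v[i] += dt * dv;
    }
    return dt;   // (possibly reduced) step size, reused as the initial guess next iteration
}
```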


Fig. 3. Results of simulation using a 2D cardiac muscle model for research of electromechanical interactions.

The software is cross-platform and uses Intel TBB [26] for shared memory systems and MPI for distributed memory systems. It is capable of using PETSc [27], Intel MKL PARDISO [28], and the solver developed at the University of Nizhni Novgorod [29] to solve sparse systems of linear equations.

4. Monte Carlo simulation of brain sensing by optical diffuse spectroscopy

Modern advances in neurosciences require novel diagnostic techniques for non-invasive brain sensing. Traditional techniques such as magnetic resonance imaging (MRI) or electroencephalography (EEG) are expensive and their application is limited by high infrastructure requirements. Recent studies propose the application of optical diagnostics for non-invasive brain sensing [30]. Brain activity is accompanied by blood oxygenation variation in the active area of the cortex, which results in a change of its local optical properties. By illuminating a certain cortex area with radiation at the wavelengths corresponding to the maximum difference in absorption of oxy- and deoxyhemoglobin, one can effectively monitor oxygenation dynamics in the target area [31]. This technique is called optical diffuse spectroscopy (ODS).

The stochastic character of multiply scattered photon propagation in the human head does not allow an accurate prediction of the probing intensity distribution by existing theoretical approaches. In this situation, numerical methods for light propagation in media with complex geometry can provide a beneficial solution to predict the probing light distribution and to determine the measurement volume from which the useful signal is detected. The existing numerical methods for calculation of light propagation in turbid media can be separated into two classes: finite element methods and ray tracing techniques. Dehghani et al. implemented a finite element approach in software for diffuse optical imaging, which resulted in the development of the widely used NIRFAST code [32]. An alternative tool is the statistical Monte Carlo (MC) approach based on simulation of a large number of random photon trajectories and further statistical analysis of the obtained results [33-35]. The Monte Carlo method is generally considered a gold standard for modeling light transport in multilayered heterogeneous tissues; however, the considerable computational load is a significant limitation for its wide use. For example, simulation of 500 million photon trajectories in a human head with complex geometry, obtained from MRI data, takes about 23 hours on a cluster (10 nodes with 2 Intel Xeon L5630 CPUs per node). At the same time, about 10 billion photon trajectories have to be simulated to obtain adequate statistics.

In order to simulate the operation of the optical brain sensing described above, we have to perform simulation of photon propagation in multilayered heterogeneous tissues with complex geometry mimicking the human head [33]. We employ a ray tracing technique based on the Monte Carlo method. For the problem of light propagation in a turbid medium it is based on simulation of a large number of random photon trajectories, where the shape of each trajectory depends on the optical properties and geometry of the medium. A number of programs to simulate light propagation in complex heterogeneous tissues have been developed by now.


The classical MCML code is aimed at simulation of infinitely narrow beam propagation in a planar multilayered model medium [33]; Boas et al. developed a code for an arbitrary beam and a voxelized model of a medium [34]. Li et al. described a Monte Carlo environment that accounts for inner medium boundaries [36]. For an accurate account of the interaction of photons with inner boundaries within the medium it is necessary to use a triangle-mesh based model of the inner boundaries. Our system for the simulation of light propagation in complex heterogeneous tissues [35] represents tissue boundaries as surfaces constructed from triangle meshes.

For simulation of the brain sensing technique we introduced a class of detectors. In our algorithm a detector is a closed region on an outermost surface (or surfaces); during the simulation we store information about the weight of photons that exit the tissue through each detector; we accumulate trajectories of photons so that for each detector we know the total weight and total trajectory of the photons which passed through it; for storing photon trajectory data we use a rectangular grid in three-dimensional space. The photon trajectory is calculated while it remains within a predefined area bounding the considered object.

The easiest way to simulate brain sensing by ODS is to employ planar geometry, which can be done using the classical MCML code. More accurate simulation can be performed by using a spherical head model. The most accurate simulation implies accounting for the physiological geometry (obtained from MRI diagnostic data). In our study we employed these three approaches in order to compare results and understand how important it is to use a complex head model in simulation. With the developed code we simulated the dependence of the fraction of probing intensity registered by a detector on the source-detector separation (Fig. 4a). From this figure one can see that the different approaches provide different results, and accounting for the real head geometry is beneficial. This dependence also allows one to determine possible source-detector distances at which the signal is strong enough to be detected.

Fig. 4. (a) Dependence of the fraction of probing power registered by the detector on the source-detector separation for probing wavelengths of 660 and 830 nm in optical diffuse spectroscopy of the brain for planar, spherical, and real geometry. (b) Speedup of the simulation code as a function of the number of cluster nodes for probing wavelengths of 660 and 830 nm.
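The elementary photon step such a ray-tracing Monte Carlo code is built around might look like the following sketch, assuming the standard MCML-style sampling of the free path from the total attenuation coefficient and weight attenuation by the local albedo. The phase-function sampling and the triangle-mesh boundary intersection used in the actual code are omitted, and the termination threshold is illustrative.

```cpp
#include <cmath>
#include <random>

struct Photon {
    double x, y, z;      // position
    double ux, uy, uz;   // unit direction
    double weight;       // statistical weight
};

// One propagation step in a homogeneous turbid region with absorption
// coefficient mua and scattering coefficient mus. The free path is sampled
// from an exponential distribution with mean 1/(mua+mus); the weight is
// reduced by the albedo. Direction update and boundary handling are omitted.
void propagateStep(Photon& p, double mua, double mus, std::mt19937& rng) {
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const double mut = mua + mus;                        // total attenuation coefficient
    const double step = -std::log(1.0 - uniform(rng)) / mut;

    p.x += p.ux * step;
    p.y += p.uy * step;
    p.z += p.uz * step;

    p.weight *= mus / mut;                               // absorbed fraction of the weight is dropped
    if (p.weight < 1e-4) p.weight = 0.0;                 // crude termination (Russian roulette omitted)
}
```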

Several benchmarks were performed to evaluate the performance of our code. Real human head geometry is used for constructing the tissue layer boundaries (2 186 446 triangles), and 16 million photon trajectories are simulated. The test cluster consists of 128 nodes with two Intel Xeon X5570 CPUs (2.93 GHz, 4 cores) and 12 GB RAM per node. Benchmark results are given in Fig. 4b, showing that the use of distributed memory systems can significantly reduce the computational time of our algorithm, because the scalability is almost linear. The parallelization efficiency is about 87% on average. In spite of the efficient use of computational clusters, the time required to obtain relevant results is still significant, so we are going to improve performance by refining the simulation algorithm and porting the code to GPU.


5. Molecular dynamics simulation. Interacting particles in 1D arrays

A number of key biophysical transport processes (of charge in DNA or of protons in photosynthetic proteins) reduce to wave propagation in effectively one-dimensional, spatially inhomogeneous arrays. As is well known, in the case of non-interacting particles (or a single particle) inhomogeneity leads to Anderson localization of waves and a halt of transport [37, 38]. A currently open and much studied problem is the dynamics of particles in the presence of interactions. However, the many-particle quantum problem is characterized by exceptional complexity, and existing work in this area bears a more qualitative than strictly mathematical character. In this context, particular attention is paid to the problem of two interacting particles, which is much more accessible for analytical and numerical studies [39-41]. While it would be challenging to prove a delocalizing effect of interactions, the current evidence indicates only an increase in the localization volume compared to single-particle states. We hypothesized that practical restrictions on the size of arrays that could be computationally studied obscure the true picture of the wave dynamics. We aimed at developing computationally efficient parallel algorithms suitable for high-performance heterogeneous computing and at revisiting the localization of two-particle states [42].

The Hamilton operator of the considered model system has the form (1), where b+, b are creation and annihilation operators and U characterizes the strength of the interparticle interaction. We used two approaches: analyzing the eigenstates of the lattice by solving the eigenvalue problem for a sparse matrix (2), and direct numerical integration of the discrete Schrodinger equation (3) for the two-particle probability amplitudes.

Dynamics. The use of parallel programming techniques enabled us to move forward on this issue. As the problem belongs to the SIMD class, it is appropriate to use graphics accelerators. An additional advantage of the GPU approach is that one compute node, as a rule, hosts several graphics cards that can be used in parallel. We selected CUDA as the software technology and implemented a symplectic numerical integration scheme (the pq-method [42]). The problem maps onto a two-dimensional lattice, the data space being the two-dimensional array of wave function amplitudes. To update the amplitudes the integrator employs a kernel with a two-dimensional grid of threads that processes all elements of the array. In the CUDA kernel the local tunneling of particles between potential wells involves only neighboring elements of the array in global memory. As a result, the duration of a typical computational task with the same parameters as in the sequential version was reduced from one week to one day on a Tesla M2070 graphics card.

Eigenvalue problem. The search for eigenvalues and eigenvectors was implemented using iterative algorithms. First the matrix is reduced to tridiagonal form, which lowers the complexity of the algorithms; for this one employs fast Givens rotations, Householder reflections, or the Lanczos method. Then iterative methods or QR decomposition are applied. To calculate the spectrum of the matrix of system (2) and obtain its eigenvectors we used the SLEPc library [43], an extension of the PETSc library for sparse algebra.
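Equations (1)-(3) did not survive the text extraction of this version. For orientation, a presumed form consistent with the surrounding description (creation/annihilation operators b+, b, interaction strength U) and with the two-particle model studied in [42] is sketched below; the exact coefficients and notation of the original paper may differ.

```latex
% Presumed reconstruction, not the original typesetting.
% (1) Hamiltonian of interacting particles on an inhomogeneous 1D lattice:
H = \sum_{l} \Big[ \epsilon_{l}\, b_{l}^{+} b_{l}
      + \tfrac{U}{2}\, b_{l}^{+} b_{l}^{+} b_{l} b_{l}
      + J \big( b_{l+1}^{+} b_{l} + b_{l}^{+} b_{l+1} \big) \Big]

% (3) Discrete Schrodinger equation for the two-particle amplitudes \psi_{l,m};
% equation (2) would be its stationary (eigenvalue) form in the two-particle basis:
i \, \dot{\psi}_{l,m} = \big( \epsilon_{l} + \epsilon_{m} + U \, \delta_{l,m} \big) \psi_{l,m}
      + J \big( \psi_{l+1,m} + \psi_{l-1,m} + \psi_{l,m+1} + \psi_{l,m-1} \big)
```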


We found that the interaction leads to a renormalization of the self-energies of two-particle states. The renormalization is different for different classes of Fock states, which leads to their resonant interaction in a certain range of the parameter U and, eventually, to delocalization of the eigenstates [42]. As mentioned above, the relative position of the particles in the delocalized states is such that the distance between them does not exceed the single-particle localization length. This allowed us to characterize this state of the system as a "correlated metal". Solving the dynamical equations (3) confirms unbounded spreading of the wave packet in the predicted parameter region and even yields a detailed regime map in the parameter space, obtained by measuring the effective wave packet width (Fig. 5).

The task of finding all the eigenvectors is computationally expensive: using 100 nodes of the Lomonosov cluster, it takes 137 iterations and 20 hours for N = 100, and 57 iterations and 70 hours for N = 350. With the current hardware, solving this problem for large matrices is difficult because of the large memory consumption. Further work will address this by reducing the part of the spectrum under consideration; eigenvalues from the interior of the spectrum can be computed, for example, using harmonic extraction.

Fig. 5. Left: from localization to wave packet spreading for two interacting particles, for inhomogeneity parameter 2.5 and U=2 and U=4.5, respectively; the colorbar corresponds to the wave packet local norm on a log scale. Right: diagram of the transport regimes in the interaction-inhomogeneity parameter plane; the colorbar corresponds to the effective wave packet width [42].
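As an illustration of the interior-spectrum direction mentioned above, the following sketch shows how interior eigenvalues with harmonic extraction might be requested through SLEPc's EPS interface (C API, callable from C++). Matrix assembly, option handling, and error checking are omitted; the target value sigma and the function name are illustrative, not part of our actual code.

```cpp
#include <slepceps.h>

// Sketch: request nev eigenvalues of a Hermitian matrix A closest to an interior
// target sigma, using harmonic Ritz extraction. Error checking is omitted for brevity.
void solveInteriorSpectrum(Mat A, PetscScalar sigma, PetscInt nev) {
    EPS eps;
    EPSCreate(PETSC_COMM_WORLD, &eps);
    EPSSetOperators(eps, A, NULL);                     // standard problem A x = lambda x
    EPSSetProblemType(eps, EPS_HEP);                   // Hermitian eigenproblem
    EPSSetWhichEigenpairs(eps, EPS_TARGET_MAGNITUDE);  // eigenvalues closest to the target
    EPSSetTarget(eps, sigma);                          // interior target value
    EPSSetExtraction(eps, EPS_HARMONIC);               // harmonic extraction
    EPSSetDimensions(eps, nev, PETSC_DEFAULT, PETSC_DEFAULT);
    EPSSetFromOptions(eps);
    EPSSolve(eps);
    EPSDestroy(&eps);
}
```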

6. Conclusions

PICADOR [3], our implementation of the Particle-in-Cell method for plasma simulation, is based on the MPI, OpenMP, and OpenCL parallel programming technologies and is capable of utilizing the computational resources of cluster systems with one or several CPUs and GPUs per node. It achieves 85% strong scaling efficiency and 90% weak scaling efficiency on 2048 CPU cores of a cluster system. The most time consuming mathematical kernel is the numerical integration of the particles' equations of motion. The key aspect of the implementation for GPUs is the clustering of particles into supercells, which allows efficient utilization of global and local memory. PICADOR scales well on GPUs: the weak scaling efficiency is 64% on 512 NVIDIA Tesla X2070 GPUs relative to 16 GPUs. We implemented a static rectilinear partitioning scheme for problems with nonuniform particle distribution. Our experiments with simulation of laser compression using 512 MPI processes show a speedup of 2.5 times compared to the uniform decomposition scheme. We plan to improve load balance by implementing adaptive domain decomposition.

The simulation of heart activity in the 2D case can be parallelized via decomposition of the simulation area, resulting in close to linear scalability. For 3D simulation we use state-of-the-art models based on biologically relevant systems of ODEs and take into account realistic heart geometry. The scheme is based on FEM computations and allows performing large-scale simulation of cardiac dynamics with high precision.


The most time consuming mathematical kernel is the repeated invocation of a sparse SLAE solver. Because the right-hand side of the SLAE at the current step depends on the solution at the previous step, the main candidate for parallelization is the SLAE solver itself. For stationary tasks the matrix of the SLAE does not change, so one could use a direct solver to improve performance. This would typically mean applying a nested dissection algorithm for matrix reordering to minimize fill-in, performing the matrix factorization once, and then calling the backward-substitution solver on every step. For non-stationary tasks, using a direct solver would result in multiple invocations of the computationally intensive matrix factorization. In this case a promising way [44] is to use a parallel iterative solver on cluster systems; this is a direction of further research.

Simulation of brain sensing by optical diffuse spectroscopy consists of two computationally intensive parts: simulation of light propagation and the intersection search between photon trajectories and layer boundaries. The search for intersections takes about 80% of the total computational time; therefore it is important to use an appropriate structure for accelerating the search. In our algorithm a BVH tree is employed. Simulation of light propagation in a turbid medium is based on the Monte Carlo method, which allows computing a large number of random photon trajectories independently. Therefore one can obtain good scalability for this problem on both shared and distributed memory systems. The only limiting factor is the size of the available memory space. Each photon has its own trajectory, so each thread requires a separate memory region. Modern CPUs can execute tens of threads in parallel, so hundreds of megabytes of memory are required. Such memory requirements are acceptable for CPU calculations; however, for GPUs with thousands of parallel threads the required memory size is larger and can reach several gigabytes. This aspect makes it difficult to use GPUs for the considered task. One way to reduce memory consumption is the use of more economical data structures.

Molecular dynamics simulation in the given problem statement is also time consuming: it takes more than 70 hours using 100 nodes of a cluster. The most computationally intensive step is finding all eigenvectors of a sparse matrix of a special type. Factorization of the matrix requires storing a dense rotation matrix of the same size. For sufficiently large systems this matrix does not fit in the memory of a single node, thus requiring distributed memory systems. Memory consumption can be reduced by adjusting the parameters of the method; however, in this case the found eigenvectors may not be mutually orthogonal, affecting result interpretation. We solve the task on a cluster system and use the SLEPc library. We consider two ways to improve performance: reducing the computed part of the spectrum and using a parallel sparse SLAE solver within the SLEPc library. Future plans are presented in Table 1.
Table 1. Further developments

Direction                      | Implementation for GPUs                                                                              | Load balancing on clusters
Plasma simulation              | Done. The supercells method is used.                                                                 | Partially done. Static rectilinear decomposition is used; an adaptive decomposition scheme will be implemented.
Heart activity simulation      | No plans.                                                                                            | Will be done by using FEM mesh decomposition and a scalable SLAE solver.
Brain sensing simulation       | Will be done by using a BVH tree on GPUs.                                                            | Not required for Monte Carlo simulation.
Molecular dynamics simulation  | Dynamics simulation is implemented for GPUs; no plans for porting the eigenvalue problem to GPUs.    | Partially done by using a scalable eigensolver. Could be improved by using a relevant SLAE solver inside the eigensolver.

Acknowledgements

The work has been supported by the following grants: Scientific School #1960.2012.9, 14.B37.21.0393, 14.B37.21.0878, MK-1652.2012.2, MK-4028.2012.2, project #8741, RFBR 12-02-31403, the Dynasty Foundation, and Government Contracts #11.519.11.2015 and #11.519.11.2022.


References

[1] UNN HPC Competence Center. http://hpc-education.unn.ru/?id=992
[2] Birdsall C., Langdon A. Plasma Physics via Computer Simulation. Taylor & Francis Group, New York (2005).
[3] Bastrakov S., Donchenko R., Gonoskov A., Efimenko E., Malyshev A., Meyerov I., Surmin I. Particle-in-cell plasma simulation on heterogeneous cluster systems. Journal of Computational Science, 3 (2012), 474-479.
[4] Taflove A. Computational Electrodynamics: The Finite-Difference Time-Domain Method. Artech House, London (1995).
[5] Paul K. et al. Benchmarking the codes VORPAL, OSIRIS, and QuickPIC with Laser Wakefield Acceleration Simulations. AIP Conference Proceedings, Volume 1086, pp. 315-320 (2009).
[6] MathGL library. http://mathgl.sourceforge.net
[7] Nicol D.N. Rectilinear partitioning of irregular data parallel computations. J. Parallel Distr. Com., 23: 119-134, 1994.
[8] Grandi E., Puglisi J.L., Wagner S., Maier L.S., Severi S., Bers D.M. Biophys. J. 2007. V. 93(11). P. 3835.
[9] Grandi E., Pasqualini F.S., Bers D.M. Journal of Molecular and Cellular Cardiology. 2010. V. 48. P. 112-121.
[10] Livshitz L.M., Rudy Y. American Journal of Physiology: Heart and Circulatory Physiology. 2007. V. 292. P. H2854.
[11] Moreno A.P., Abildskov J.A., Sachse F.B. Annals of Biomedical Engineering. 2008. V. 36(1). P. 41.
[12] Stewart P., Aslanidi O.V., Noble D., Noble P.J., Boyett M.R., Zhang H. Phil. Trans. of the Royal Society. 2009. V. 367. P. 2225.
[13] Weiss J.N., Chen P.S., Qu Zh., Karagueuzian H.S., Garfinkel A. Circ. Res. 2000. V. 87. P. 1103.
[14] Hooke N., Henriquez C.S., Lanzkron P., Rose D. Crit. Rev. Biomed. Eng. 1992. V. 120. P. 127.
[15] Trayanova N.A. Circ. Res. 2011. V. 108. P. 113-128.
[16] Ten Tusscher K.H., Hren R., Panfilov A.V. Circ. Res. 2007. V. 100. P. e87.
[17] Sachse F.B., Moreno A.P., Seemann G., Abildskov J.A. Ann. Biomed. Eng. 2009. V. 37. P. 874.
[18] Hodgkin A., Huxley A. J. Physiol. 1952. V. 117. P. 500.
[19] Fink M., Noble D., Virag L., Varro A., Giles W.R. Progress in Biophysics and Molecular Biology. 2008. V. 96(1-3). P. 357.
[20] Courtemanche M., Ramirez R.J., Nattel S. American Journal of Physiology. 1998. V. 275. P. H301. PubMed ID: 9688927.
[21] Nygren A., Fiset C., Firek L., Clark J.W., Lindblad D.S., Clark R.B., Giles W.R. Circulation Research. 1998. V. 82. P. 63.
[22] Stewart P., Aslanidi O.V., Noble D., Noble P.J., Boyett M.R., Zhang H. Phil. Trans. of the Royal Society. 2009. V. 367. P. 2225.
[23] Hoper C. Modellierung der Anatomie und Elektrophysiologie des menschlichen Vorhofs. Diploma Thesis. 2003.
[24] CellML. http://models.cellml.org/electrophysiology
[25] ParaView. http://www.paraview.org
[26] Intel Threading Building Blocks. http://software.intel.com/intel-tbb
[27] PETSc. http://www.mcs.anl.gov/petsc/index.html
[28] Intel Math Kernel Library. http://software.intel.com/intel-mkl
[29] Kozinov E., Lebedev I., Lebedev S., Malova A., Meerov I., Sysoev A., Filippenko S. A novel linear equation solver for sparse systems with a symmetric positive definite matrix. Vestnik of NNSU. 2012. Vol. 5(2) (to appear).
[30] Hillman E.M.C. Optical brain imaging in vivo: techniques and applications from animal to man. J. Biomed. Opt. 12 (2007) 051402.
[31] Zeff B.W., White B.R., Dehghani H., Schlaggar B.L., Culver J.P. Retinotopic mapping of adult human visual cortex with high-density diffuse optical tomography. Proc. Natl. Acad. Sci. U. S. A. 104 (2007) 12169-12174.
[32] Dehghani H. et al. Near infrared optical tomography using NIRFAST: Algorithm for numerical model and image reconstruction. Commun. Numer. Methods Eng. 25 (2009) 711-732.
[33] Wang L.V., Jacques S.L., Zheng L.Q. MCML - Monte Carlo modeling of light transport in multi-layered tissues. Comput. Meth. Prog. Biol. 47 (1995) 131-146.
[34] Boas D.A., Culver J., Stott J., Dunn A. Three dimensional Monte Carlo code for photon migration through complex heterogeneous media including the adult human head. Opt. Express 10 (2002) 159-170.
[35] Gorshkov A.V., Kirillin M.Yu. Monte Carlo simulation of brain sensing by optical diffuse spectroscopy. Journal of Computational Science 3 (2012) 498-503.
[36] Li H., Tian J., Zhu F., Cong W.X., Wang L.V., Hoffman E.A., Wang G. A mouse optical simulation environment (MOSE) to investigate bioluminescent phenomena in the living mouse with the Monte Carlo method. Acad. Radiol. 11 (2004) 1029-1038.
[37] Anderson P.W. Phys. Rev. 109, 1492 (1958).
[38] Aubry S., Andre G. Proc. Isr. Phys. Soc., 3, 133 (1980).
[39] Shepelyansky D.L. Phys. Rev. Lett., 73, 2607 (1994).
[40] Imry Y. Europhys. Lett., 30, 405 (1995).
[41] Krimer D.O., Khomeriki R., Flach S. JETP Lett., 94, 406 (2011).
[42] Flach S., Ivanchenko M.V., Khomeriki R. Europhys. Lett., 98, 66002 (2012).
[43] SLEPc. http://www.grycap.upv.es/slepc
[44] Niederer S., Mitchell L., Smith N., Plank G. Simulating human cardiac electrophysiology on clinical time scales. Front. Physiol. 2011.
