From diagham
Revision as of 02:43, 20 August 2016 by Regnault (talk | contribs) (→‎Common issues and limitations)

DiagHam can rely on SCALAPACK to perform full diagonalization on a parallel machine (multi-core, cluster, ...). To enable Scalapack support, configure DiagHam with the options --enable-mpi --enable-scalapack --enable-lapack. MPI is required to use Scalapack. The Scalapack library name or its path can also be specified at configure time (the last section of this page does so through --with-scalapack-libs).

Once DiagHam has been built with Scalapack support, programs that can use it display the following line in their help:

   --use-scalapack : use SCALAPACK libraries instead of DiagHam or LAPACK libraries

A typical way to run a code with Scalapack is shown in the following example:

mpirun --hostfile hostfile $PATHTODIAGHAM/build/FQHE/src/Programs/FQHETopInsulator/FQHECheckerboardLatticeModel -p 6 -x 6 -y 3 --single-band --memory 0 --flat-band --use-scalapack --only-ky 0 --only-kx 0 --mpi-smp cluster.desc --cluster-profil cluster.log --full-diag 5000

Either the --mpi-smp or the --mpi option has to be used to activate MPI support. In addition, --use-scalapack is required to request that Scalapack be used for the full diagonalization.

Note that even if Scalapack support is activated, some programs might not be able to use it: if the option --use-scalapack does not appear in a program's help, that program does not support it.
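The check described above can be scripted. The following is only a sketch: the function name supports_scalapack is hypothetical, and it assumes that the program prints its help when called with --help (adapt the flag if your build uses a different one).

```shell
# Return success (0) if the given program lists --use-scalapack in its help.
# Assumption: the help is printed when the program is called with --help.
supports_scalapack() {
    # Merge stderr into stdout in case the help is written to stderr.
    "$1" --help 2>&1 | grep -q -e '--use-scalapack'
}
```

It could then be used as, for instance, supports_scalapack $PATHTODIAGHAM/build/FQHE/src/Programs/FQHETopInsulator/FQHECheckerboardLatticeModel && echo "Scalapack available".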

Common issues and limitations

When performing a full diagonalization using Scalapack, some technical limitations may lead to crashes and segmentation faults. One common issue is related to MPI (or perhaps only to some MPI implementations) and the maximum amount of data that can be transferred. When eigenstates are not requested, there seems to be no real problem going up to 65536x65536 real symmetric matrices or 32768x32768 complex hermitian matrices. If eigenstates are required, make sure that for each slot the amount of memory needed for the eigenstate matrix does not exceed 2 Gbytes. This can be achieved by using a large enough number of slots/nodes. Here are a few examples of parameters and benchmarks:

  • For real symmetric matrices, a dimension of 81828 with eigenstates has been reached (32 MPI processes, 105h of real time, 8470h of user time on a 16-core Xeon(R) CPU E5-2630 machine).
  • For complex hermitian matrices, a dimension of 75910 without eigenstates has been reached in 11h (64 MPI processes over 3 nodes, each with a 16-core Xeon(R) CPU E5-2630).
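
The 2 Gbytes per slot constraint on the eigenstate matrix can be turned into a rough lower bound on the number of slots. The sketch below is only an estimate under stated assumptions that do not come from the DiagHam sources: the eigenstate matrix is a dense dim x dim matrix spread evenly over the slots, entries are double precision (8 bytes for real, 16 bytes for complex), and 2 Gbytes is taken as 2^31 bytes. The function name min_slots is hypothetical, and the shell is assumed to have 64-bit integer arithmetic.

```shell
# Estimate the minimum number of MPI slots such that each slot's share
# of the eigenstate matrix does not exceed the per-slot limit.
# Assumptions: dense dim x dim matrix, even distribution over slots,
# limit of 2^31 bytes per slot.
min_slots() {
    dim=$1                # matrix dimension
    bytes_per_entry=$2    # 8 for real doubles, 16 for complex doubles
    limit=2147483648      # 2 Gbytes, taken as 2^31 bytes
    total=$((dim * dim * bytes_per_entry))
    # Integer ceiling of total / limit
    echo $(( (total + limit - 1) / limit ))
}

min_slots 81828 8     # real symmetric, dimension 81828: prints 25
min_slots 75910 16    # complex hermitian, dimension 75910: prints 43
```

The 32 MPI processes used for the real symmetric benchmark of dimension 81828 are consistent with this estimate. Other memory needs (the matrix itself, work arrays) are not counted, so the result should be read as a lower bound rather than a recommended value.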

Odd compilation (link) errors might occur when using iMPI and Scalapack, such as:

   libscalapack.a(pzhetd2.o): In function `pzhetd2_':
   pzhetd2.f:(.text+0x67e): undefined reference to `zhemv_'
   pzhetd2.f:(.text+0x835): undefined reference to `zher2_'
   pzhetd2.f:(.text+0xb99): undefined reference to `zhemv_'
   pzhetd2.f:(.text+0xd7e): undefined reference to `zher2_'

Passing --with-scalapack-libs="-lscalapack -llapack -lblas" when configuring (in addition to CPPFLAGS="-DMPICH_IGNORE_CXX_SEEK" ../configure ...) seems to solve the issue.