QCDMPI

PURE QCD MONTE CARLO SIMULATION CODE with MPI

version 1.5 (1998/7/9)
Shinji Hioki
Department of Physics, Tezukayama University,
Tezukayama 7-1-1, Nara 631, JAPAN
(hioki@tezukayama-u.ac.jp)

Copyright (c) 1996, Shinji Hioki
All rights reserved.



***********************************************************************
       This package is free software; you can use it under the terms of
        "the GNU General Public License" published by the FSF.
***********************************************************************

                                README

                           VERSION HISTORY

     ver. 0.9, 1994/11/12 - original

       o for parallel machines (with Intel NX calls, e.g. Paragon)
       o independent of the number of processors (>1)
       o independent of how the lattice is divided
       o independent of the dimension of the system (N-dim QCD)
       o parallel random number generator
       o small working area

     ver. 1.0, 1996/01/17 - modified for a single machine

       o also runs on a single machine, without communication

     ver. 1.1 1996/08/17 - modified to use MPI calls (QCDMPI)

       o for every platform that supports MPI

     ver. 1.2, 1996/08/23 - modified to use preprocessor

     ver. 1.3, 1996/08/24 - delete Intel NX calls

       o because the Paragon compiler supports MPI calls

     ver. 1.4, 1997/12/02 - use MPI_SENDRECV_REPLACE in comm.F

     ver. 1.5, 1998/07/09 - minor changes

        




                             REFERENCES

        o S.Hioki, Nucl.Phys.B(Proc.Suppl.)42(1995)870-872.
        o S.Hioki, Parallel Computing, 22-10(1996)1335-1344.
        o S.Hioki and A.Nakamura, Proceedings of PCW96.


                              CONTENTS

              1) FILES INCLUDED IN THE DISTRIBUTION
              2) HOW TO SET PARAMETERS AND MACHINES
               2-0) You can run on a Single Machine, immediately !!!
               2-1) Parameters in params
               2-2) Parameter Examples:
               2-3) Machines and Options in makefile
              3) EXECUTION EXAMPLES
               3-1) Sun Sparc WS without communication
                    (or PC running LINUX with GNU compiler)
               3-2) Intel Paragon with 8 nodes
               3-3) DEC Alpha 21164, MPICH 
                    with 8 ranks on a WS (with 10Base-T network)
               3-4) Intel Pentium 75MHz on LINUX, MPICH and g77 (ver.0.5.16)
                    with 8 ranks on a PC (with 10Base-T network)


                          BUGS AND COMMENTS

           o the random number generator routines assume
             4-byte integers.

           o feel free to send bugs and comments to the author:
             Shinji Hioki (hioki@tezukayama-u.ac.jp).

           o for up-to-date information see URL:
             http://insam.sci.hiroshima-u.ac.jp/QCDMPI/

*************************************
1) FILES INCLUDED IN THE DISTRIBUTION
*************************************

-------------------------------------------------------------
  NAME		ROLE			COMMENTS
-------------------------------------------------------------
o machine dependent files

  makefile	makefile		compiler,options,machine dependent
  params	parameters		lattice size, number of processors
					lattice division

  - machine dependent, but set automatically by the preprocessor (do not change)

  qcd.f		main qcd program	
  comm.f	communication routines	

o machine independent files

  staple.f	staple construction for the Wilson action
  lib_staple.f	library routines for staple.f
  phb.f		update (pseudo heat bath) routines
  lib_phb.f	library routines for phb.f
  su3.f		library routines for SU(3) matrices
  ran.f		parallel random number generator	4-byte integer assumed

o document

  README	this file

-------------------------------------------------------------

*************************************
2) HOW TO SET PARAMETERS AND MACHINES
*************************************

2-0) You can run on a Single Machine immediately (optional)

 Without any modifications or new settings, you can run the program as follows:

 1) make
 2) ./qcd
 3) compare the output with 3-1) below

 If there are any errors, read the following sections.

2-1) Parameters in params

ndim             : dimension of the system (=N below, default ndim=4)

ng1, ng2,.. ,ngN : (global) lattice size in each direction 
                   (each ngM must be an even number)

np1, np2,.. ,npN : number of processors in each direction
                   (if npM=1, direction M is NOT divided among processors)

n1, n2,.. ,nN    : (local) lattice size in each direction on one processor
                   (nM = ngM/npM)

np(ndim)         : array for np1, np2,.. ,npN
ns(ndim)         : array for n1, n2,.. ,nN
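
The relation between the global sizes, the processor grid and the local
sizes can be sketched as follows. QCDMPI itself is Fortran 77; this
Python fragment is only an illustration, assuming nM = ngM/npM with ngM
exactly divisible by npM and every ngM even. The function local_sizes is
hypothetical and not part of the package.

```python
# Sketch (hypothetical helper, not part of QCDMPI): derive the local
# lattice sizes n1..nN from the global sizes ng1..ngN and the processor
# grid np1..npN, assuming nM = ngM/npM must divide exactly.
from math import prod

def local_sizes(ng, np_):
    """Return the local sizes [n1, ..., nN] on one processor."""
    if len(ng) != len(np_):
        raise ValueError("ng and np must both have ndim entries")
    ns = []
    for m, (g, p) in enumerate(zip(ng, np_), start=1):
        if g % 2 != 0:
            raise ValueError(f"ng{m}={g} must be even")
        if g % p != 0:
            raise ValueError(f"ng{m}={g} is not divisible by np{m}={p}")
        ns.append(g // p)
    return ns

if __name__ == "__main__":
    ng = [16, 16, 16, 16]          # global lattice of examples 3-2), 3-3)
    np_ = [2, 2, 2, 1]             # 8 processors, direction 4 not divided
    print("local sizes:", local_sizes(ng, np_))   # [8, 8, 8, 16]
    print("processors:", prod(np_))               # 8
```

The divisibility check catches a params setting that cannot be split
evenly before anything is compiled.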

o INPUT 

beta             : SU(3) lattice coupling, beta=6/g**2 (g: bare gauge coupling)
nsweep           : number of update sweeps
n00              : random number key, used to distinguish random number
                   sequences (MUST be changed from job to job)

o OUTPUT

plaq             : plaquette value
link update time : elapsed time to update one link (from the last sweep)
comm bandwidth   : effective communication bandwidth, including software
                   and hardware latency as well as the raw bandwidth
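
The link update time printed at the end of a run can be checked by hand:
in every example in section 3 it equals the last sweep's t_total divided
by the number of links of the global lattice, ndim*ng1*...*ngN (for 3-1,
1.000 sec / (4*8**4) = 1.000 sec / 16384 links = 61.035 micro sec/link).
This relation was inferred from the printed examples, not read from the
source code; link_update_time_us is a hypothetical Python helper.

```python
# Sketch (hypothetical helper): reproduce the "link update time" line
# from the per-sweep t_total, assuming it is the wall-clock time of one
# sweep divided by the total number of links, ndim * ng1 * ... * ngN.
from math import prod

def link_update_time_us(t_total_sec, ng):
    """Microseconds per link for one sweep over the global lattice."""
    n_links = len(ng) * prod(ng)   # ndim links per lattice site
    return t_total_sec * 1.0e6 / n_links

if __name__ == "__main__":
    # Last sweep of example 3-2): 16**4 lattice, t_total = 13.561 sec
    print(f"{link_update_time_us(13.561, [16, 16, 16, 16]):.2f}")
```

Small differences from the printed figure come from t_total being
printed to only three decimals.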

2-2) Parameter Examples:

o 4-dim QCD of size 16**3*32 on 4**4 processors
     parameter(ndim=4)
     parameter(ng1=16,ng2=16,ng3=16,ng4=32)
     parameter(nc1=4,nc2=4,nc3=4,nc4=4)
     parameter(nbmax=max(nv/n1,nv/n2,nv/n3,nv/n4)/2)

o 3-dim QCD of size 16**3 on 4*2 processors
     parameter(ndim=3)
     parameter(ng1=16,ng2=16,ng3=16)
     parameter(nc1=4,nc2=2,nc3=1)
     parameter(nbmax=max(nv/n1,nv/n2)/2)
c    np(4)=np4 ! comment out or delete this line
c    ns(4)=n4  ! comment out or delete this line

o 4-dim QCD of size 8*12*16*24 on a single processor
     parameter(ndim=4)
     parameter(ng1=8,ng2=12,ng3=16,ng4=24)
     parameter(nc1=1,nc2=1,nc3=1,nc4=1)
     parameter(nbmax=1) !or any positive integer
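
All nbmax expressions above follow one pattern: half of the largest
local face nv/nM, taken over the divided directions only (npM>1). A
Python sketch of that rule, assuming nv is the local volume
n1*n2*...*nN (nv is not defined in this README) and returning 1 for the
single-processor case; the function nbmax is hypothetical. It can be
used to check a hand-simplified expression such as the g77 form in 3-4).

```python
# Sketch (hypothetical helper): evaluate the nbmax expression used in the
# params examples, assuming nv = n1*n2*...*nN (the local volume) and that
# only divided directions (npM > 1) enter the max().
from math import prod

def nbmax(ng, np_):
    ns = [g // p for g, p in zip(ng, np_)]    # local sizes nM = ngM/npM
    nv = prod(ns)                             # assumed local volume
    faces = [nv // n for n, p in zip(ns, np_) if p > 1]
    if not faces:                             # single processor
        return 1
    return max(faces) // 2                    # integer division as in F77

if __name__ == "__main__":
    print(nbmax([16, 16, 16, 32], [4, 4, 4, 4]))   # 64
    print(nbmax([16, 16, 16], [4, 2, 1]))          # 64
    print(nbmax([16, 16, 16, 16], [2, 2, 2, 1]))   # 512
```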

2-3) Machines and Options in makefile

     You MUST set the F77, F77FLAGS and ARC variables:
     uncomment the lines for your machine and options.

e.g.) on MPI machines, uncomment the MPI part and comment out the other parts:

### MPI (eg. MPICH)
     F77            =       mpif77
     F77FLAGS       =       -O
     ARC            =       MPI


*********************
3) EXECUTION EXAMPLES
*********************

3-1) Sun Sparc WS with Sparc Compiler
    (or PC running LINUX with GNU compiler)

 in makefile
     F77	=	f77 (or g77 on LINUX)
     F77FLAGS	=       -O
     ARC	=	Single
 in params
     parameter(ndim=4)
     parameter(ng1=8,ng2=8,ng3=8,ng4=8)
     parameter(nc1=1,nc2=1,nc3=1,nc4=1)
     parameter(nbmax=1)

  sun%./qcd
 Input Beta=
6.0
 Input Number of Sweeps=
4 
 Input Random Number Key=
0
sweep, plaq, t_total, t_comm     1 0.771459    1.000    0.000
sweep, plaq, t_total, t_comm     2 0.646785    1.000    0.000
sweep, plaq, t_total, t_comm     3 0.616092    1.000    0.000
sweep, plaq, t_total, t_comm     4 0.608282    1.000    0.000

***** QCD PERFORMANCE (from last sweep)********
 link update time =    61.035156 micro sec/link
 comm bandwidth   =     0.000000 Mega Byte/sec
***********************************************
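
Step 3 of 2-0) asks you to compare your output with the run above. A
minimal Python sketch of that check, assuming the output format shown
("sweep, plaq, t_total, t_comm" followed by four numbers); the reference
values are the plaquettes listed above for an 8**4 lattice on a single
machine with beta=6.0, 4 sweeps and random number key 0. The script and
its function names are hypothetical, and the default tolerance (one unit
in the last printed digit) is an assumption.

```python
# Sketch (hypothetical script, not shipped with QCDMPI): pick the
# per-sweep lines out of the qcd output and compare the plaquettes with
# the reference run 3-1) (8**4 lattice, beta=6.0, 4 sweeps, key 0).
import re

REFERENCE_PLAQ = [0.771459, 0.646785, 0.616092, 0.608282]

# Matches e.g. "sweep, plaq, t_total, t_comm     1 0.771459    1.000    0.000"
SWEEP_LINE = re.compile(
    r"sweep, plaq, t_total, t_comm\s+(\d+)\s+([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)"
)

def parse_sweeps(text):
    """Return (sweep, plaq, t_total, t_comm) tuples in output order."""
    return [(int(m.group(1)), float(m.group(2)),
             float(m.group(3)), float(m.group(4)))
            for m in SWEEP_LINE.finditer(text)]

def matches_reference(text, tol=1.5e-6):
    """True if every plaquette agrees with 3-1) to within tol
    (the default allows one unit in the last printed digit)."""
    rows = parse_sweeps(text)
    if len(rows) != len(REFERENCE_PLAQ):
        return False
    return all(abs(plaq - ref) <= tol
               for (_, plaq, _, _), ref in zip(rows, REFERENCE_PLAQ))

if __name__ == "__main__":
    sample = """\
sweep, plaq, t_total, t_comm     1 0.771459    1.000    0.000
sweep, plaq, t_total, t_comm     2 0.646785    1.000    0.000
sweep, plaq, t_total, t_comm     3 0.616092    1.000    0.000
sweep, plaq, t_total, t_comm     4 0.608282    1.000    0.000
"""
    print("OK" if matches_reference(sample) else "MISMATCH")
```

Runs with a different processor grid are not expected to reproduce these
values: 3-4) uses the same 8**4 lattice on 8 ranks and prints different
plaquettes.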

3-2) Intel Paragon with 8 nodes (with MPI calls)

 in makefile
     F77	    =       if77
     F77FLAGS       =       -O4 -nx -Knoieee -Mnodepchk -Mnoframe -lmpi
     ARC            =       MPI
 in params
     parameter(ndim=4)
     parameter(ng1=16,ng2=16,ng3=16,ng4=16)
     parameter(nc1=2,nc2=2,nc3=2,nc4=1)
     parameter(nbmax=max(nv/n1,nv/n2,nv/n3)/2)

 paragon%qcd -sz 8
 Input Beta=
6.0
 Input Number of Sweeps=
4
 Input Random Number Key=
0
sweep, plaq, t_total, t_comm     1 0.772416   15.523    0.259
sweep, plaq, t_total, t_comm     2 0.646920   13.545    0.126
sweep, plaq, t_total, t_comm     3 0.618343   13.550    0.108
sweep, plaq, t_total, t_comm     4 0.608189   13.561    0.118

***** QCD PERFORMANCE (from last sweep)********
 link update time =    51.731743 micro sec/link
 comm bandwidth   =    44.946556 Mega Byte/sec
***********************************************

3-3) DEC Alpha 21164, MPICH with 8 ranks on a WS (with 10Base-T network)

 in makefile
     F77            =       f77
     F77FLAGS       =       -L./mpich/lib/alpha/ch_p4 -lmpi
     ARC            =       MPI

 in params
     parameter(ndim=4)
     parameter(ng1=16,ng2=16,ng3=16,ng4=16)
     parameter(nc1=2,nc2=2,nc3=2,nc4=1)
     parameter(nbmax=max(nv/n1,nv/n2,nv/n3)/2)

 alpha%mpirun -np 8 qcd
 Input Beta=
6.0
 Input Number of Sweeps=
4
 Input Random Number Key=
0
sweep, plaq, t_total, t_comm     1 0.772416   17.875    6.211
sweep, plaq, t_total, t_comm     2 0.646920   17.469    5.559
sweep, plaq, t_total, t_comm     3 0.618343   17.590    6.145
sweep, plaq, t_total, t_comm     4 0.608189   17.629    6.082

***** QCD PERFORMANCE (from last sweep)********
 link update time =    67.248940 micro sec/link
 comm bandwidth   =     0.872803 Mega Byte/sec
***********************************************

3-4) Intel Pentium 75MHz on LINUX, MPICH and g77 (ver.0.5.16)
     with 8 ranks on a PC (with 10Base-T network)

 in makefile
     F77            =       g77
     F77FLAGS       =       -L./mpich/lib/LINUX/ch_p4 -lmpi
     ARC            =       MPI

 in params
     parameter(ndim=4)
     parameter(ng1=8,ng2=8,ng3=8,ng4=8)
     parameter(nc1=2,nc2=2,nc3=2,nc4=1)
     parameter(nbmax=(nv/n1)/2)
                     ^^^^^^^^^ 
                     should be "max(nv/n1,nv/n2,nv/n3)/2", but g77 does NOT
                     support this expression; the two are numerically
                     equivalent for the above parameters (n1=n2=n3=4)

 alpha%mpirun -np 8 qcd
 Input Beta=
6.0
 Input Number of Sweeps=
4
 Input Random Number Key=
0
sweep, plaq, t_total, t_comm     1 0.772989   63.483   62.885
sweep, plaq, t_total, t_comm     2 0.646476   60.332   59.075
sweep, plaq, t_total, t_comm     3 0.618864   55.389   54.281
sweep, plaq, t_total, t_comm     4 0.609579   38.552   37.956

***** QCD PERFORMANCE (from last sweep data)***
 link update time =  2353.048096 micro sec/link
 comm bandwidth   =     0.017482 Mega Byte/sec 
***********************************************


END of README ********************************************


Shinji Hioki (hioki@tezukayama-u.ac.jp)