version 1.5 (1998/7/9)
Shinji Hioki
Department of Physics, Tezukayama University,
Tezukayama 7-1-1, Nara 631, JAPAN
(hioki@tezukayama-u.ac.jp)
Copyright (c) 1996, Shinji Hioki
All rights reserved.
***********************************************************************
This package is free software; you can use it under
"the GNU General Public License" published by the FSF.
***********************************************************************
README
VERSION HISTORY
ver. 0.9, 1994/11/12 - original
o for parallel machines (with Intel NX calls, e.g. Paragon)
o independent of the number of processors(>1)
o independent of the way of division of the lattice
o independent of the dimension of the system (N-dim QCD)
o parallel random number generator
o small working area
ver. 1.0, 1996/01/17 - modified for a single machine
o also runs on a single machine without communication
ver. 1.1, 1996/08/17 - modified to use MPI calls (QCDMPI)
o for every platform that supports MPI
ver. 1.2, 1996/08/23 - modified to use preprocessor
ver. 1.3, 1996/08/24 - delete Intel NX calls
o because the Paragon compiler supports MPI calls
ver. 1.4, 1997/12/02 - use MPI_SENDRECV_REPLACE in comm.F
ver. 1.5, 1998/07/09 - minor changes
REFERENCES
o S.Hioki, Nucl.Phys.B(Proc.Suppl.)42(1995)870-872.
o S.Hioki, Parallel Computing, 22-10(1996)1335-1344.
o S.Hioki and A.Nakamura, Proceedings of PCW96, Postscript file.
CONTENTS
1) FILES INCLUDED IN THE DISTRIBUTION
2) HOW TO SET PARAMETERS AND MACHINES
2-0) You can run on a Single Machine, immediately !!!
2-1) Parameters in params
2-2) Parameter Examples:
2-3) Machines and Options in makefile
3) EXECUTION EXAMPLES
3-1) Sun Sparc WS without communication
(or PC running LINUX with GNU compiler)
3-2) Intel Paragon with 8 nodes
3-3) DEC Alpha 21164, MPICH
with 8 ranks on a WS (with 10Base-T network)
3-4) Intel Pentium 75MHz on LINUX, MPICH and g77 (ver.0.5.16)
with 8 ranks on a PC (with 10Base-T network)
BUGS AND COMMENTS
o in the random number generator routines,
4-byte integers are assumed.
o feel free to send bug reports and comments to the author:
Shinji Hioki (hioki@tezukayama-u.ac.jp).
o for up-to-date information see URL:
http://insam.sci.hiroshima-u.ac.jp/QCDMPI/
*************************************
1) FILES INCLUDED IN THE DISTRIBUTION
*************************************
-------------------------------------------------------------
NAME          ROLE                               COMMENTS
-------------------------------------------------------------
o machine dependent files
makefile      makefile                           compiler, options, machine dependent
params        parameters                         lattice size, number of processors,
                                                 lattice division
o machine dependent, but set automatically by the preprocessor (do not change)
qcd.f         main QCD program
comm.f        communication routines
o machine independent files
staple.f      staple construction in Wilson action
lib_staple.f  libraries for staple.f
phb.f         update (pseudo heat bath) routines
lib_phb.f     libraries for phb.f
su3.f         libraries for SU(3) matrices
ran.f         parallel random number generator   4-byte integer assumed
o document
README        this file
-------------------------------------------------------------
*************************************
2) HOW TO SET PARAMETERS AND MACHINES
*************************************
2-0) You can run on a Single Machine immediately (if you want)
Without any modifications or new settings, you can run this package as follows:
1) make
2) ./qcd
3) compare output with 3-1) below
If there are any errors, read the following sections !!!
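The three answers typed at the prompts (Beta, Number of Sweeps, Random Number Key) can also be fed on standard input, which makes step 3 easy to script. The Python sketch below is not part of the package; it assumes ./qcd reads the three values from standard input in that order and prints "sweep, plaq, ..." lines as in 3-1. The helper names (run_qcd, parse_plaquettes, matches_reference) are hypothetical. Bit-for-bit agreement across compilers is not guaranteed, so the tolerance is a parameter.

```python
import subprocess

# Plaquette values from the sample run in 3-1
# (8**4 lattice, single machine, Beta=6.0, 4 sweeps, key 0).
REFERENCE_PLAQ = [0.771459, 0.646785, 0.616092, 0.608282]


def parse_plaquettes(output):
    """Return the plaq column from lines such as
    'sweep, plaq, t_total, t_comm 1 0.771459 1.000 0.000'."""
    plaqs = []
    for line in output.splitlines():
        if line.strip().startswith("sweep, plaq"):
            fields = line.split()
            # fields: 4 label words, then sweep, plaq, t_total, t_comm
            plaqs.append(float(fields[5]))
    return plaqs


def matches_reference(output, tol=1e-6):
    """True if every printed plaquette agrees with 3-1 within tol."""
    plaqs = parse_plaquettes(output)
    return len(plaqs) == len(REFERENCE_PLAQ) and all(
        abs(p - r) <= tol for p, r in zip(plaqs, REFERENCE_PLAQ))


def run_qcd(beta="6.0", nsweep="4", key="0", exe="./qcd"):
    """Run the program, answering the three prompts on standard input
    (assumption: the inputs are read in the order shown in 3-1)."""
    stdin = "\n".join([beta, nsweep, key]) + "\n"
    result = subprocess.run([exe], input=stdin, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

After `make`, `matches_reference(run_qcd())` would then report whether the default run reproduces 3-1.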
2-1) Parameters in params
ndim : dimension of the system (=N below, ndim=4 default)
ng1, ng2,.. ,ngN : (global) lattice size in each direction
(ngN: even number)
np1, np2,.. ,npN : number of processors in each direction
(if npM=1, direction M is NOT divided among processors)
n1, n2,.. ,nN : (local) lattice size in each direction on a processor
np(ndim) : array for np1, np2,.. ,npN
ns(ndim) : array for n1, n2,.. ,nN
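For a given division, the local size on each processor follows from nM = ngM/npM. A minimal Python check, assuming each ngM must be divisible by npM and reading "(ngN: even number)" as applying to every direction; the helper names are hypothetical:

```python
def local_sizes(ng, np_):
    """Local lattice sizes nM = ngM / npM.

    ng  : global lattice sizes (ng1, ..., ngN)
    np_ : number of processors in each direction (np1, ..., npN)
    """
    if len(ng) != len(np_):
        raise ValueError("ng and np must both have ndim entries")
    sizes = []
    for mu, (g, p) in enumerate(zip(ng, np_), start=1):
        if g % 2 != 0:
            raise ValueError(f"ng{mu}={g} must be even")
        if g % p != 0:
            raise ValueError(f"ng{mu}={g} is not divisible by np{mu}={p}")
        sizes.append(g // p)
    return sizes


def total_processors(np_):
    """Total number of processors np1*np2*...*npN."""
    total = 1
    for p in np_:
        total *= p
    return total
```

The product from total_processors should match the process count given at run time, e.g. `mpirun -np 8` in 3-3 or `-sz 8` in 3-2.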
o INPUT
beta : SU(3) coupling parameter (beta=6/g**2, g: gauge coupling)
nsweep : number of update sweeps
n00 : key that distinguishes random number sequences
(MUST be changed from job to job)
o OUTPUT
plaq : plaquette value
link update time : elapsed time to update one link
comm bandwidth : effective bandwidth; includes software and hardware latency as well as the genuine bandwidth
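Judging from the sample outputs in section 3 (not from the source code), the printed link update time equals t_total of the last sweep divided by the number of links on the global lattice, ndim*ng1*...*ngN. For 3-1 this is 1.000 s / (4*8**4 = 16384 links) = 61.035 micro sec. A Python sketch of that arithmetic, with a hypothetical function name:

```python
def link_update_time_us(t_total, ng):
    """Sweep time in seconds divided by ndim * (ng1*...*ngN) global links,
    returned in micro seconds per link."""
    nsite = 1
    for g in ng:
        nsite *= g
    nlink = len(ng) * nsite
    return t_total * 1.0e6 / nlink
```

Small differences from the printed values come from t_total being rounded to three decimals in the sweep lines.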
2-2) Parameter Examples:
o 4-dim QCD of size 16**3*32 on 4**4 processors
parameter(ndim=4)
parameter(ng1=16,ng2=16,ng3=16,ng4=32)
parameter(nc1=4,nc2=4,nc3=4,nc4=4)
parameter(nbmax=max(nv/n1,nv/n2,nv/n3,nv/n4)/2)
o 3-dim QCD of size 16**3 on 4*2 processors
parameter(ndim=3)
parameter(ng1=16,ng2=16,ng3=16)
parameter(nc1=4,nc2=2,nc3=1)
parameter(nbmax=max(nv/n1,nv/n2)/2)
c np(4)=np4 ! comment out or delete this line
c ns(4)=n4 ! comment out or delete this line
o 4-dim QCD of size 8*12*16*24 on a single processor
parameter(ndim=4)
parameter(ng1=8,ng2=12,ng3=16,ng4=24)
parameter(nc1=1,nc2=1,nc3=1,nc4=1)
parameter(nbmax=1) !or any positive integer
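In the examples above, nbmax is half of the largest boundary surface nv/nM, taken only over directions divided among processors, and 1 (or any positive integer) on a single processor. The Python sketch below reproduces those values; it assumes nv is the local volume n1*n2*...*nN, since nv is defined inside params rather than in this README. Computing the literal value, e.g. parameter(nbmax=64), may help with compilers that reject max() in a parameter statement, as g77 (ver.0.5.16) does in 3-4.

```python
def nbmax(ns, np_):
    """Half of the largest boundary surface nv/nM over divided directions.

    ns  : local lattice sizes (n1, ..., nN)
    np_ : processors per direction; directions with npM=1 are not divided
    Assumption: nv = n1*n2*...*nN (local volume).
    """
    nv = 1
    for n in ns:
        nv *= n
    faces = [nv // n for n, p in zip(ns, np_) if p > 1]
    if not faces:
        return 1  # single processor: any positive integer will do
    return max(faces) // 2
```

For the first example (local sizes 4,4,4,8 on 4**4 processors) this gives 64.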
2-3) Machines and Options in makefile
You MUST set the F77, F77FLAGS and ARC variables.
Uncomment the part for your machine and options.
e.g. on MPI machines, uncomment the MPI part and comment out the other parts:
### MPI (eg. MPICH)
F77 = mpif77
F77FLAGS = -O
ARC = MPI
*********************
3) EXECUTION EXAMPLES
*********************
3-1) Sun Sparc WS with Sparc Compiler
(or PC running LINUX with GNU compiler)
in makefile
F77 = f77 (or g77 on LINUX)
F77FLAGS = -O
ARC = Single
in params
parameter(ndim=4)
parameter(ng1=8,ng2=8,ng3=8,ng4=8)
parameter(nc1=1,nc2=1,nc3=1,nc4=1)
parameter(nbmax=1)
sun%./qcd
Input Beta=
6.0
Input Number of Sweeps=
4
Input Random Number Key=
0
sweep, plaq, t_total, t_comm 1 0.771459 1.000 0.000
sweep, plaq, t_total, t_comm 2 0.646785 1.000 0.000
sweep, plaq, t_total, t_comm 3 0.616092 1.000 0.000
sweep, plaq, t_total, t_comm 4 0.608282 1.000 0.000
***** QCD PERFORMANCE (from last sweep)********
link update time = 61.035156 micro sec/link
comm bandwidth = 0.000000 Mega Byte/sec
***********************************************
3-2) Intel Paragon with 8 nodes (with MPI calls)
in makefile
F77 = if77
F77FLAGS = -O4 -nx -Knoieee -Mnodepchk -Mnoframe -lmpi
ARC = MPI
in params
parameter(ndim=4)
parameter(ng1=16,ng2=16,ng3=16,ng4=16)
parameter(nc1=2,nc2=2,nc3=2,nc4=1)
parameter(nbmax=max(nv/n1,nv/n2,nv/n3)/2)
paragon%qcd -sz 8
Input Beta=
6.0
Input Number of Sweeps=
4
Input Random Number Key=
0
sweep, plaq, t_total, t_comm 1 0.772416 15.523 0.259
sweep, plaq, t_total, t_comm 2 0.646920 13.545 0.126
sweep, plaq, t_total, t_comm 3 0.618343 13.550 0.108
sweep, plaq, t_total, t_comm 4 0.608189 13.561 0.118
***** QCD PERFORMANCE (from last sweep)********
link update time = 51.731743 micro sec/link
comm bandwidth = 44.946556 Mega Byte/sec
***********************************************
3-3) DEC Alpha 21164, MPICH with 8 ranks on a WS (with 10Base-T network)
in makefile
F77 = f77
F77FLAGS = -L./mpich/lib/alpha/ch_p4 -lmpi
ARC = MPI
in params
parameter(ndim=4)
parameter(ng1=16,ng2=16,ng3=16,ng4=16)
parameter(nc1=2,nc2=2,nc3=2,nc4=1)
parameter(nbmax=max(nv/n1,nv/n2,nv/n3)/2)
alpha%mpirun -np 8 qcd
Input Beta=
6.0
Input Number of Sweeps=
4
Input Random Number Key=
0
sweep, plaq, t_total, t_comm 1 0.772416 17.875 6.211
sweep, plaq, t_total, t_comm 2 0.646920 17.469 5.559
sweep, plaq, t_total, t_comm 3 0.618343 17.590 6.145
sweep, plaq, t_total, t_comm 4 0.608189 17.629 6.082
***** QCD PERFORMANCE (from last sweep)********
link update time = 67.248940 micro sec/link
comm bandwidth = 0.872803 Mega Byte/sec
***********************************************
3-4) Intel Pentium 75MHz on LINUX, MPICH and g77 (ver.0.5.16)
with 8 ranks on a PC (with 10Base-T network)
in makefile
F77 = g77
F77FLAGS = -L./mpich/lib/LINUX/ch_p4 -lmpi
ARC = MPI
in params
parameter(ndim=4)
parameter(ng1=8,ng2=8,ng3=8,ng4=8)
parameter(nc1=2,nc2=2,nc3=2,nc4=1)
parameter(nbmax=(nv/n1)/2)
^^^^^^^^^
should be "max(nv/n1,nv/n2,nv/n3)/2", but g77 (ver.0.5.16) does NOT
support this expression in a parameter statement; the two are
numerically equivalent for the above parameters
alpha%mpirun -np 8 qcd
Input Beta=
6.0
Input Number of Sweeps=
4
Input Random Number Key=
0
sweep, plaq, t_total, t_comm 1 0.772989 63.483 62.885
sweep, plaq, t_total, t_comm 2 0.646476 60.332 59.075
sweep, plaq, t_total, t_comm 3 0.618864 55.389 54.281
sweep, plaq, t_total, t_comm 4 0.609579 38.552 37.956
***** QCD PERFORMANCE (from last sweep data)***
link update time = 2353.048096 micro sec/link
comm bandwidth = 0.017482 Mega Byte/sec
***********************************************
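Comparing t_comm with t_total of the last sweep shows how communication-bound each run is. A short Python sketch over the numbers printed above, assuming t_comm is the part of t_total spent in communication. Note that 3-1 and 3-4 use an 8**4 lattice while 3-2 and 3-3 use 16**4, so the rows are not strictly comparable.

```python
# Last-sweep timings (t_total, t_comm) in seconds, copied from section 3.
LAST_SWEEP = {
    "3-1 Sun Sparc WS, single machine": (1.000, 0.000),
    "3-2 Intel Paragon, 8 nodes": (13.561, 0.118),
    "3-3 DEC Alpha 21164, 8 ranks": (17.629, 6.082),
    "3-4 Pentium 75MHz, 8 ranks": (38.552, 37.956),
}


def comm_fraction(t_total, t_comm):
    """Fraction of the sweep time spent in communication."""
    return t_comm / t_total if t_total > 0 else 0.0


for name, (t_total, t_comm) in LAST_SWEEP.items():
    print(f"{name}: {100 * comm_fraction(t_total, t_comm):.1f}% communication")
```

With these numbers the Paragon spends under 1% of a sweep communicating, while the Pentium cluster on 10Base-T spends about 98%.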
END of README ********************************************