Tags give the ability to mark specific points in history as being important
-
1.0.1-kokkos
00eedac0 · Version 1.0.0 (Kokkos implementation)
ADDED
- Correct Pepper release paper Inspire TexKey as used in the `pepper_citations.tex` file written out after a run
- Summary of the most important switches after the configuration
- New logo
- Detailed build instructions and bootstrap scripts; this was in particular lacking before for the Kokkos variant
FIXED
- Remove accidental writeout of a `vegas.txt` file into the working directory
- Fix typos in manpage and command line help output
- Add missing colour factors in the downloaded process data
CHANGED
- Simplify process specification (#34)
  - The `process_file` setting is deprecated. It can still be used, but is now equivalent to the process setting. This also means that process files can now also be specified on the cmd line using `-p|--process`.
  - It is still possible to directly specify the process filename, e.g. "z2j.csv". However, it is now also possible, and preferred, to give a specification that follows the OpenLoops conventions. For the above example, this would be "ppzjj", which is then parsed and transformed to the correct Pepper filename before trying to read the file.
  - There is no longer a default process.
- Change default of `phasespace.optimisation.n_steps` to 7 and `phasespace.integration.n_nonzero_min` to 45000
- Use CMake's FetchContent method instead of git submodules to fetch external codes during the configuration (#36). This allows us to generate working tarballs.
-
1.0.1-native
712fcd04 · Version 1.0.0 (native implementation)
ADDED
- Correct Pepper release paper Inspire TexKey as used in the `pepper_citations.tex` file written out after a run
- Summary of the most important switches after the configuration
- New logo
- Detailed build instructions and bootstrap scripts; this was in particular lacking before for the Kokkos variant
FIXED
- Remove accidental writeout of a `vegas.txt` file into the working directory
- Fix typos in manpage and command line help output
- Add missing colour factors in the downloaded process data
CHANGED
- Simplify process specification (#34)
  - The `process_file` setting is deprecated. It can still be used, but is now equivalent to the process setting. This also means that process files can now also be specified on the cmd line using `-p|--process`.
  - It is still possible to directly specify the process filename, e.g. "z2j.csv". However, it is now also possible, and preferred, to give a specification that follows the OpenLoops conventions. For the above example, this would be "ppzjj", which is then parsed and transformed to the correct Pepper filename before trying to read the file.
  - There is no longer a default process.
- Change default of `phasespace.optimisation.n_steps` to 7 and `phasespace.integration.n_nonzero_min` to 45000
- Use CMake's FetchContent method instead of git submodules to fetch external codes during the configuration (#36). This allows us to generate working tarballs.
-
1.0.0-kokkos
c21d9774 · Version 1.0.0 (Kokkos implementation)
This is the first public release of Pepper for the Kokkos variant that targets CPU and GPU by different vendors using the Kokkos portability framework.
-
1.0.0-native
bc41c1b0 · Version 1.0.0 (native implementation)
This is the first public release of Pepper for the non-Kokkos variant that targets CPU and Nvidia GPU (via CUDA) natively. A description of changes with respect to 0.0.3 will be given in a later commit.
-
0.0.3
b9e24c25 · Version 0.0.3
This is the first version which is fully usable to generate complete input for Sherpa and Pythia. It should therefore be close to a public release now, barring of course some optimisation improvements and bugfixes that might become necessary while working towards the release.
NEW FEATURES
- Add support for using host-only Chili when otherwise running on device
- Add beam energy setting
- Make HDF5 output compatible with the Sherpa reader
- Full PDF and beam information in HepMC3 output
- Cross section information in HDF5 output (in the "/generatedResult" field)
- Add support for LHEF event output
- Add leading colour configuration information to event output
- Add VEGAS-optimised t-channel (plus one s-channel) integrator "Chili(mild)" with CUDA support
- Add support for CUDA LHAPDF
MINOR IMPROVEMENTS
- Do MPI sum before updating helicity selection weights
- Enable Chili max eta cuts for jets
- Only stop optimisation/integration when all processes have been sampled at least once
- Add environment variable PEPPER_DEVICE_ID for setting the CUDA device used
- Scale min. number of nonzero events per optimisation/integration step with the number of helicity configurations and divide by the number of MPI ranks for easier usage
- Make CMake configuration output a bit more verbose
- Support nested internal timing diagnostics
- Add H_Tp^2/2 and H_T^2/2 scale definitions
- Improve output when zero events are requested
ADDED DOCUMENTATION
- Manual guide on reusing cached results
- Manual tutorial "Getting started"
- Manual reference on Runcard options
PERFORMANCE IMPROVEMENTS
- Add CPU vector instructions for some vertices
- Improve performance of resetting particle information in the recursion
- Remove unnecessary D->H copies of particle information
- Remove redundant helicity selection weight updates
- Make H->D copy of random numbers for FORCE_HOST_RNG=1 faster
- Evaluate Z/photon currents simultaneously
- Improve performance of the momentum storage handling in the Chili interface
- Use minimal storage for matrix elements
- Cache the non-zero helicity configuration, which speeds up initialisation of subsequent runs in particular on the device
BUGFIXES
- Fix read-in of selection weights
- Fix too low output precision of cached results
- Fix bug when returning a zero standard deviation/variance if the number of trials is one
- Fix non positive definite standard deviation (and hence selection weights)
- Use correct scale information in HDF5 output
- Improve heuristics to set beams in HDF5 output
- Increase FORM max. term number to fix colour factor generation bug
- Fix moving FORM-generated files across filesystems
- Fix crash in HDF5 output for weighted events
- Fill dummy cross sections for auxiliary weights to suppress new HepMC3 warnings
- Fix integer overflow bug when MPI summing MC event counters for many ranks
- Remove dummy-event zero counting in HDF5 output, which broke the Sherpa read-in
- Fix crash in HDF5 output when all events are zero
- Correct reported LHEH5 version string from 2.0.0 to 2.0.1
- Fix GPU device selection for MPI use
- Fix accidental correlation in the random flavour channel selection
- Fill correct scales and couplings in various output formats
-
no_process_sampling_during_optimisation
cd45df0d · A version where processes are looped over during optimisation
... instead of sampled. The processes are not summed, however, just looped over. This turned out to converge only very slowly, because processes that hardly contribute are evaluated on an equal footing with every other process.
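To illustrate why looping can converge slowly, here is a minimal sketch that compares how many phase-space points each process receives under the two strategies. The contribution values, the function names, and the choice to sample proportionally to each contribution are assumptions made for this illustration; the tag message only states that processes are otherwise sampled, not how the selection weights are chosen.

```python
def evaluations_when_looping(contributions, n_points):
    """Looping: every process gets the same share of points,
    regardless of how much it contributes."""
    n_procs = len(contributions)
    return [n_points // n_procs] * n_procs


def expected_evaluations_when_sampling(contributions, n_points):
    """Sampling (assumed here to be proportional to the contribution):
    expected number of points per process."""
    total = sum(contributions)
    return [n_points * c / total for c in contributions]


# Hypothetical relative contributions of three processes.
contributions = [90.0, 9.0, 1.0]

looped = evaluations_when_looping(contributions, 3000)
sampled = expected_evaluations_when_sampling(contributions, 3000)

for c, n_loop, n_samp in zip(contributions, looped, sampled):
    print(f"contribution {c:5.1f}: looped {n_loop:5d}, sampled {n_samp:7.1f}")
```

In this toy setup, the process contributing 1 % of the total receives a third of all points when looping, but only about 1 % of them when sampled proportionally.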
-
multiple_perform_kernels_and_vcl
94d8f7df · A version with many recursion kernels and VCL support
This is very similar to the multiple_perform_kernels tag, but adds VCL support on top. It should perform very well on CPU with vector instructions, possibly better than later versions with a unified recursion kernel.
-
0.0.2
62e6ad03 · Version 0.0.2
This is the first version capable of dynamic scale setting, which should allow us to generate practically useful results now. Important bugfixes include the calculation of phi (which affected the application of cuts and thus physics results), fixing the unweighting for processes with more than one process flavour group, and properly enabling the cache of Chili results (which also includes a fix on the Chili side).
-
0.0.1
0ce86a65 · Version 0.0.1
This is the first version capable of generating physical (parton-level) LHC events, with the last building block being the proper symmetrisation of the initial state.
-
0.0.0
78a2e5c2 · Version 0.0.0
This pre-release tag points to the very first commit in the Pepper repository. At that point, the working title of the project was still BlockGen 2.
-
no_helicity_blocks
af6ee081 · A version where an entire event block has the same sampled helicity
Later, we switched to introducing smaller helicity blocks (helicity_block_size <= block_size), e.g. of 32 events, that share a single helicity. This gives better helicity sampling statistics while still keeping CUDA warps (of 32 threads) in lock-step. Note that helicity summing is always the same and is not affected.
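The helicity-block idea can be sketched as follows. This is only an illustration under stated assumptions: the function name `sample_block_helicities` is hypothetical, and helicities are drawn uniformly here, whereas the actual code may well draw them according to selection weights. Setting `helicity_block_size == block_size` reproduces the behaviour of this tag, where the entire event block shares one helicity.

```python
import random


def sample_block_helicities(block_size, helicity_block_size, n_helicities, rng):
    """Return one helicity index per event in an event block.

    Events are grouped into helicity blocks of `helicity_block_size` events;
    all events within one helicity block share a single sampled helicity.
    Uniform sampling is an assumption made for this sketch.
    """
    if not 0 < helicity_block_size <= block_size:
        raise ValueError("need 0 < helicity_block_size <= block_size")
    helicities = []
    for start in range(0, block_size, helicity_block_size):
        h = rng.randrange(n_helicities)  # one draw per helicity block
        n_events = min(helicity_block_size, block_size - start)
        helicities.extend([h] * n_events)
    return helicities


rng = random.Random(1)

# Four helicity blocks of 32 events each: up to four different helicities.
per_event = sample_block_helicities(128, 32, 16, rng)

# One helicity block spanning the whole event block, as in this tag.
whole_block = sample_block_helicities(128, 128, 16, rng)
```

With a helicity block size of 32, each group of 32 consecutive events (one CUDA warp) evaluates the same helicity, while the event block as a whole samples several helicities.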
-
multiple_perform_kernels
566e0397 · A version where the BG recursion still uses many kernels
... e.g. one for each vertex and propagator, instead of merging them into a single "perform" kernel, which is more appropriate for CUDA (because launching too many small kernels incurs too much overhead). However, on the CPU this version with small kernels performs better, presumably because looping over events with small kernels gives better caching efficiency there.