Tags

Tags mark specific points in history as important.

1.0.1-kokkos

00eedac0 · Bump version 1.0.0 -> 1.0.1 · Nov 26, 2023

Version 1.0.1 (Kokkos implementation)

ADDED

- Correct Pepper release paper Inspire TexKey as used in the
  `pepper_citations.tex` file written out after a run
- Summary of the most important switches after the configuration
- New logo
- Detailed build instructions and bootstrap scripts; these were previously
  lacking, in particular for the Kokkos variant

FIXED

- Remove accidental writeout of a `vegas.txt` file into the working directory
- Fix typos in manpage and command line help output
- Add missing colour factors in the downloaded process data

CHANGED

- Simplify process specification (#34)
    - The `process_file` setting is deprecated. It can still be used, but is
      now equivalent to the `process` setting. This also means that process
      files can now be specified on the command line using `-p|--process`.
    - It is still possible to directly specify the process filename, e.g.
      "z2j.csv". However, it is now also possible, and preferred, to give a
      specification that follows the OpenLoops conventions. For the above
      example, this would be "ppzjj", which is then parsed and transformed
      to the correct Pepper filename before trying to read the file.
    - There is no longer a default process.
- Change default of `phasespace.optimisation.n_steps` to 7 and
  `phasespace.integration.n_nonzero_min` to 45000
- Use CMake's FetchContent method instead of git submodules to fetch external
  codes during the configuration (#36). This allows us to generate working tarballs.
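
The translation from an OpenLoops-style specification to a Pepper filename
described above could look roughly like the following sketch. Only the single
mapping "ppzjj" -> "z2j.csv" is taken from these release notes; the function
name `process_spec_to_filename`, the regular expression and the general
naming rule are assumptions made for illustration and need not match what
Pepper actually does.

```python
import re


def process_spec_to_filename(spec: str) -> str:
    """Map a process specification to a process filename (hypothetical rule)."""
    # A specification that already is a filename is passed through unchanged.
    if spec.endswith(".csv"):
        return spec
    # Assumed pattern: "pp" for the proton-proton initial state, a lower-case
    # tag for the produced particle(s), then one "j" per final-state jet.
    match = re.fullmatch(r"pp([a-ik-z]+)(j*)", spec)
    if match is None:
        raise ValueError(f"unrecognised process specification: {spec!r}")
    tag, jets = match.groups()
    # Assumed filename layout: <tag><number of jets>j.csv
    return f"{tag}{len(jets)}j.csv"


print(process_spec_to_filename("ppzjj"))
print(process_spec_to_filename("z2j.csv"))
```

Pure-jet specifications and other final states are deliberately not handled
in this sketch; they raise a `ValueError`.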

1.0.1-native

712fcd04 · Bump version 1.0.0 -> 1.0.1 · Nov 26, 2023

Version 1.0.1 (native implementation)

ADDED

- Correct Pepper release paper Inspire TexKey as used in the
  `pepper_citations.tex` file written out after a run
- Summary of the most important switches after the configuration
- New logo
- Detailed build instructions and bootstrap scripts; these were previously
  lacking, in particular for the Kokkos variant

FIXED

- Remove accidental writeout of a `vegas.txt` file into the working directory
- Fix typos in manpage and command line help output
- Add missing colour factors in the downloaded process data

CHANGED

- Simplify process specification (#34)
    - The `process_file` setting is deprecated. It can still be used, but is
      now equivalent to the `process` setting. This also means that process
      files can now be specified on the command line using `-p|--process`.
    - It is still possible to directly specify the process filename, e.g.
      "z2j.csv". However, it is now also possible, and preferred, to give a
      specification that follows the OpenLoops conventions. For the above
      example, this would be "ppzjj", which is then parsed and transformed
      to the correct Pepper filename before trying to read the file.
    - There is no longer a default process.
- Change default of `phasespace.optimisation.n_steps` to 7 and
  `phasespace.integration.n_nonzero_min` to 45000
- Use CMake's FetchContent method instead of git submodules to fetch external
  codes during the configuration (#36). This allows us to generate working tarballs.

1.0.0-kokkos

c21d9774 · Merge branch 'main' into kokkos · Nov 13, 2023

Version 1.0.0 (Kokkos implementation)

This is the first public release of Pepper for the Kokkos variant
that targets CPU and GPU by different vendors using the Kokkos
portability framework.

1.0.0-native

bc41c1b0 · Merge branch 'dev' · Nov 13, 2023

Version 1.0.0 (native implementation)

This is the first public release of Pepper for the non-Kokkos variant
that targets CPU and Nvidia GPU (via CUDA) natively.

A description of changes with respect to 0.0.3 will be given in a later
commit.

0.0.3

b9e24c25 · Bump version 0.0.2 -> 0.0.3 · Sep 28, 2023

Version 0.0.3

This is the first version that is fully usable to generate complete
input for Sherpa and Pythia. It should therefore be close to a public
release now, barring of course some optimisations and bugfixes that
might become necessary in the run-up to the release.

NEW FEATURES

- Add support for using host-only Chili when otherwise running on device
- Add beam energy setting
- Make HDF5 output compatible with the Sherpa reader
- Full PDF and beam information in HepMC3 output
- Cross section information in HDF5 output (in the "/generatedResult"
  field)
- Add support for LHEF event output
- Add leading colour configuration information to event output
- Add VEGAS-optimised t-channel (plus one s-channel) integrator
  "Chili(mild)" with CUDA support
- Add support for CUDA LHAPDF

MINOR IMPROVEMENTS

- Do MPI sum before updating helicity selection weights
- Enable Chili max eta cuts for jets
- Only stop optimisation/integration when all processes have been
  sampled at least once
- Add environment variable PEPPER_DEVICE_ID for setting the CUDA device
  used
- Scale min. number of nonzero events per optimisation/integration step
  with the number of helicity configurations and divide by the number of
  MPI ranks for easier usage
- Make CMake configuration output a bit more verbose
- Support nested internal timing diagnostics
- Add H_Tp^2/2 and H_T^2/2 scale definitions
- Improve output when zero events are requested
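
The scaling of the minimum number of nonzero events mentioned in the list
above can be illustrated with a small sketch. The function name
`scaled_nonzero_min`, the argument names and the rounding up are assumptions
made for illustration; the release notes only state that the value is scaled
with the number of helicity configurations and divided by the number of MPI
ranks.

```python
def scaled_nonzero_min(n_nonzero_min: int, n_helicity_configs: int, n_ranks: int) -> int:
    """Per-rank minimum number of nonzero events for one step (assumed formula)."""
    if n_nonzero_min < 0 or n_helicity_configs < 1 or n_ranks < 1:
        raise ValueError("need n_nonzero_min >= 0 and at least one configuration and rank")
    total = n_nonzero_min * n_helicity_configs
    # Integer ceiling division, so that the ranks together never collect
    # fewer events than the scaled total (the rounding is an assumption).
    return (total + n_ranks - 1) // n_ranks


# A user-set minimum of 1000 with 16 helicity configurations on 4 ranks.
print(scaled_nonzero_min(1000, 16, 4))
```

With such a rule, the user-facing setting can stay the same when the number
of helicity configurations or MPI ranks changes.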

ADDED DOCUMENTATION

- Manual guide on reusing cached results
- Manual tutorial "Getting started"
- Manual reference on Runcard options

PERFORMANCE IMPROVEMENTS

- Add CPU vector instructions for some vertices
- Improve performance of resetting particle information in the recursion
- Remove unnecessary D->H copies of particle information
- Remove redundant helicity selection weight updates
- Make H->D copy of random numbers for FORCE_HOST_RNG=1 faster
- Evaluate Z/photon currents simultaneously
- Improve performance of the momentum storage handling in the Chili
  interface
- Use minimal storage for matrix elements
- Cache the non-zero helicity configuration, which speeds up
  initialisation of subsequent runs in particular on the device

BUGFIXES

- Fix read-in of selection weights
- Fix too low output precision of cached results
- Fix bug when returning a zero standard deviation/variance if the
  number of trials is one
- Fix non positive definite standard deviation (and hence selection
  weights)
- Use correct scale information in HDF5 output
- Improve heuristics to set beams in HDF5 output
- Increase FORM max. term number to fix colour factor generation bug
- Fix moving FORM-generated files across filesystems
- Fix crash in HDF5 output for weighted events
- Fill dummy cross sections for auxiliary weights to suppress new HepMC3
  warnings
- Fix integer overflow bug when MPI summing MC event counters for many
  ranks
- Remove dummy-event zero counting in HDF5 output, which broke the
  Sherpa read-in
- Fix crash in HDF5 output when all events are zero
- Correct reported LHEH5 version string from 2.0.0 to 2.0.1
- Fix GPU device selection for MPI use
- Fix accidental correlation in the random flavour channel selection
- Fill correct scales and couplings in various output formats

no_process_sampling_during_optimisation

cd45df0d · Fix harmless compiler warning · Jun 09, 2023

A version where procs are looped during optimisation

... instead of sampled. They are, however, not summed, but just looped
over. In any case, this has turned out to converge only very slowly,
because processes that hardly contribute are evaluated on an equal
footing with every other process.

multiple_perform_kernels_and_vcl

94d8f7df · Add VCL_ENABLED switch to toggle VCL usage · Nov 12, 2022

A version with many recursion kernels and VCL support

This is very similar to the multiple_perform_kernels tag, but adds VCL
support on top. It should perform very well on CPU with vector
instructions, possibly better than later versions with a unified
recursion kernel.

0.0.2

62e6ad03 · Bump version 0.0.1 -> 0.0.2 · Mar 29, 2023

Version 0.0.2

This is the first version capable of dynamic scale setting, which should
allow us to generate practically useful results now.

Important bugfixes include the calculation of phi (which affected the
application of cuts and thus physics results), fixing the unweighting
for processes with more than one process flavour group, and properly
enabling the cache of Chili results (which also includes a fix on the
Chili side).

0.0.1

0ce86a65 · Bump version 0.0.0 -> 0.0.1 · Mar 11, 2023

Version 0.0.1

This is the first version capable of generating physical (parton-level)
LHC events, with the last building block being the proper symmetrisation
of the initial state.

0.0.0

78a2e5c2 · Add first code and set up build system · Mar 18, 2022

Version 0.0.0

This pre-release tag points to the very first commit in the Pepper
repository. At that point, the working title of the project was still
BlockGen 2.

no_helicity_blocks

af6ee081 · Fix gcc compiler warnings · Oct 10, 2022

A version where an entire event block has the same sampled helicity

Later, we switched to smaller helicity blocks
(helicity_block_size <= block_size), e.g. of 32 events, that share a
single helicity. This gives better helicity sampling statistics while
still keeping CUDA warps (of 32 threads) in lock-step.

Note that helicity summing is always the same and is not affected.
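
The helicity block scheme described above can be sketched as follows. This
is a minimal illustration, not the actual implementation: the function name
`sample_block_helicities` is hypothetical, and helicities are drawn uniformly
here, whereas a real generator would use its helicity selection weights.

```python
import random


def sample_block_helicities(block_size, helicity_block_size, n_helicities, rng):
    """Return one helicity index per event; events in a helicity block share it."""
    if not 0 < helicity_block_size <= block_size:
        raise ValueError("need 0 < helicity_block_size <= block_size")
    helicities = []
    for start in range(0, block_size, helicity_block_size):
        # The last helicity block may be shorter if the sizes do not divide evenly.
        n_events = min(helicity_block_size, block_size - start)
        # One draw per helicity block (uniform sampling is an assumption).
        helicity = rng.randrange(n_helicities)
        helicities.extend([helicity] * n_events)
    return helicities


rng = random.Random(1)
helicities = sample_block_helicities(128, 32, 16, rng)
# 128 events in helicity blocks of 32: four draws instead of a single one.
print(len(helicities), len(helicities) // 32)
```

Setting `helicity_block_size` equal to `block_size` reproduces the behaviour
of this tag, where the entire event block has the same sampled helicity.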

multiple_perform_kernels

566e0397 · Add evt/second metric in final printout · Nov 03, 2022

A version where the BG recursion still uses many kernels

... e.g. for each vertex and propagator, instead of merging them into a
single "perform" kernel, which is more appropriate for CUDA (because
launching too many small kernels incurs too much overhead). However, on
the CPU, this version with small kernels performs better, presumably
because looping over events with small kernels gives better caching
efficiency there.