Version 0.0.3

This is the first version that can be used to generate complete input for Sherpa and Pythia. It should therefore be close to a public release, barring any optimisation improvements and bugfixes that become necessary while working towards the release.

NEW FEATURES

- Add support for using host-only Chili when otherwise running on device
- Add beam energy setting
- Make HDF5 output compatible with the Sherpa reader
- Full PDF and beam information in HepMC3 output
- Cross section information in HDF5 output (in the "/generatedResult" field)
- Add support for LHEF event output
- Add leading colour configuration information to event output
- Add VEGAS-optimised t-channel (plus one s-channel) integrator "Chili(mild)" with CUDA support
- Add support for CUDA LHAPDF

MINOR IMPROVEMENTS

- Do MPI sum before updating helicity selection weights
- Enable Chili max eta cuts for jets
- Only stop optimisation/integration when all processes have been sampled at least once
- Add environment variable PEPPER_DEVICE_ID for setting the CUDA device used
- Scale the min. number of non-zero events per optimisation/integration step with the number of helicity configurations and divide it by the number of MPI ranks for easier usage
- Make CMake configuration output a bit more verbose
- Support nested internal timing diagnostics
- Add H_Tp^2/2 and H_T^2/2 scale definitions
- Improve output when zero events are requested

ADDED DOCUMENTATION

- Manual guide on reusing cached results
- Manual tutorial "Getting started"
- Manual reference on Runcard options

PERFORMANCE IMPROVEMENTS

- Add CPU vector instructions for some vertices
- Improve performance of resetting particle information in the recursion
- Remove unnecessary D->H copies of particle information
- Remove redundant helicity selection weight updates
- Make H->D copy of random numbers for FORCE_HOST_RNG=1 faster
- Evaluate Z/photon currents simultaneously
- Improve performance of the momentum storage handling in the Chili interface
- Use minimal storage for matrix elements
- Cache the non-zero helicity configurations, which speeds up the initialisation of subsequent runs, in particular on the device

BUGFIXES

- Fix read-in of selection weights
- Fix too low output precision of cached results
- Fix bug when returning a zero standard deviation/variance if the number of trials is one
- Fix non-positive-definite standard deviation (and hence selection weights)
- Use correct scale information in HDF5 output
- Improve heuristics to set beams in HDF5 output
- Increase the FORM max. term number to fix colour factor generation bug
- Fix moving FORM-generated files across filesystems
- Fix crash in HDF5 output for weighted events
- Fill dummy cross sections for auxiliary weights to suppress new HepMC3 warnings
- Fix integer overflow bug when MPI summing MC event counters for many ranks
- Remove dummy-event zero counting in HDF5 output, which broke the Sherpa read-in
- Fix crash in HDF5 output when all events are zero
- Correct reported LHEH5 version string from 2.0.0 to 2.0.1
- Fix GPU device selection for MPI use
- Fix accidental correlation in the random flavour channel selection
- Fill correct scales and couplings in various output formats