ELOG - Parallelizing XF and pueoSim in the loop


GENETIS

Message ID: 234 Entry time: Tue Jul 25 16:07:25 2023

Author:	Dylan Wells
Subject:	Parallelizing XF and pueoSim in the loop

Standard Loop Architecture:

Complete an evolutionary step FOR EACH antenna before continuing on with the next step.

Steps:

1. Run the Genetic Algorithm for the entire population.

2. Run the XF radio simulation for the entire population.

3. Run the neutrino simulation software for the entire population.

4. Run root analysis and plots for the entire population.

However, because we have a limited number of XF keys, we can only run 4-5 XF simulations at a time.

So, when the first antennas finish their XF simulations, their outputs simply sit there until EVERY other antenna has finished its XF simulation.

New Proposed Architecture:

Complete necessary evolutionary steps as a population, but string together those that don't rely on data from other antennas.

Steps:

1. Run the Genetic Algorithm for the entire population.

2. Run the XF radio simulation for each antenna.

- When an XF simulation finishes for an antenna, submit the pueoSim / root analysis jobs for that antenna.

3. Run the plots for the entire population.

This will allow us to complete most of our pueoSim computation while the XF portion of the loop is still running, cutting down the time between the final XF job finishing and the final pueoSim job finishing.
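The effect of the two architectures on the wait after the final XF job can be sketched with a toy scheduler. This is only an illustrative model, not the loop's actual code: the function names, the idea of treating each antenna's pueoSim batch as occupying one "slot", and all of the numbers (20 antennas, 5 XF keys, 100 s per XF job, 30 s per pueoSim batch) are assumptions chosen to keep the example small.

```python
import heapq


def schedule(ready_times, slots, duration):
    """Finish times when each job starts as soon as it is ready and a slot is free."""
    free_at = [0.0] * slots  # min-heap of the times at which each slot frees up
    heapq.heapify(free_at)
    finish = []
    for ready in sorted(ready_times):
        start = max(ready, heapq.heappop(free_at))
        end = start + duration
        heapq.heappush(free_at, end)
        finish.append(end)
    return finish


def tail_after_last_xf(n_antennas, xf_keys, xf_time, sim_slots, sim_time, pipelined):
    """Seconds between the last XF job finishing and the last pueoSim batch finishing."""
    # XF stage: every antenna is ready at t=0, but only xf_keys can run at once.
    xf_done = schedule([0.0] * n_antennas, xf_keys, xf_time)
    last_xf = max(xf_done)
    # Standard: pueoSim waits for the whole population. Pipelined: each antenna
    # becomes ready for pueoSim the moment its own XF job finishes.
    ready = xf_done if pipelined else [last_xf] * n_antennas
    sim_done = schedule(ready, sim_slots, sim_time)
    return max(sim_done) - last_xf


# Hypothetical numbers, for illustration only.
standard_tail = tail_after_last_xf(20, 5, 100.0, 5, 30.0, pipelined=False)
pipelined_tail = tail_after_last_xf(20, 5, 100.0, 5, 30.0, pipelined=True)
print(standard_tail, pipelined_tail)
```

In this toy setup the standard ordering still has four waves of pueoSim work left after the last XF job, while the pipelined ordering only has the final wave left.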

Test run of this new architecture with 100 antennas and 7,840,000 neutrinos per antenna:

Part	Time (seconds)
A	1
B1	1347
Entire B2	54017
B2 XF	53063
B2 remaining PUEO	954
E	7
F	23

After the final XF job finished, the pueoSim simulations and the analysis of the outputted root files were completed in 954 seconds.
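As a quick consistency check on the timing table, the snippet below adds up the parts. It assumes that "Entire B2" is the sum of "B2 XF" and "B2 remaining PUEO", and that the total generation time is the sum of parts A, B1, Entire B2, E and F; the variable names are just for illustration.

```python
# Times in seconds, copied from the table above.
parts = {"A": 1, "B1": 1347, "B2": 54017, "E": 7, "F": 23}
b2_xf = 53063
b2_remaining_pueo = 954

# Assumption: "Entire B2" = "B2 XF" + "B2 remaining PUEO".
b2_check = b2_xf + b2_remaining_pueo

# Assumption: the generation time is the sum of the top-level parts.
total_seconds = sum(parts.values())
total_hours = total_seconds / 3600

# Share of part B2 that was still left once the last XF job had finished.
tail_fraction = b2_remaining_pueo / parts["B2"]

print(b2_check, total_seconds, round(total_hours, 1), round(100 * tail_fraction, 1))
```

Under those assumptions the parts add up to about 15.4 hours, and the pueoSim work left after the last XF job is under 2% of part B2.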

Comparison Between the Two Architectures (both using job optimization from Elog 232)

Architecture	Neutrinos	Time In Loop (s)
Standard	4,000,000	~6500
New Proposed	7,840,000	~950

Notes on Chart:

Sources of data: /users/PAS1960/dylanwells1629/buildingPueoSim/testingouts/times.txt and /fs/ess/PAS1960/HornEvolutionTestingOSC/GENETIS_PUEO/BiconeEvolution/current_antenna_evo_build/XF_Loop/Evolutionary_Loop/Run_Outputs/2023_07_24_test5/time.txt, respectively.

The Standard time is the time taken by one of the 250 jobs running in parallel in my testing of parallelizing processes inside of pueoSim jobs: 250 jobs * (40 * 40000 neutrinos per job) / 100 individuals = 4,000,000 neutrinos per individual.

The New Proposed time includes the time spent analyzing the outputted root files to find fitness scores and errors, which would otherwise have taken around 100 seconds * population size for this number of neutrinos and files per individual (see /fs/ess/PAS1960/HornEvolutionTestingOSC/GENETIS_PUEO/BiconeEvolution/current_antenna_evo_build/XF_Loop/Evolutionary_Loop/Run_Outputs/2023_07_23_test5/time.txt for data on this number).
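The arithmetic behind the Standard row, and a rough throughput comparison between the two rows, can be written out as follows. The "Time In Loop" values in the chart are approximate, and the two rows were measured in different tests, so the ratio below is only a rough illustration; the variable names are hypothetical.

```python
# Standard row: 250 jobs, each running 40 processes of 40000 neutrinos,
# shared across 100 individuals.
jobs = 250
processes_per_job = 40
neutrinos_per_process = 40_000
individuals = 100

standard_neutrinos = jobs * processes_per_job * neutrinos_per_process // individuals

# Values copied from the chart (times are approximate).
standard_time_s = 6500
new_neutrinos = 7_840_000
new_time_s = 950

# Neutrinos per individual per second of "Time In Loop".
standard_rate = standard_neutrinos / standard_time_s
new_rate = new_neutrinos / new_time_s
speedup = new_rate / standard_rate

print(standard_neutrinos, round(standard_rate), round(new_rate), round(speedup, 1))
```

With the chart's approximate times, the new architecture handles roughly 13x more neutrinos per individual per second of in-loop pueoSim time.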

So, this new architecture can further improve both the number of neutrinos simulated and the speed of neutrino simulation in the loop, on top of the methods discussed in Elog 232.

This architecture could also be applied to the Bicone and Hpol loops, which are both affected by the limited number of XF keys.

Additional Notes:

1. For this new architecture test, each antenna uses 49 jobs for neutrino simulation instead of the previous 2.5. (49 pueoSim + 1 XF for 50 jobs per antenna, 5,000 jobs per generation)

2. The time for each antenna to submit, queue, and finish its neutrino simulation jobs must be less than the length of the XF job, or the extra time will accumulate for each antenna, losing much of the time benefits. (As long as it is less, the time spent on just pueoSim should be invariant under an increase in population)

3. The number of core hours spent on pueoSim jobs will be roughly the same for the same number of neutrinos as each job is shorter (except for a small contribution of the job overhead taking a higher percentage of total time for shorter jobs)

4. Initially I thought that queuing pueoSim jobs while the XF jobs were running might slow down the queue for the XF jobs. So, I made the loop wait to submit each batch of pueoSim jobs until there was space for all of them to be active under the ~250 max jobs per user.

Additionally, while observing my many tests, I didn't notice any correlation between the number of CPU pueoSim jobs in the queue and the number of GPU XF jobs out of the queue.

5. Branch I'm developing this on is here

6. The total time for the test generation was 15.4 hours, which is slightly longer than the ~14 hours from the 2023_05_08 run. However, this test also used double the population size, larger values for the range of antenna heights (on average about 3x taller), and 20x more neutrinos simulated per antenna. So, the actual speed is better than it first looks.
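The condition in note 2 can be illustrated with a simple recurrence. This is a minimal sketch under strong assumptions, not a model of the real queue: XF results are assumed to arrive one at a time at a fixed interval, each antenna's pueoSim batch is assumed to take a fixed time, and batches are assumed to be processed one after another. The function name and all numbers are hypothetical.

```python
def tail_after_last_xf(n_antennas, xf_interval, sim_time):
    """Seconds between the last XF result arriving and the last pueoSim batch finishing.

    Assumes XF results arrive every xf_interval seconds and that each
    antenna's pueoSim batch must wait for the previous batch to finish.
    """
    finish = 0.0
    ready = 0.0
    for i in range(1, n_antennas + 1):
        ready = i * xf_interval               # when antenna i's XF output is available
        finish = max(ready, finish) + sim_time  # batch starts once ready and unblocked
    return finish - ready


# pueoSim batch shorter than the XF interval: the tail does not grow with population.
short_100 = tail_after_last_xf(100, xf_interval=500.0, sim_time=300.0)
short_200 = tail_after_last_xf(200, xf_interval=500.0, sim_time=300.0)

# pueoSim batch longer than the XF interval: the extra time piles up per antenna.
long_100 = tail_after_last_xf(100, xf_interval=500.0, sim_time=600.0)

print(short_100, short_200, long_100)
```

In this sketch, when the batch time is below the XF interval the tail stays equal to one batch time regardless of population size; when it is above, the tail grows by the difference for every additional antenna.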
