I can't help but wish the O3AS app used just a little bit less VRAM.
At present it uses a little more than 2 GB. That means any card with 2N GB of VRAM is effectively limited to N-1 concurrent tasks. In my particular case, with several GPUs of 8 GB each, that leaves quite a bit of performance on the table: I could run 4 tasks at a time on each if the app were smaller, and the community consensus seems to be that running as many tasks as possible gives the best throughput.
I know that the app originally needed 4 GB, and the workload was then split in half so that each part needed 2 GB. If it could be split into thirds, for example, this should be possible, I would think.
Then cards with only 2GB could join the search for the first time, 4GB cards could run 2X, 8GB could run 4X, etc.
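The arithmetic behind this can be sketched quickly. This is just an illustration, assuming a fixed per-task VRAM reservation (the helper function and the 2.5 GB default are my own choices, not anything from the app itself):

```python
# Sketch of the concurrent-task arithmetic: how many tasks fit on a card
# given its VRAM and a per-task reservation (hypothetical helper).

def max_concurrent_tasks(vram_gb: float, per_task_gb: float = 2.5) -> int:
    """Number of tasks that fit in a card's VRAM at a given per-task size."""
    return int(vram_gb // per_task_gb)

for vram in (2, 4, 8):
    print(f"{vram} GB card: {max_concurrent_tasks(vram)} task(s) at 2.5 GB, "
          f"{max_concurrent_tasks(vram, 1.4)} task(s) if split to ~1.4 GB")
```

With a ~2.5 GB reservation an 8 GB card fits 3 tasks; shrinking the per-task footprint is what would let 2 GB cards join and 8 GB cards run 4 or more.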
What do you think?
Copyright © 2024 Einstein@Home. All rights reserved.
Actual VRAM usage for the current GW app (O3AS) is already under 2 GB.
It requires (per the settings) about 2.5 GB to run, but the actual VRAM usage is in the range of 1800-1900 MB.
For about a year now, I've been running 2 such GW tasks in parallel on cards with 4 GB of memory, and 4 on 8 GB cards, without any problems. At least that's the case for the AMD/OpenCL platform. I don't have any NVIDIA cards in operation right now, so I can't check whether it's the same when CUDA is used as the computing platform, but it should be.
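For anyone wanting to replicate this, the standard BOINC mechanism for running N tasks per GPU is an app_config.xml in the project directory (projects/einstein.phys.uwm.edu/). A minimal sketch; note that the app name below is my assumption, so check the actual <name> in your client_state.xml before using it:

```xml
<app_config>
  <app>
    <!-- App name is an assumption; verify against client_state.xml -->
    <name>einstein_O3AS</name>
    <gpu_versions>
      <!-- 0.5 = each task claims half a GPU, so 2 run concurrently;
           use 0.25 for 4 at a time on the 8 GB cards -->
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After saving, use "Options → Read config files" in the BOINC Manager (or restart the client) for it to take effect.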
As far as I understand, the 2.5 GB requirement was deliberately set in order to "cut off" cards with only 2 GB by default, on which such tasks can cause problems if the GPU is not fully dedicated to crunching and other applications also use some of the VRAM (even the Windows desktop plus an open web browser can take up several hundred MB of VRAM). But this can easily be circumvented by specifying a (fake) VRAM capacity above 2.5 GB in the BOINC client, provided the GPU is dedicated to calculations and the user knows what they are doing.
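For what it's worth, the approach usually described for reporting a larger (fake) VRAM capacity is editing the client's GPU-detection file, coproc_info.xml, in the BOINC data directory. A rough sketch only; the exact element names vary by BOINC version and platform, so check your own file first rather than copying this verbatim:

```xml
<!-- Fragment of coproc_info.xml, inside the entry for the GPU in question.
     Element name and placement are from one BOINC version and may differ;
     the value is 4 GiB expressed in bytes, as an example. -->
<global_mem_size>4294967296</global_mem_size>
```

Since the client re-runs GPU detection and rewrites this file on startup, the edited file has to be made read-only for the change to stick.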
The actual VRAM use per process on my NVIDIA cards is about 2.3 GB; that's the O3AS app only, on a system with no GUI and excluding all other processes. Whether it's all being actively used or not seems irrelevant, since the task will fail anyway if the card tries to over-allocate. I've seen at times that AMD cards can prevent an outright crash when VRAM is over-allocated by spilling it to system RAM, but that still makes the task run really slowly.
The O3ASBu tasks are much better optimized than the older O3ASHF tasks were (when the OP started this thread). It's not really as necessary to run several tasks in parallel as it was before. You're not losing much production on cards with less VRAM by only being able to run 1 task, and you don't really get better production running more than 2 anymore anyway (per-task runtime is basically the same for 3, 4, etc.).
Hmm, interesting.
I assume the significant difference in VRAM usage comes not from the difference in AMD vs. NV hardware architectures, but from the different software platforms: OpenCL vs. CUDA.
Does the NV OpenCL app use less memory, closer to the level seen with the AMD+OpenCL app? Although it probably also has slightly lower performance, as CUDA is usually better optimized.
As for over-allocation of VRAM, yes, this does not cause big problems on AMD cards: usually everything that does not fit in the main/dedicated VRAM is simply pushed into system memory, and the OpenCL application continues to work normally, accessing that data later over the PCI-E bus if necessary. I've seen and tested this many times in practice.
The performance impact of such VRAM <==> RAM "swapping" varies a lot depending on which data has been displaced. If it is part of the main "core" data that is constantly and actively used in the calculations, the performance drop will be huge, a factor of a few at least (I have seen slowdowns on the order of 3-5x on mid-range cards; it will probably be even higher on high-end models). But if it is some rarely used data set (say, needed only at one short stage of the calculation), or just an excess allocation without actual use, then the impact on performance may be insignificant, within a few percent.
OpenCL/GPU drivers are smart enough to identify the least frequently/actively used data and displace it to system RAM first. But of course, if even the "core" data exceeds the size of VRAM, then nothing can be done and the performance penalty will be drastic.
Sorry I've found the problem.
No need to respond.
Sandman192 wrote: Sorry I've found the problem.
Could you possibly explain what the problem was, so we know and can potentially help others who may have the same problem?
Proud member of the Old Farts Association
Seems like he posted in the wrong thread.
See "Add Einstein home page link in Server Status page".
sfv