Next GW app to use a little less VRAM

taketwicedailey
Joined: 30 Nov 17
Posts: 25
Credit: 937896541
RAC: 2624573
Topic 231086

I can't help but wish the O3AS app used just a little less VRAM.

At present it uses a little more than 2GB. This means that, effectively, any graphics card with 2N GB of VRAM is limited to N-1 tasks running concurrently. In my particular case, with many GPUs that each have 8GB, that leaves quite a bit of performance on the table; I could reclaim it if only I could run 4 tasks at a time on each card, since the community consensus seems to be that running as many tasks as possible gives the best throughput.

I know that 4GB was needed previously, and then the app's workload was split in half so that only 2GB per section was needed. If it could be split into thirds, for example, I would think this should be possible.

Then cards with only 2GB could join the search for the first time, 4GB cards could run 2X, 8GB cards could run 4X, etc.
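The VRAM arithmetic above can be sketched as a toy calculation (the 2.05 GB per-task figure is my assumption for "a little more than 2 GB"):

```python
# Toy model of the "2N GB of VRAM -> N-1 concurrent tasks" observation,
# assuming each task reserves a little over 2 GB (2.05 GB here, a guess).
def max_concurrent(vram_gb, per_task_gb=2.05):
    """Number of tasks whose reservations fit entirely in VRAM."""
    return int(vram_gb // per_task_gb)

for vram_gb in (2, 4, 8, 16):
    print(f"{vram_gb} GB card -> {max_concurrent(vram_gb)} concurrent task(s)")
# 2 GB -> 0, 4 GB -> 1, 8 GB -> 3, 16 GB -> 7: always N-1 for a 2N GB card.
```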


What do you think?

Mad_Max
Joined: 2 Jan 10
Posts: 165
Credit: 2257578933
RAC: 631367

Actual VRAM usage for the current GW app (O3AS) is already under 2 GB.
It requests about 2.5 GB in its settings to run, but the actual VRAM usage is in the range of 1800-1900 MB.

For about a year now, I've been running 2 such GW tasks in parallel on cards with 4 GB of memory, and 4 on 8 GB cards, without any problems. At least that's the case for the AMD/OpenCL platform. I don't have any NVIDIA cards in operation right now, so I can't check whether it's the same when CUDA is used as the computing platform, but it should be.

As far as I understand, the 2.5 GB requirement was deliberately set in order to "cut off" cards with only 2 GB by default, since such tasks can cause problems there if the GPU is not fully dedicated to crunching and other applications also use some VRAM (even the Windows desktop plus an open web browser can take up several hundred MB of VRAM). But this can easily be circumvented by specifying a (fake) VRAM capacity above 2.5 GB in the BOINC client, provided the GPU is dedicated to computation and the user knows what they are doing.

Ian&Steve C.
Joined: 19 Jan 20
Posts: 4141
Credit: 49475552245
RAC: 34626685

The actual VRAM use per process on my Nvidia cards is about 2.3GB; that's the O3AS app only, on a system with no GUI and excluding all other processes. Whether it's all being actively used or not seems irrelevant, since the task will fail anyway if the card tries to over-allocate. I've seen at times that AMD cards can prevent outright crashing when VRAM is over-allocated by spilling to system RAM, but that still makes the task run really slowly.
 

The O3ASBu tasks are much more optimized than the older O3ASHF tasks (which were current when the OP started this thread). Running several tasks in parallel is not as necessary as it was before: you're not losing much production on cards with less VRAM by only being able to run 1 task, and you don't really get better production running more than 2 anymore anyway (per-task runtime is basically the same at 3X, 4X, etc.).
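For anyone who still wants to run two tasks per GPU, the standard BOINC mechanism is an app_config.xml in the project's data directory. A minimal sketch, assuming the short app name is einstein_O3AS (check client_state.xml for the real name on your host):

```xml
<app_config>
  <app>
    <!-- assumed short app name; verify it in client_state.xml -->
    <name>einstein_O3AS</name>
    <gpu_versions>
      <!-- 0.5 GPUs per task => BOINC schedules 2 tasks per GPU -->
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The client should pick this up via Options → Read config files in BOINC Manager, without a restart.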


Mad_Max
Joined: 2 Jan 10
Posts: 165
Credit: 2257578933
RAC: 631367

Hmm, interesting.

I assume the significant difference in VRAM usage comes not from the difference between AMD and NVIDIA hardware architectures, but from the different software platforms: OpenCL vs CUDA.
Does the NVIDIA OpenCL app use less memory, closer to the level seen with AMD+OpenCL? Although it probably also has slightly lower performance, as CUDA is usually better optimized.

As for over-allocation of VRAM, yes, this does not cause big problems on AMD cards: usually everything that does not fit in the main/dedicated VRAM is simply pushed into system memory, and the OpenCL application continues to work normally, accessing that data over the PCIe bus when needed. I've seen and tested this many times in practice.
The performance impact of such VRAM <==> RAM "swapping" varies greatly depending on which data has been displaced. If it is part of the "core" data that is constantly and actively used in the calculations, the performance drop is huge, at least a few times over (I have seen slowdowns on the order of 3-5x on mid-range cards; it would probably be even higher on high-end models). But if it is a rarely used data set (say, one needed only during a short stage of the calculation), or simply an excess allocation that is never actually used, the impact on performance may be insignificant: within a few percent.

OpenCL/GPU drivers are smart enough to identify the least frequently/actively used data and evict it to system RAM first. But of course, if even the "core" working set exceeds the size of VRAM, nothing can be done, and the performance penalty will be drastic.
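That distinction between hot and cold data can be illustrated with a crude bandwidth model (all numbers here are illustrative assumptions, not measurements of the app):

```python
# Crude model: a fraction f of memory traffic is served over PCIe instead
# of VRAM; effective speed is the bandwidth-weighted harmonic mix,
# normalized so that no spill == 1.0. Bandwidths are illustrative guesses.
def relative_speed(spill_fraction, vram_gbps=400.0, pcie_gbps=16.0):
    t = (1 - spill_fraction) / vram_gbps + spill_fraction / pcie_gbps
    return (1 / vram_gbps) / t

print(f"{relative_speed(0.001):.2f}")  # cold data spilled: ~0.98, a few % hit
print(f"{relative_speed(0.15):.2f}")   # hot data spilled: ~0.22, roughly 4-5x slower
```

The model matches the observation above: spilling rarely touched data costs a few percent, while spilling even a modest share of the hot working set costs several-fold.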

Sandman192
Joined: 16 Nov 07
Posts: 12
Credit: 266415927
RAC: 6323

This is for web designers

Sorry I've found the problem.

No need to respond.

GWGeorge007
Joined: 8 Jan 18
Posts: 3179
Credit: 5164426723
RAC: 3911791

Sandman192 wrote:

Sorry I've found the problem.

No need to respond.

Could you possibly explain what the problem was, so we know and can potentially help others who may have the same problem?

George

Proud member of the Old Farts Association

San-Fernando-Valley
Joined: 16 Mar 16
Posts: 550
Credit: 10693523003
RAC: 10484715

GWGeorge007 wrote:

Sandman192 wrote:

Sorry I've found the problem.

No need to respond.

Could you possibly explain what the problem was, so we know and can potentially help others who may have the same problem?

Seems like he caught the wrong thread.
See "Add Einstein home page link in Server Status page".

sfv
