Discussion about this post

Grégoire Mesnil

Hi Benjamin,

Do we know how inference is computed when the device map is spread across VRAM, RAM, and disk? Is everything transferred to the GPU for computation, or is inference run on the CPU for layers whose parameters sit in RAM or on disk? Thanks!
