[Offload] Use the kernel argument size directly in AMDGPU offloading (#94667)

Summary:
The old COV3 implementation of HSA used to omit the implicit arguments
from the kernel argument size. For COV4 and COV5 this is no longer the
case so we can simply use the size reported from the symbol information.

See
https://github.com/ROCm/ROCR-Runtime/issues/117#issuecomment-812758161
This commit is contained in:
Joseph Huber
2024-06-06 15:19:55 -05:00
committed by GitHub
parent 9293fc7981
commit 9e209a4a37

View File

@@ -3272,19 +3272,13 @@ Error AMDGPUKernelTy::launchImpl(GenericDeviceTy &GenericDevice,
if (ArgsSize < KernelArgsSize)
return Plugin::error("Mismatch of kernel arguments size");
// The args size reported by HSA may or may not contain the implicit args.
// For now, assume that HSA does not consider the implicit arguments when
// reporting the arguments of a kernel. In the worst case, we can waste
// 56 bytes per allocation.
uint32_t AllArgsSize = KernelArgsSize + ImplicitArgsSize;
AMDGPUPluginTy &AMDGPUPlugin =
static_cast<AMDGPUPluginTy &>(GenericDevice.Plugin);
AMDHostDeviceTy &HostDevice = AMDGPUPlugin.getHostDevice();
AMDGPUMemoryManagerTy &ArgsMemoryManager = HostDevice.getArgsMemoryManager();
void *AllArgs = nullptr;
if (auto Err = ArgsMemoryManager.allocate(AllArgsSize, &AllArgs))
if (auto Err = ArgsMemoryManager.allocate(ArgsSize, &AllArgs))
return Err;
// Account for user requested dynamic shared memory.