- remove method allocateGraphicsMemory(size_t size)
- pass allocation type in allocation properties
- set allocation type in allocateGraphicsMemoryInPreferredPool
Change-Id: Ia9296d0ab2f711b7d78aff615cb56b3a246b60ec
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Add debug variables to force resource locking on memory transfer calls
and to call makeResident() on mapVirtualAddress() call.
Change-Id: Ifa78d951fcb81812b10a98252bd414124dec9c74
Move definitions of objectNotUsed and objectNotResident to GraphicsAllocation
Change-Id: I2aec604a865cc6c975e9d1121028cbdd35c0b18a
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Move definition of allocations list method to internal_allocation_storage.cpp
Change-Id: I4c6038df8fd1b9335e8a74edbab33b78f9293d8f
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
Remove not needed includes from command_queue.h
Change-Id: I45963bf005471bd7716d55471474299a15e27b62
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
EnqueueOperation<GfxFamily>::getTotalSizeRequiredCS was ULTs-only.
Replace with the real CS size estimation from getCommandStream.
Change-Id: I4d15d342eb5edff6511acc9c80e13e9cc92d81ac
Updating files modified in 2018 only. Older files remain with old style
copyright header
Change-Id: Ic99f2e190ad74b4b7f2bd79dd7b9fa5fbe36ec92
Signed-off-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
- Switch to non param fixture for tests not requiring different params
- Add limited param set for tests requiring param fixture
- this decreases total test count by 100 while keeping the same scope.
Change-Id: Ic10a378d3eb7a2d06114435a9bd9652756945574
- zeroCopy means no need for data transfer when cpu transfer is
chosen during map/unmap operations
- tests cleanup
Change-Id: Id312267e51d593781c6fe536b6e96f42f32e5c02
This commit adds notifications to enqueue read buffer and image calls
and setters/getters to mark/check if an allocation is dumpable.
Change-Id: I123f24752d2a86abcf934e0d404f4e0ecf1729cc
- Apply layout for images only when Z size is equal to 1
- Fix generating local ids for local workgroup size
when any size is not power of 2
- Revert commit c53c09da45
Change-Id: Ie745782fafce2facbd877e3e33e4ba347cb2b09e
Signed-off-by: Mateusz Jablonski <mateusz.jablonski@intel.com>
- Command Stream Receiver should be used instead for locking.
- Remove not needed synchronization in clSetUserEventStatus
Change-Id: I17050dc70cb0be03b2003043a9666ba8df1a83c9
- Mark Kernel for aux translation
- Initial implementation of dispatchAuxTranslation for future use
Change-Id: Ifca1c9a893876eecc5678cdc4f564b2bfcae959a
- replace createGraphicsAllocationWithRequiredBitness with more general
methodallocateGraphicsMemoryInPreferredPool based on passed
AllocationData
- proper flags for allocation selected based on AllocationType
- remove allocateGraphicsMemory(size_t size, size_t alignment)
and use allocateGraphicsMemory(size_t size) instead where default
alignment is sufficient, otherwise use full options version:
allocateGraphicsMemory(size_t size, size_t alignment,
bool forcePin, bool uncacheable)
Change-Id: I2da891f372ee181253cb840568a61b33c0d71fc9
- change GraphicsAllocatoin::AllocationType to scoped enumeration
so that ALLOCATION_TYPE_ prefix in every enum value can be removed
- all accesses are typed (example AllocationType::IMAGE)
- Rename allocationType to AllocationUsage to eliminate confusion
with multiple AllocationType enums / types
Change-Id: I16003297ecfcb0aaa5779ad00706c5d983914bbe
This commit adds a capability to selectively enable/disable AUB capture,
i.e. by toggling the registry key from the outside or specifying the filter
with a kernel name and/or kernel start index and kernel end index.
Change-Id: Ib5d39c21863fbc4a95aa73c949b9779ff993de0f
- Change dirty state helpers to work on IndirectHeaps.
- Instead of comparing size in bytes and cpu pointers, compare gpu base
address and size of the heap in pages
- That allows to not have dirty flag for heaps that are coming from 4GB
allocator.
Change-Id: I0ff81e3c0945b32e4f872a100cd10b332b27ed24
- notifySourceCode, notifyKernelDebugData, notifyDeviceDestruction
- added processDebugData method in Program
- change options when SLD is active
- add space at the beginning of extension list options
Change-Id: Iac1e52f849544dbfda62407e112cde83fa94e3ad
- Add L3UltHelper to be able to tell if L3 config is programmable
- Run L3 config kernel tests according to its output
Change-Id: I55b76e2da325d28f62b0bde20250b68f02154ae2
- This allows applications to force the N:1 aggregation by creating out
of order queue.
- That switches csr to N:1 submission model where commands from multiple
command streams may be aggregated.
- That forces scenarios returning an event to be aggregated as well.
Change-Id: I8fd8d7f88bb2665234ee90870133120b206710a8
-This is required to enable N:1 submission model.
-If heaps are coming from different command queues that always
mean that STATE_BASE_ADDRESS needs to be reloaded
-In order to not emit any non pipelined state in CSR, this change
moves the ownership of IndirectHeap to one centralized place which is
CommandStreamReceiver
-This way when there are submissions from multiple command queues then
they reuse the same heaps, therefore preventing SBA reload
Change-Id: I5caf5dc5cb05d7a2d8766883d9bc51c29062e980
- add defines to command line
- remove most occurences of include "config.h"
Change-Id: I19d65d83c895fc6143d319d057a50e5ae3e78830
Signed-off-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
- ThkWrapper had uninitialized mFunc member, setting it
to nullptr
- D3DSurface could dereference null image pointer,
adding validateUpdateData method in SharingHandler
that may return CL_INVALID_MEM_OBJECT if memObject is invalid
Change-Id: Iaa4499bcea47baca156c9d28be4c93ba4f0e1ebb
We don't need BuiltinDispatchInfoBuilder in every place where built ins
are used. specifically in .cpp files generated from kernel binary.
Change-Id: Ie739951cdc93873993f78ad14cee656122af51fd
Signed-off-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
- Move indirect heap to internal allocator domain.
- Add logic in getIndirectHeap to allocate with proper API depending on
heap type
- Add State base Address programming, reflecting that now Indirect Object
Heap is placed in 4GB domain.
- For AddPatchInfoCommentsForAUBDump mode , keep all heaps in non 4GB mode.
Change-Id: I6862f6a249e444d0d6cfe7e499a10d43f284553e
- Allow indirect heap to work in 2 modes:
first mode is when it will be used as an allocation from 4GB allocator.
In such scenario driver will return offset from base of the allocator region.
Second mode is the legacy mode which will be used by device enqueue, this
will results in heap CPU base address being programmed in State Base Address
commands and during programming heap offset base of 0 will be returned.
Change-Id: Ica098f3278b6b6ed5036b4c5ab7461dc61d8ee86
There are differences in qPitch programming between Gen8 vs Gen9+
devices and this requires special operation when image is zero-copy.
For Gen8 qPitch is distance in rows while Gen9+ it is in pixels.
Minimum value of qPitch is 4 and this causes slicePitch = 4*rowPitch on
Gen8.
To allow zero-copy we have to tell what is correct value rowPitch which
should equal to slicePitch.
Change-Id: I58dea004e3c7f9f4dfabd154d02749c15b6b0246
Signed-off-by: Artur Harasimiuk <artur.harasimiuk@intel.com>
Refactoring in ULTs around preemption:
-refactoring ULTS to not fail with default preemption mode
-fixing ULT memory leaks observed after enabling preemption
-mocking getSipKernel in ULTs (to minimize ULT execution time)
Change-Id: I194b56173d7cb23aae94eeeca60051759c817e10
When inheriting task count from parent events,
don't take into account externally synchronized events
Change-Id: I52d861e482669a18e2aca499c813716bb4951b74
- change the way we handle blocked commands.
- instead of allocating CPU pointer and populating it with commands, create
real IndirectHeap that may be later submitted to the GPU
- that removes a lot of copy operations that were happening on submit time
- for device enqueue, this requires dsh & shh to be passed directly to the
underlying commands, in that scenario device queue buffers are not used
Change-Id: I1124a8edbb46777ea7f7d3a5946f302e7fdf9665
* adding support for map/unmap
* adding support for origin/region validation with mipmaps
* fixing slices returned in map/unmap
* removing ambiguity around mipLevel naming
* enabling cl_khr_mipmap_image in current shape
* enabling cl_khr_mipmap_image_writes in current shape
* fixing CompileProgramWithReraFlag test
Change-Id: I0c9d83028c5c376f638e45151755fd2c7d0fb0ab
Any CPU related updates such as clEnqueueMapBuffer or similar
need to trigger a re-dump of memory prior to the next clEnqueue call.
Change-Id: I7b31e559278e92ff55b6ebab8ef4190caef1ebc0
For image with defined sharingHandler test:
- enqueueAcquireSharedObjects
- enqueueReleaseSharedObjects
Change-Id: I8835e4a4aa06a08e57dc207b168810162e44445c
- Do not obtain pattern allocation from reusable pool.
- This is due to the fact that it may contain allocations from internal
heap, which cannot be used for arguments declared as kernel argument.
Change-Id: I6c73445c409edc4ce25f8d8eba966f512dfd6cc9
- remove unused MemoryManagementFixture.
MemoryLeaks are tracked using MemoryLeakListener no need to duplicate
with Fixure.
MMF should be used when you need to inject memory allocation failure
Change-Id: I95bcaa7051acf540c5b015c5489ed6a6fc38ee8e
- This removes Instruction Heap allocation from enqueue path
- Blocked path is handled as well
- Heap is no longer allocated on demand it is bind to kernelInfo.
Change-Id: I54545beceed3404ee0330a8bac2b0934944cac30
- Switch to internal heap for kernel ISA allocations.
- remove IH from various functions
- remove IHState from CSR , IH is never dirty
- ISA is no longer copied on enqueue calls.
Change-Id: I0099cf2a9ebab6192ea03a74dd35f7da963fd5a5
- Measure time between wait calls. If delay is exeeded use QuickKmdSleep
- Kmd Notify helper functions
- Refactor overriding from debug variables
- Refactor Kmd Notify tests
Change-Id: I123c31f492d98fd304184f99ee0bf7d733d06f04
- KmdNotifyProperties struct for CapabilityTable that can be extended by
incoming KmdNotify related optimizations
- Quick KMD sleep optimization that is called from async events handler
- Optimization makes a taskCount check in busy loop with much smaller
delay than basic version of KMD Notify optimization
Change-Id: I60c851c59895f0cf9de1e1f21e755a8b4c2fe900
- Also refactor debug manager tests , they now check for default value
in igdrcl.config file
- There is no need to write dedicated tests now , so I remove them.
Change-Id: Ib338ca05b6059302c29469c673239e7886dc4b9b
- read only memory cannot be used for allocation,
Oses cannot create graphics alocation for such memory
- if memory allocation fails for host_ptr passed
to enqueueWrite calls, then try doing new allocation
and copy host_ptr on cpu
Change-Id: I415a4673ae1319ea8f77e53bd8fba7489fe85218
- Return error on origin > 0 or region > 1 when its not allowed
- For 1Darray, array region and origin are stored on 2nd position.
For 2Darray, its on 3rd postion
- Fix map offset for 1Darray image
- Fix CPU data transfer for 1Darray image
Change-Id: Id35ba5f54f117e7af318ca7e6e03c1fc942ce729
- Microseconds offer better precision.
- Some workloads require threshold less then 1 millisecond to work
efficiently.
Change-Id: I1a565049340fb6eeebe5c0a61ededae9959daca8
- Due to use cases where one shared buffer may be mapped to multiple CL
buffers we need to flush DC between enqueues.
Change-Id: I05d7f844afe31d52a0004f5e2e5efa776f9dadbe
- Dont make cpu/gpu writes on read-only unmap
- Read/Write on limited map range only
- Overlaps checks for non read-only maps
- Fixed cmd type on returned event
Change-Id: I98ca542e8d369d2426a87279f86cadb0bf3db299
When queue is blocked on non-blocking call, map operation is added to
waitlist dependencies. Returning slice/row pitch for map image was skipped
Change-Id: I46f97590315e7aee7fbbfbdb615f383cdb666307
- Introducing MapInfo struct which will be used as container for multiple
map operations
- Unified mapped offset and size for Buffers and Images
- Fixed incorrect map params for CPU and GPU path
- Missing API level checks
Change-Id: Ib4077c9e2c0c333b131ffd5ccbc4a1404920eb5b
-If out of order flag was disabled then pipe control was not having dc flush.
-This could led to a batch buffer that doesn't end with dc flush.
-This change adds differentiation between pipe controls that may be erased and
pipe controls that are used as a part of epilogue command
Change-Id: Ic9c970c75c89ff524a0e40506eff6dd097760145
-For in order queue application can have fine grain granularity of completion
-For out of order queue application wants to execute workloads concurrently
-This change disables pipe control nooping for ioq calls when event returned.
Change-Id: Iaeaf677f768f7434b2efa1842b50653ab80777ad
- account for initial setting (when set mode was equal to initial(Disabled))
estimate size in cmdStreamCS, program MMIO
Change-Id: Ice218ae986583c8f3bab4f4f6979e38f03e30d7e
- This change enabled multiple independent command queues to execute
concurrently without stalling pipe controls in between
- This change removes L3 flushes between kernels
- Dependencies between commands are resolved via task level mechanism
- Out of order queues are not changing task level between submissions
- In order queues are increasing task level between submissions
- Whenever task level changes there is pipe control with cs stall emitted
between GPGPU_WALKERs
Change-Id: I558653b296424e4775d060df3072e2a50684b715
-Do not inc/dec reference count for flush stamps while used only for
update
-FlushStamp doesn't need to be atomic,replace with atomic bool flag
to prevent usage while uniinitialized
-Clean not needed private new
Change-Id: Idad2b318f988de1e7af7642047c67f931e9772aa
- Instruction heap is currently heavily used as every kernel copies ISA into
it.
- It dries out very fast and each change to new heap requires whole pipeline
drain that prevents concurrency
- Problem is even larger when sip kernel is used as it limits the total heap
size
- In order to maximize heap re-use and to limit the count of pipeline drains
this change introduces new minimal size for instruction heap 512 KB.
Change-Id: Ic54e9ef4448b1d35dab01b084ee1d59b509642cb
- In various scenarios code was not programming the max heap size correctly
- It was possible for SSH to overcome the limit
- Size was programmed smaller then it really was, which resulted in smaller
reuse, which led to SBA reprogramming which led to lower performance in ooq
scenarios
- This change fixes the heap size programming by always utilizing full
allocation size and always limiting SSH at proper value
Change-Id: Ib703d2b0709ed8227a293def3a454bf1bb516dfd
- Program one PS with gpgpu selection and media sampler
- Program PS only when media sampler requirement changed
or when preamble was not sent
Change-Id: I85ba3f74087733e79d048e120aeb8b4b04796e00
Fixing InterfaceDescriptor programming for
blocked commands when MidThread preemption is
enabled
Additionally, fixing couple of tests that block
global preemption enabling in ULTs
Change-Id: I454c9608f8606f23d7446785ac24c7c7d8701ae0
- It should use thread count not EU count.
- change variable name to reflect that we work on sublices.
- fix test description, add missing test
- change hasBarrier variable to be boolean
Change-Id: I627bdf17b661d2f9b5eb3d8cd6ca53eba5d46b81
- Call waitForTaskCountAndCleanAllocationList with latest flushed task count
to reflect what was actually sent to HW.
- refactor cleanAllocationList to waitForTaskCountAndCleanAllocationList
Change-Id: I5301185c5fce212e39eb017b952b43c279559cf4
- Prevents destruction of MemObj while it may still be in use.
- Add UNRECOVERABLE to check whether object is deleted while having
dependencies, fix all problems is tests due to that fact.
- Fix special queue setting, clean interfaces.
Change-Id: I2a467e80df00ea1650decdcfa6866acf10b441f8
- When command queue is blocked, all heaps are being stored in temporary
allocations, command buffers are being pre-programmed, heaps are being set
on those temporary allocations with the assumption that all heaps start with
offset 0.
- Problem was when the actual submissions happened, all those temporary heaps
were just copied to appended command queue heaps, so when something was there
then new stuff was copied right after it. It means that all state was
incorrect as the offsets are not valid anymore and will point to wrong
location.
- This change releases command queue heaps when blocked command is being
submitted to make sure they will be programmed with the proper offset in newly
allocate command queue heap.
Change-Id: I3e30be13caf4df8621ddb18f8448ffaf0f1278d1
- due to the fact that device mutex was obtained to prevent threaded access to
image there was a problem when other thread was also doing readImage call
That thread got read Image kernel mutex first and then it was acquiring device
mutex, which was taken by other thread doing mapImage call.
- In current code device mutex is not taken to service mapImage call, instead
image is being guarded by its own mutex.
Change-Id: Ic4c5a019708d7ec5b240bc5b08c5a65173827392
- Fix tests that were triggering the UNRECOVERABLE scenario
- Change UNRECOVERABLE to DEBUG_BREAK in some places
Change-Id: I479baac4941b485af9ea81a61a1a03d2f3f42e6a
- This causes event tree update if virtual event is holding commands or
callbacks
- That causes race between other threads that may be updating the tree
Change-Id: Ic80a8b71ed1e1c1deab8af1bc64f8ce81c21de1b