intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-22 15:41:35 +08:00

Author	SHA1	Message	Date
Wenju He	64574d3564	[libclc] Remove -fno-builtin from compile options (#162075 ) The flag was added in `8ef48d07ef` to suppress a build warning and is no longer needed. It adds the "no-builtins" attribute, which prevents libclc functions from being inlined into callers that don't have the attribute. The flag is meant to prevent folding standard library calls into optimized implementations. For libclc device targets, however, such target‑driven folding is desirable. llvm-diff shows no change to amdgcn--amdhsa.bc and nvptx--nvidiacl.bc. Co-authored-by: Mészáros Gergely <gergely.meszaros@intel.com>	2025-10-07 08:14:54 +08:00
Matt Arsenault	6f3d765d04	libclc: Add gfx1250 and gfx1251 to amdgpu target list (#162034 )	2025-10-06 01:56:17 +00:00
Fraser Cormack	585fd4cea0	[libclc] Propose new libclc maintainer (#161141 ) Wenju He has been active on the libclc project for a while now and has been contributing to the overall health and steering the future of the project.	2025-09-29 10:20:00 +01:00
Fraser Cormack	483d73a5e0	[libclc] Move myself to the list of inactive maintainers Change my email address in the process. I will not be able to keep up maintainership duties on this project in the future. Adding the wording on the inactive maintainers section myself like this feels self-aggrandizing but was copied from other LLVM projects.	2025-09-29 08:40:19 +01:00
Wenju He	7d1adab5a6	[libclc] Fix ctest failures after `7f3661128b`: adjust external check and make shuffle helpers static (#160036 ) * Replace call-site check with external declaration scan (grep declare) to avoid false positives for not-inlined __clc_* functions. * _clc_get_el* helpers are defined as inline in clc_shuffle2.cl, so they have available_externally attribute. When they fail to inline they are deleted by EliminateAvailableExternallyPass and become unresolved in cedar-r600--.bc. Mark them static to resolve the issue. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-09-22 15:55:49 +08:00
Joseph Huber	1597fad405	[Clang] Rename elementwise builtins to `clzg` and `ctzg` (#157128 ) Summary: The added bit counting builtins for vectors used `cttz` and `ctlz`, which is consistent with the LLVM naming convention. However, these are clang builtins and implement exactly the `__builtin_ctzg` and `__builtin_clzg` behavior. It is confusing to people familiar with other builtins that these are the only bit counting intrinsics named differently. This includes the additional operation for the undefined zero case, which was added as a `clzg` extension.	2025-09-19 07:00:47 -05:00
Wenju He	7f3661128b	[libclc] Remove __attribute__((always_inline)) (#158791 ) always_inline doesn't guarantee performance improvement. Target-specific optimizations decide whether inlining is profitable. Changes to amdgcn--amdhsa.bc: * _Z9__clc_logDv16_f and _Z15__clc_remainderDv16_fS_ are not inlined. * sincos vector function code size has doubled due to apparent duplication. Also replace typo _CLC_DECL with _CLC_DEF for function definition.	2025-09-18 07:47:35 +08:00
Wenju He	fef88d2ef2	[libclc][NFC] Update README.md to use runtime build (#158283 ) LLVM_ENABLE_PROJECTS=libclc is deprecated, see https://github.com/llvm/llvm-project/blob/a2a9601ea49a/llvm/CMakeLists.txt#L223-L228	2025-09-15 10:10:07 +08:00
Wenju He	de8859da54	[libclc] Create LIBCLC_OUTPUT_LIBRARY_DIR directory before build (#158171 ) This fixes `No such file or directory` error when "Unix Makefiles" generator is used, see https://github.com/intel/llvm/issues/20058. Ninja generator implicitly creates output directory when generating libclc libraries, but "Unix Makefiles" generator does not.	2025-09-12 19:02:30 +08:00
Matt Arsenault	0d65856584	libclc: Remove HAVE_LLVM version macros (#158257 ) This doesn't need to pretend to support multiple versions of llvm and these are old anyway.	2025-09-12 19:08:57 +09:00
Wenju He	00b13c4103	[NFC][libclc] Replace _CLC_V_V_VP_VECTORIZE macro with use of unary_def_with_ptr_scalarize.inc (#157002 ) Commit `d50f2ef437` removes _CLC_V_V_VP_VECTORIZE from header file, but the macro is still used in our downstream code: https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/modf.cl#L30 https://github.com/intel/llvm/blob/0433e4d6f5c9/libclc/libspirv/lib/ptx-nvidiacl/math/sincos.cl#L31 We can either revert `d50f2ef437` or replace macro with use of unary_def_with_ptr_scalarize.inc. This PR uses the latter approach.	2025-09-09 08:11:27 +08:00
Wenju He	a271d07488	[libclc] Implement erf/erfc vector function with loop since scalar function is large (#157055 ) This PR reduces amdgcn--amdhsa.bc size by 1.8% and nvptx64--nvidiacl.bc size by 4%. Loop trip count is constant and backend can decide whether to unroll. --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-09-05 19:58:24 +08:00
Wenju He	28d9255aa7	[libclc] Override generic symbol using llvm-link --override flag instead of using weak linkage (#156778 ) Before this PR, weak linkage is applied to a few CLC generic functions to allow a target-specific implementation to override the generic one. However, adding weak linkage has the side effect of preventing inter-procedural optimization, such as PostOrderFunctionAttrsPass, because a weak function doesn't have an exact definition (as determined by hasExactDefinition in the pass). This PR resolves the issue by adding the --override flag for every non-generic bitcode file in the llvm-link run. This approach eliminates the need for weak linkage while still allowing a target-specific implementation to override the generic one. llvm-diff shows improved attribute deduction for some functions in amdgcn--amdhsa.bc, e.g. %23 = tail call half @llvm.sqrt.f16(half %22) => %23 = tail call noundef half @llvm.sqrt.f16(half %22)	2025-09-05 19:58:07 +08:00
Wenju He	fb5626fdd5	[NFC][libclc] Set MACRO_ARCH to ${ARCH} unconditionally before customizing (#156789 ) Our downstream libclc adds a few more targets that customize build_flags and opt_flags. Then in each customization block, MACRO_ARCH is defined to be ${ARCH}. Hoisting the MACRO_ARCH definition out of the if-else-end block avoids code duplication. This also avoids a potential error when the MACRO_ARCH definition is forgotten, e.g. in https://github.com/intel/llvm/pull/19971.	2025-09-05 07:35:40 +08:00
Wenju He	3dff9ac495	[NFC][libclc] Remove unused -DCLC_INTERNAL build flag, remove unused M_LOG210 (#156590 )	2025-09-05 06:44:37 +08:00
Wenju He	d50f2ef437	[NFC][libclc] Move _CLC_V_V_VP_VECTORIZE macro into clc_lgamma_r.cl and delete clcmacro.h (#156280 ) clcmacro.h only defines _CLC_V_V_VP_VECTORIZE which is only used in clc/lib/generic/math/clc_lgamma_r.cl.	2025-09-03 08:23:01 +08:00
Wenju He	a247da4f93	[libclc] update __clc_mem_fence: add MemorySemantic arg and use __builtin_amdgcn_fence for AMDGPU (#152275 ) It is necessary to add MemorySemantic argument for AMDGPU which means the memory or address space to which the memory ordering is applied. The MemorySemantic is also necessary for implementing the SPIR-V MemoryBarrier instruction. Additionally, the implementation of __clc_mem_fence on Intel GPUs requires the MemorySemantic argument. Using __builtin_amdgcn_fence for AMDGPU is follow-up of https://github.com/llvm/llvm-project/pull/151446#discussion_r2254006508 llvm-diff shows no change to nvptx64--nvidiacl.bc.	2025-09-01 11:03:45 +08:00
Romaric Jodin	4a7205f892	libclc: CMake: include GetClangResourceDir (#155836 ) `get_clang_resource_dir` is not guaranteed to be there. Make sure of it by including `GetClangResourceDir`.	2025-08-28 17:56:33 +01:00
Wenju He	e6d095e89c	[libclc] Only create a target per each compile command for cmake MSVC generator (#154479 ) libclc sequential build issue addressed in commit `0c21d6b4c8` is specific to cmake MSVC generator. Therefore, this PR avoids creating a large number of targets when a non-MSVC generator is used, such as the Ninja generator, which is used in pre-merge CI on Windows in llvm-project repo. We plan to migrate from MSVC generator to Ninja generator in our downstream CI to fix flaky cmake bug `Cannot restore timestamp`, which might be related to the large number of targets.	2025-08-22 07:45:42 +08:00
Fraser Cormack	5c411b3c0b	[libclc] Use elementwise ctlz/cttz builtins for CLC clz/ctz (#154535 ) Using the elementwise builtin optimizes the vector case; instead of scalarizing we can compile directly to the vector intrinsics.	2025-08-21 09:32:03 +01:00
Wenju He	a450dc80bf	[libclc] Implement __clc_get_local_size/__clc_get_max_sub_group_size for amdgcn (#153785 ) This simplifies downstream refactoring of libspirv workitem function in https://github.com/intel/llvm/tree/sycl/libclc/libspirv/lib/generic	2025-08-19 07:51:17 +08:00
Wenju He	76bb98746b	[NFC][libclc] add missing __CLC_ prefix to all internal macros (#153523 ) This unifies the naming scheme of macros to address review comment https://github.com/intel/llvm/pull/19779#discussion_r2272194357 Math constant value macros are not changed, e.g. `#define AU0 -9.86494292470009928597e-03`	2025-08-18 07:21:04 +08:00
Wenju He	bce14c69db	[libclc] Fix out-of-bound value for workitem functions according to OpenCL spec (#153784 )	2025-08-18 06:51:01 +08:00
Wenju He	111cdaac99	[libclc] Add __attribute__((const)) to functions that don't access memory (#152456 ) Before this PR, PostOrderFunctionAttrsPass in opt run can deduce memory(none) for these functions. This PR explicitly adds the attribute to align with Clang's OpenCL headers and ensures the attribute is present throughout the compilation flow. Generated bitcode files amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc become slightly smaller.	2025-08-12 17:19:08 +08:00
Wenju He	68c609b6c8	[libclc] Fix libclc install on Windows when MSVC generator is used (#152703 ) Fix a regression of `df74736732`. cmake MSVC generator is multiple configurations. Build type is not known at configure time and CMAKE_CFG_INTDIR is evaluated to $(Configuration) at configure time. libclc install fails since $(Configuration) in bitcode file path is unresolved in libclc/cmake_install.cmake at install time. We need a solution that resolves libclc bitcode file path at install time. This PR fixes the issue using CMAKE_INSTALL_CONFIG_NAME which can be evaluated at install time. This is the same solution as in https://reviews.llvm.org/D76827	2025-08-11 18:05:42 +08:00
Wenju He	12cec437c6	[libclc] Implement clc_log/sinpi/sqrt with __nv_* functions (#150174 ) This is to upstream implementations in https://github.com/intel/llvm/tree/sycl/libclc/clc/lib/ptx-nvidiacl/math	2025-08-11 08:53:49 +08:00
Wenju He	1458eb206f	[NFC][libclc] Delete unused clc/shared/binary_decl_with_scalar_second_arg.inc (#152463 )	2025-08-08 08:50:14 +08:00
Wenju He	d618c36cb7	[libclc] Add missing clc/lib/ptx-nvidiacl/SOURCES to CMAKE_CONFIGURE_DEPENDS (#152431 )	2025-08-07 16:42:13 +08:00
Wenju He	3d1c1a5277	[libclc] Set TARGET_FILE property for prepare-${obj_suffix} target (#152245 ) The target's output bitcode `libclc_builtins_lib` is located in a sub-directory in clang resource directory since `df74736732`. Setting TARGET_FILE property can allow targets in non-libclc project to obtain the path to `libclc_builtins_lib`.	2025-08-07 08:28:43 +08:00
Wenju He	af16fc2e2a	[libclc] Move mem_fence and barrier to clc library (#151446 ) The __clc_mem_fence and __clc_work_group_barrier functions have two parameters, memory_scope and memory_order. The design allows the clc functions to implement SPIR-V ControlBarrier and MemoryBarrier functions in the future. The default memory ordering in clc is set to __ATOMIC_SEQ_CST, which is also the default and strongest ordering in OpenCL and C++. The OpenCL cl_mem_fence_flags parameter is converted to a combination of __MEMORY_SCOPE_DEVICE and __MEMORY_SCOPE_WRKGRP, which is passed to clc. llvm-diff shows no change to nvptx64--nvidiacl.bc. llvm-diff shows a small change to amdgcn--amdhsa.bc and the number of LLVM IR instructions is reduced by 1: https://alive2.llvm.org/ce/z/_Uhqvt	2025-08-06 09:49:28 +08:00
Wenju He	04691aae0d	[libclc] Refine id in async_work_group_copy STRIDED_COPY (#151644 ) Move id first along 0th dimension to achieve coalesced memory access when stride is 1.	2025-08-05 08:00:17 +08:00
Fraser Cormack	df74736732	[clang] Add the ability to link libclc OpenCL libraries (#146503 ) This commit adds driver support for linking libclc OpenCL libraries. It takes the form of a new optional flag: --libclc-lib=namespec. Nothing is linked unless this flag is specified. Not all libclc targets have corresponding clang targets. For this reason it is desirable for users to be able to specify a libclc library name. We support this by taking both a library name (without the .bc suffix) or a filename. Both of these are searched for in the clang resource directory. Filenames are also checked themselves so that absolute paths can be provided. The syntax for specifying filenames (as opposed to library names) uses a leading colon (:), inspired by the -l option. To accommodate this option, libclc libraries are now placed into clang's resource directory in an in-tree configuration. The libraries are all placed in <resource-dir>/lib/libclc and are not grouped under host-specific directories as some other runtime libraries are; it is not expected that OpenCL libraries will differ depending on the host toolchain. Currently only the AMDGPU toolchain supports this option as a proof of concept. Other targets such as NVPTX or SPIR/SPIR-V could support it too. We could optionally let target toolchains search for libclc libraries themselves, possibly when passed an empty --libclc-lib.	2025-08-04 15:37:22 +01:00
Fraser Cormack	b0313adefa	[libclc] Add an option to build SPIR-V targets with the LLVM backend (#151347 ) This removes the dependency on an external tool to build the SPIR-V files. It may be of interest to projects such as Mesa. Note that the option is off by default as using the SPIR-V backend, at least on my machine, uses a lot of memory and the process is often killed in a parallelized build. It does complete, however. Fixes #135327.	2025-08-01 09:48:40 +01:00
Fraser Cormack	586cacdbdd	[libclc] Optimize generic CLC fmin/fmax (#128506 ) With this commit, the CLC fmin/fmax builtins use clang's __builtin_elementwise_(min|max)imumnum which helps us generate LLVM minimumnum/maximumnum intrinsics directly. These intrinsics uniformly select the non-NaN input over the (quiet or signalling) NaN input, which corresponds to what the OpenCL CTS tests. These intrinsics maintain the vector types, as opposed to scalarizing, which was previously happening. This commit therefore helps to optimize codegen for those targets. Note that there is ongoing discussion regarding how these builtins should handle signalling NaNs in the OpenCL specification and whether they should be able to return a quiet NaN as per the IEEE behaviour. If the specification and/or CTS is ever updated to allow or mandate returning a qNAN, these builtins could/should be updated to use __builtin_elementwise_(min|max)num instead which would lower to LLVM minnum/maxnum intrinsics. The SPIR-V targets maintain the old implementations, as the LLVM -> SPIR-V translator can't currently handle the LLVM intrinsics. The implementation has been simplified to consistently use clang builtins, as opposed to before where the half version was explicitly defined. [1] https://github.com/KhronosGroup/OpenCL-CTS/pull/2285	2025-07-29 13:21:42 +01:00
Fraser Cormack	76bebb5be9	[libclc] Fix building top-level 'libclc' target (#150972 ) With libclc being a 'runtime', the top-level build assumes that there is a corresponding 'libclc' target. We previously weren't providing this, leading to a build failure if the user tried to build it. This commit remedies this by adding support for building the 'libclc' target. It does so by adding dependencies from the OpenCL builtins to this target. It uses a configurable in-between target - libclc-opencl-builtins - to ease the possibility of adding non-OpenCL builtin libraries in the future.	2025-07-29 10:53:31 +01:00
Wenju He	5223317210	[libclc] Add generic native half implementation of __clc_normalize (#150165 ) This is ported from https://github.com/intel/llvm/blob/sycl/libclc/libspirv/lib/generic/geometric/normalize.cl and can pass a closed-source OpenCL CTS "test_geometrics geom_normalize --half CL_DEVICE_TYPE_GPU" on intel GPU. llvm-diff amdgcn--amdhsa.bc shows fpext/fptrunc insts are now removed from normalize function.	2025-07-29 08:29:12 +08:00
Wenju He	bcd0d97224	[libclc] Simplify unary_def_scalarize.inc's use in __clc_erf/erfc/tgamma (#150181 ) Also delete unary_def_via_fp32.inc. There are small changes in amdgcn--amdhsa.bc because the vector conversion is now scalarized, e.g. %2 = fpext <4 x half> %0 to <4 x float> %3 = extractelement <4 x float> %2, i64 0 %4 = tail call float @llvm.fabs.f32(float %3) -> %2 = extractelement <4 x half> %0, i64 0 %3 = tail call half @llvm.fabs.f16(half %2) %4 = fpext half %3 to float	2025-07-29 08:25:58 +08:00
Michał Górny	abe93d9d7e	[libclc] Fix installed symlinks to be relative again (#149728 ) Fix the symlink creation logic to use relative paths instead of absolute, in order to ensure that the installed symlinks actually refer to the installed .bc files rather than the ones from the build directory. This was broken in #146833. The change is a bit roundabout but it attempts to preserve the spirit of #146833, that is the ability to use multiple output directories (provided they all reside in `${LIBCLC_OUTPUT_LIBRARY_DIR}` and preserve the same structure in the installed tree). Signed-off-by: Michał Górny <mgorny@gentoo.org>	2025-07-21 20:59:31 +02:00
Michał Górny	58c3affdaa	[libclc] Expose `prepare_builtins_` variables in top-level CMakeLists (#149657 ) Fix `libclc/utils/CMakeLists.txt` to expose `prepare_builtins_` variables in parent scope. This was a regression introduced in #148815 where the code was moved into subdirectory, and the variables would no longer be accessible to calls in top-level CMakeLists, resulting in attempting to build targets with empty command: ``` [1566/1676] cd /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build && -o /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/clspv--.bc /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/obj.libclc.dir/clspv--/builtins.opt.clspv--.bc FAILED: clspv--.bc /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/clspv--.bc cd /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build && -o /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/clspv--.bc /var/tmp/portage/llvm-core/libclc-22.0.0.9999/work/libclc_build/obj.libclc.dir/clspv--/builtins.opt.clspv--.bc /bin/sh: line 1: -o: command not found ```	2025-07-20 12:26:51 +09:00
Wenju He	9c26f37ce3	[libclc] Add generic implementation of some atomic functions in OpenCL spec section 6.15.12.7 (#146814 ) Add corresponding clc functions, which are implemented with clang __scoped_atomic builtins. OpenCL functions are implemented as a wrapper over clc functions. Also change legacy atomic_inc and atomic_dec to re-use the newly added clc_atomic_inc/dec implementations. llvm-diff shows no change to atomic_inc and atomic_dec in bitcode. Notes: * Generic OpenCL built-in functions use __ATOMIC_SEQ_CST and __MEMORY_SCOPE_DEVICE for memory order and memory scope parameters. * OpenCL atomic__explicit, atomic_flag built-ins are not implemented yet. * OpenCL built-ins of atomic_intptr_t, atomic_uintptr_t, atomic_size_t and atomic_ptrdiff_t types are not implemented yet. * llvm-diff shows no change to nvptx64--nvidiacl.bc and amdgcn--amdhsa.bc since __opencl_c_atomic_order_seq_cst and __opencl_c_atomic_scope_device are not defined in these two targets.	2025-07-18 08:09:14 +08:00
Wenju He	c0294f497d	[libclc] Add generic implementation of bitfield_insert/extract,bit_reverse (#149070 ) The implementation is based on reference implementation in OpenCL-CTS/test_integer_ops. The generic implementations pass OpenCL-CTS/test_integer_ops tests on Intel GPU.	2025-07-18 08:06:29 +08:00
Wenju He	3abecfe9e3	[NFC][libclc] Delete clc/include/clc/relational/floatn.inc (#149252 ) llvm-diff shows no change to amdgcn--amdhsa.bc.	2025-07-18 08:05:07 +08:00
Wenju He	cf36f49c04	[libclc] Enable `clang fp reciprocal` in clc_native_divide/recip/rsqrt/tan (#149269 ) The pragma adds `arcp` flag to `fdiv` instruction in these functions. The flag can provide better performance.	2025-07-18 07:50:35 +08:00
Wenju He	9d78eb5cc5	[libclc] Enable -fdiscard-value-names build flag to reduce bitcode size (#149016 ) The flag reduces nvptx64--nvidiacl.bc size from 10.6MB to 5.2MB.	2025-07-17 08:04:33 +08:00
Fraser Cormack	8a7a64873b	[libclc] Move CMake for prepare_builtins to a subdirectory (#148815 ) This simply makes things better self-contained.	2025-07-15 12:26:11 +01:00
Mészáros Gergely	7a089bc4c0	[libclc] Delete .gitignore (#147939 ) The file is listing build artifacts to ignore, but LLVM has long had the policy that in-tree builds are not supported, so the ignore rules shouldn't serve their original purpose anymore. The rules however are annoying because although they probably intended only to ignore top-level build artifacts, they lack the leading `/` so they match any file with the ignored name anywhere under `libclc/`.	2025-07-10 14:07:59 +02:00
Wenju He	28aa5a64ef	[libclc] Declare workitem built-ins in clc, move ptx-nvidiacl workitem built-ins into clc (#144333 ) Changes in this PR: * Declare most of workitem functions in clc and opencl folders. * Call clc workitem function in corresponding OpenCL workitem function. * Move ptx-nvidiacl workitem built-in implementations into clc. * Move a few amdgcn workitem built-in implementations into clc. * Include only needed headers in OpenCL workitem functions. * Implement get_local_linear_id, get_max_sub_group_size, get_num_sub_groups, get_sub_group_id, get_sub_group_local_id, get_sub_group_size for ptx-nvidiacl. llvm-diff shows this PR adds a few new symbols to nvptx64--nvidiacl.bc. llvm-diff shows no change to amdgcn--amdhsa.bc, nvptx--.bc and nvptx64--.bc.	2025-07-10 08:04:16 +08:00
Fraser Cormack	9b5959dd9a	[libclc] Change symlinks to copies on Windows (#147759 ) This mirrors how other LLVM libraries handle symlinks	2025-07-09 17:20:56 +01:00
Fraser Cormack	9d11bd0db8	[libclc] Remove catch-all opencl/clc.h (#147490 ) This commit finishes the work started in #146840 and #147276. It makes each OpenCL header self-contained and each implementation file include only the headers it needs. It removes the need for a catch-all include file of all OpenCL builtin declarations.	2025-07-08 10:37:06 +01:00
Fraser Cormack	b67504c461	[libclc] Tighten OpenCL builtin include strategy (#147276 ) This commit continues the work from #146840 and extends it to the maths, geometrics, common, and relational directories. All headers have include guards and, where appropriate, include the minimal code required for their specific definitions. Implementation files no longer include the large catch-all header of all OpenCL builtin declarations.	2025-07-08 09:04:43 +01:00
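
Commit `c0294f497d` above adds generic implementations of bitfield_insert, bitfield_extract and bit_reverse. For readers unfamiliar with these operations, the following is a minimal Python sketch of their semantics. It is not the libclc code or the OpenCL-CTS reference; the function names, the fixed 32-bit width, and the assumption that offset + count <= 32 are illustrative choices made here.

```python
# Illustrative sketch only: models 32-bit values with Python ints.
# Assumes 0 <= offset, 0 <= count and offset + count <= 32 (a hypothetical
# precondition chosen for this sketch).
WIDTH = 32
MASK32 = (1 << WIDTH) - 1


def bitfield_extract_unsigned(base, offset, count):
    """Return bits [offset, offset + count) of base, zero-extended."""
    mask = (1 << count) - 1  # count == 0 gives mask 0, so the result is 0
    return (base >> offset) & mask


def bitfield_extract_signed(base, offset, count):
    """Return bits [offset, offset + count) of base, sign-extended."""
    value = bitfield_extract_unsigned(base, offset, count)
    if count > 0 and value & (1 << (count - 1)):
        value -= 1 << count  # top bit of the field is set: negative value
    return value


def bitfield_insert(base, insert, offset, count):
    """Replace bits [offset, offset + count) of base with the low bits of insert."""
    mask = ((1 << count) - 1) << offset
    return ((base & ~mask) | ((insert << offset) & mask)) & MASK32


def bit_reverse(value, width=WIDTH):
    """Reverse the order of the low `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result
```

For example, `bitfield_extract_unsigned(0xABCD1234, 8, 8)` selects the second-lowest byte, and `bitfield_insert(0xFFFFFFFF, 0, 8, 8)` clears that same byte while leaving the rest of the word untouched.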

903 Commits