intel/llvm

mirror of https://github.com/intel/llvm.git synced 2026-01-25 10:55:58 +08:00

Author	SHA1	Message	Date
Dmitry Vyukov	0e110fb429	[libc] memmove optimizations (#70043 ) 1. Remove is_disjoint check for smaller sizes and reduce code bloat. inline_memmove may handle some small sizes as efficiently as inline_memcpy. For these sizes we may not do is_disjoint check. This both avoids additional code for the most frequent smaller sizes and removes code bloat (we don't need the memcpy logic for small sizes). Here we heavily rely on inlining and dead code elimination: from the first inline_memmove we should get only handling of small sizes, and from the second inline_memmove and inline_memcpy we should get only handling of larger sizes. 2. Use the memcpy thresholds for memmove. Memcpy thresholds were more carefully tuned. This becomes more important since we use memmove for all small sizes always now. 3. Fix boundary conditions for sizes = 16/32/64. See the added comment for explanations. Memmove function size drops from 885 to 715 bytes due to removed duplication. ``` │ baseline │ small-size │ │ sec/op │ sec/op vs base │ memmove/Google_A 3.208n ± 0% 2.911n ± 0% -9.25% (n=100) memmove/Google_B 4.113n ± 1% 3.428n ± 0% -16.65% (n=100) memmove/Google_D 5.838n ± 0% 4.158n ± 0% -28.78% (n=100) memmove/Google_S 4.712n ± 1% 3.899n ± 0% -17.25% (n=100) memmove/Google_U 3.609n ± 0% 3.247n ± 1% -10.02% (n=100) memmove/0 2.982n ± 0% 2.169n ± 0% -27.26% (n=50) memmove/1 3.253n ± 0% 2.168n ± 0% -33.34% (n=50) memmove/2 3.255n ± 0% 2.169n ± 0% -33.38% (n=50) memmove/3 3.259n ± 2% 2.175n ± 0% -33.27% (p=0.000 n=50) memmove/4 3.259n ± 0% 2.168n ± 5% -33.46% (p=0.000 n=50) memmove/5 2.488n ± 0% 1.926n ± 0% -22.57% (p=0.000 n=50) memmove/6 2.490n ± 0% 1.928n ± 0% -22.59% (p=0.000 n=50) memmove/7 2.492n ± 0% 1.927n ± 0% -22.65% (p=0.000 n=50) memmove/8 2.737n ± 0% 2.711n ± 0% -0.97% (p=0.000 n=50) memmove/9 2.736n ± 0% 2.711n ± 0% -0.94% (p=0.000 n=50) memmove/10 2.739n ± 0% 2.711n ± 0% -1.04% (p=0.000 n=50) memmove/11 2.740n ± 0% 2.711n ± 0% -1.07% (p=0.000 n=50) memmove/12 2.740n ± 0% 2.711n ± 
0% -1.09% (p=0.000 n=50) memmove/13 2.744n ± 0% 2.711n ± 0% -1.22% (p=0.000 n=50) memmove/14 2.742n ± 0% 2.711n ± 0% -1.14% (p=0.000 n=50) memmove/15 2.742n ± 0% 2.711n ± 0% -1.15% (p=0.000 n=50) memmove/16 2.997n ± 0% 2.981n ± 0% -0.52% (p=0.000 n=50) memmove/17 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50) memmove/18 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50) memmove/19 2.999n ± 0% 2.982n ± 0% -0.59% (p=0.000 n=50) memmove/20 2.998n ± 0% 2.981n ± 0% -0.55% (p=0.000 n=50) memmove/21 3.000n ± 0% 2.981n ± 0% -0.61% (p=0.000 n=50) memmove/22 3.002n ± 0% 2.981n ± 0% -0.68% (p=0.000 n=50) memmove/23 3.002n ± 0% 2.981n ± 0% -0.67% (p=0.000 n=50) memmove/24 3.002n ± 0% 2.981n ± 0% -0.70% (n=50) memmove/25 3.002n ± 0% 2.981n ± 0% -0.68% (p=0.000 n=50) memmove/26 3.004n ± 0% 2.982n ± 0% -0.74% (p=0.000 n=50) memmove/27 3.005n ± 0% 2.981n ± 0% -0.79% (n=50) memmove/28 3.005n ± 0% 2.982n ± 0% -0.77% (n=50) memmove/29 3.009n ± 0% 2.981n ± 0% -0.92% (n=50) memmove/30 3.008n ± 0% 2.981n ± 0% -0.89% (n=50) memmove/31 3.007n ± 0% 2.982n ± 0% -0.86% (n=50) memmove/32 3.540n ± 0% 2.998n ± 0% -15.31% (p=0.000 n=50) memmove/33 3.544n ± 0% 2.997n ± 0% -15.44% (p=0.000 n=50) memmove/34 3.546n ± 0% 2.999n ± 0% -15.42% (n=50) memmove/35 3.545n ± 0% 2.999n ± 0% -15.40% (n=50) memmove/36 3.548n ± 0% 2.998n ± 0% -15.52% (p=0.000 n=50) memmove/37 3.546n ± 0% 3.000n ± 0% -15.41% (n=50) memmove/38 3.549n ± 0% 2.999n ± 0% -15.49% (p=0.000 n=50) memmove/39 3.549n ± 0% 2.999n ± 0% -15.48% (p=0.000 n=50) memmove/40 3.549n ± 0% 3.000n ± 0% -15.46% (p=0.000 n=50) memmove/41 3.550n ± 0% 3.001n ± 0% -15.47% (n=50) memmove/42 3.549n ± 0% 3.001n ± 0% -15.43% (n=50) memmove/43 3.552n ± 0% 3.001n ± 0% -15.52% (p=0.000 n=50) memmove/44 3.552n ± 0% 3.001n ± 0% -15.51% (n=50) memmove/45 3.552n ± 0% 3.002n ± 0% -15.48% (n=50) memmove/46 3.554n ± 0% 3.001n ± 0% -15.55% (p=0.000 n=50) memmove/47 3.556n ± 0% 3.002n ± 0% -15.58% (p=0.000 n=50) memmove/48 3.555n ± 0% 3.003n ± 0% -15.54% (n=50) memmove/49 3.557n 
± 0% 3.002n ± 0% -15.59% (p=0.000 n=50) memmove/50 3.557n ± 0% 3.004n ± 0% -15.55% (p=0.000 n=50) memmove/51 3.556n ± 0% 3.004n ± 0% -15.53% (p=0.000 n=50) memmove/52 3.561n ± 0% 3.004n ± 0% -15.65% (p=0.000 n=50) memmove/53 3.558n ± 0% 3.004n ± 0% -15.57% (p=0.000 n=50) memmove/54 3.561n ± 0% 3.005n ± 0% -15.62% (n=50) memmove/55 3.560n ± 0% 3.006n ± 0% -15.57% (n=50) memmove/56 3.562n ± 0% 3.006n ± 0% -15.60% (p=0.000 n=50) memmove/57 3.563n ± 0% 3.006n ± 0% -15.64% (n=50) memmove/58 3.565n ± 0% 3.007n ± 0% -15.64% (p=0.000 n=50) memmove/59 3.564n ± 0% 3.006n ± 0% -15.66% (p=0.000 n=50) memmove/60 3.570n ± 0% 3.008n ± 0% -15.74% (p=0.000 n=50) memmove/61 3.566n ± 0% 3.009n ± 0% -15.63% (p=0.000 n=50) memmove/62 3.567n ± 0% 3.007n ± 0% -15.70% (p=0.000 n=50) memmove/63 3.568n ± 0% 3.008n ± 0% -15.71% (p=0.000 n=50) memmove/64 4.104n ± 0% 3.008n ± 0% -26.70% (p=0.000 n=50) memmove/65 4.126n ± 0% 3.662n ± 0% -11.26% (p=0.000 n=50) memmove/66 4.128n ± 0% 3.662n ± 0% -11.29% (n=50) memmove/67 4.129n ± 0% 3.662n ± 0% -11.31% (n=50) memmove/68 4.129n ± 0% 3.661n ± 0% -11.33% (p=0.000 n=50) memmove/69 4.130n ± 0% 3.662n ± 0% -11.34% (p=0.000 n=50) memmove/70 4.130n ± 0% 3.662n ± 0% -11.33% (n=50) memmove/71 4.132n ± 0% 3.662n ± 0% -11.38% (p=0.000 n=50) memmove/72 4.131n ± 0% 3.661n ± 0% -11.39% (n=50) memmove/73 4.135n ± 0% 3.661n ± 0% -11.45% (p=0.000 n=50) memmove/74 4.137n ± 0% 3.662n ± 0% -11.49% (n=50) memmove/75 4.138n ± 0% 3.662n ± 0% -11.51% (p=0.000 n=50) memmove/76 4.139n ± 0% 3.661n ± 0% -11.56% (p=0.000 n=50) memmove/77 4.136n ± 0% 3.662n ± 0% -11.47% (p=0.000 n=50) memmove/78 4.143n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50) memmove/79 4.142n ± 0% 3.661n ± 0% -11.60% (n=50) memmove/80 4.142n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50) memmove/81 4.140n ± 0% 3.661n ± 0% -11.57% (n=50) memmove/82 4.146n ± 0% 3.661n ± 0% -11.69% (n=50) memmove/83 4.143n ± 0% 3.661n ± 0% -11.63% (p=0.000 n=50) memmove/84 4.143n ± 0% 3.661n ± 0% -11.63% (n=50) memmove/85 4.147n ± 0% 
3.661n ± 0% -11.73% (p=0.000 n=50) memmove/86 4.142n ± 0% 3.661n ± 0% -11.62% (p=0.000 n=50) memmove/87 4.147n ± 0% 3.661n ± 0% -11.72% (p=0.000 n=50) memmove/88 4.148n ± 0% 3.661n ± 0% -11.74% (n=50) memmove/89 4.152n ± 0% 3.661n ± 0% -11.84% (n=50) memmove/90 4.151n ± 0% 3.661n ± 0% -11.81% (n=50) memmove/91 4.150n ± 0% 3.661n ± 0% -11.78% (n=50) memmove/92 4.153n ± 0% 3.661n ± 0% -11.86% (n=50) memmove/93 4.158n ± 0% 3.661n ± 0% -11.95% (n=50) memmove/94 4.157n ± 0% 3.661n ± 0% -11.95% (p=0.000 n=50) memmove/95 4.155n ± 0% 3.661n ± 0% -11.90% (p=0.000 n=50) memmove/96 4.149n ± 0% 3.660n ± 0% -11.79% (n=50) memmove/97 4.157n ± 0% 3.661n ± 0% -11.94% (n=50) memmove/98 4.157n ± 0% 3.661n ± 0% -11.94% (n=50) memmove/99 4.168n ± 0% 3.661n ± 0% -12.17% (p=0.000 n=50) memmove/100 4.159n ± 0% 3.660n ± 0% -12.00% (p=0.000 n=50) memmove/101 4.161n ± 0% 3.660n ± 0% -12.03% (p=0.000 n=50) memmove/102 4.165n ± 0% 3.660n ± 0% -12.12% (p=0.000 n=50) memmove/103 4.164n ± 0% 3.661n ± 0% -12.08% (n=50) memmove/104 4.164n ± 0% 3.660n ± 0% -12.11% (n=50) memmove/105 4.165n ± 0% 3.660n ± 0% -12.12% (p=0.000 n=50) memmove/106 4.166n ± 0% 3.660n ± 0% -12.15% (n=50) memmove/107 4.171n ± 0% 3.660n ± 1% -12.26% (p=0.000 n=50) memmove/108 4.173n ± 0% 3.660n ± 0% -12.30% (p=0.000 n=50) memmove/109 4.170n ± 0% 3.660n ± 0% -12.24% (n=50) memmove/110 4.174n ± 0% 3.660n ± 0% -12.31% (n=50) memmove/111 4.176n ± 0% 3.660n ± 0% -12.35% (p=0.000 n=50) memmove/112 4.174n ± 0% 3.659n ± 0% -12.34% (p=0.000 n=50) memmove/113 4.176n ± 0% 3.660n ± 0% -12.35% (n=50) memmove/114 4.182n ± 0% 3.660n ± 0% -12.49% (n=50) memmove/115 4.185n ± 0% 3.660n ± 0% -12.55% (n=50) memmove/116 4.184n ± 0% 3.659n ± 0% -12.54% (n=50) memmove/117 4.182n ± 0% 3.660n ± 0% -12.50% (n=50) memmove/118 4.188n ± 0% 3.660n ± 0% -12.61% (n=50) memmove/119 4.186n ± 0% 3.660n ± 0% -12.57% (p=0.000 n=50) memmove/120 4.189n ± 0% 3.659n ± 0% -12.63% (n=50) memmove/121 4.187n ± 0% 3.660n ± 0% -12.60% (n=50) memmove/122 4.186n ± 0% 3.660n 
± 0% -12.58% (n=50) memmove/123 4.187n ± 0% 3.660n ± 0% -12.60% (n=50) memmove/124 4.189n ± 0% 3.659n ± 0% -12.65% (n=50) memmove/125 4.195n ± 0% 3.659n ± 0% -12.78% (n=50) memmove/126 4.197n ± 0% 3.659n ± 0% -12.81% (n=50) memmove/127 4.194n ± 0% 3.659n ± 0% -12.75% (n=50) memmove/128 5.035n ± 0% 3.659n ± 0% -27.32% (n=50) memmove/129 5.127n ± 0% 5.164n ± 0% +0.73% (p=0.000 n=50) memmove/130 5.130n ± 0% 5.176n ± 0% +0.88% (p=0.000 n=50) memmove/131 5.127n ± 0% 5.180n ± 0% +1.05% (p=0.000 n=50) memmove/132 5.131n ± 0% 5.169n ± 0% +0.75% (p=0.000 n=50) memmove/133 5.137n ± 0% 5.179n ± 0% +0.81% (p=0.000 n=50) memmove/134 5.140n ± 0% 5.178n ± 0% +0.74% (p=0.000 n=50) memmove/135 5.141n ± 0% 5.187n ± 0% +0.88% (p=0.000 n=50) memmove/136 5.133n ± 0% 5.184n ± 0% +0.99% (p=0.000 n=50) memmove/137 5.148n ± 0% 5.186n ± 0% +0.73% (p=0.000 n=50) memmove/138 5.143n ± 0% 5.189n ± 0% +0.88% (p=0.000 n=50) memmove/139 5.142n ± 0% 5.192n ± 0% +0.97% (p=0.000 n=50) memmove/140 5.141n ± 0% 5.192n ± 0% +1.01% (p=0.000 n=50) memmove/141 5.155n ± 0% 5.188n ± 0% +0.64% (p=0.000 n=50) memmove/142 5.146n ± 0% 5.192n ± 0% +0.90% (p=0.000 n=50) memmove/143 5.142n ± 0% 5.203n ± 0% +1.19% (p=0.000 n=50) memmove/144 5.146n ± 0% 5.197n ± 0% +0.99% (p=0.000 n=50) memmove/145 5.146n ± 0% 5.196n ± 0% +0.97% (p=0.000 n=50) memmove/146 5.151n ± 0% 5.207n ± 0% +1.10% (p=0.000 n=50) memmove/147 5.151n ± 0% 5.205n ± 0% +1.06% (p=0.000 n=50) memmove/148 5.156n ± 0% 5.190n ± 0% +0.66% (p=0.000 n=50) memmove/149 5.158n ± 0% 5.212n ± 0% +1.04% (p=0.000 n=50) memmove/150 5.160n ± 0% 5.203n ± 0% +0.84% (p=0.000 n=50) memmove/151 5.167n ± 0% 5.210n ± 0% +0.83% (p=0.000 n=50) memmove/152 5.157n ± 0% 5.206n ± 0% +0.94% (p=0.000 n=50) memmove/153 5.170n ± 0% 5.211n ± 0% +0.80% (p=0.000 n=50) memmove/154 5.169n ± 0% 5.222n ± 0% +1.02% (p=0.000 n=50) memmove/155 5.171n ± 0% 5.215n ± 0% +0.87% (p=0.000 n=50) memmove/156 5.174n ± 0% 5.214n ± 0% +0.78% (p=0.000 n=50) memmove/157 5.171n ± 0% 5.218n ± 0% +0.92% 
(p=0.000 n=50) memmove/158 5.168n ± 0% 5.224n ± 0% +1.09% (p=0.000 n=50) memmove/159 5.179n ± 0% 5.218n ± 0% +0.76% (p=0.000 n=50) memmove/160 5.170n ± 0% 5.219n ± 0% +0.95% (p=0.000 n=50) memmove/161 5.187n ± 0% 5.220n ± 0% +0.64% (p=0.000 n=50) memmove/162 5.189n ± 0% 5.234n ± 0% +0.86% (p=0.000 n=50) memmove/163 5.199n ± 0% 5.250n ± 0% +0.99% (p=0.000 n=50) memmove/164 5.205n ± 0% 5.260n ± 0% +1.04% (p=0.000 n=50) memmove/165 5.208n ± 0% 5.261n ± 0% +1.01% (p=0.000 n=50) memmove/166 5.227n ± 0% 5.275n ± 0% +0.91% (p=0.000 n=50) memmove/167 5.233n ± 0% 5.281n ± 0% +0.92% (p=0.000 n=50) memmove/168 5.236n ± 0% 5.295n ± 0% +1.12% (p=0.000 n=50) memmove/169 5.256n ± 0% 5.297n ± 0% +0.79% (p=0.000 n=50) memmove/170 5.259n ± 0% 5.302n ± 0% +0.80% (p=0.000 n=50) memmove/171 5.269n ± 0% 5.321n ± 0% +0.97% (p=0.000 n=50) memmove/172 5.266n ± 0% 5.318n ± 0% +0.98% (p=0.000 n=50) memmove/173 5.272n ± 0% 5.330n ± 0% +1.09% (p=0.000 n=50) memmove/174 5.284n ± 0% 5.331n ± 0% +0.89% (p=0.000 n=50) memmove/175 5.284n ± 0% 5.322n ± 0% +0.72% (p=0.000 n=50) memmove/176 5.298n ± 0% 5.337n ± 0% +0.74% (p=0.000 n=50) memmove/177 5.282n ± 0% 5.338n ± 0% +1.04% (p=0.000 n=50) memmove/178 5.299n ± 0% 5.337n ± 0% +0.71% (p=0.000 n=50) memmove/179 5.296n ± 0% 5.343n ± 0% +0.88% (p=0.000 n=50) memmove/180 5.292n ± 0% 5.343n ± 0% +0.97% (p=0.000 n=50) memmove/181 5.303n ± 0% 5.335n ± 0% +0.60% (p=0.000 n=50) memmove/182 5.305n ± 0% 5.338n ± 0% +0.62% (p=0.000 n=50) memmove/183 5.298n ± 0% 5.329n ± 0% +0.59% (p=0.000 n=50) memmove/184 5.299n ± 0% 5.333n ± 0% +0.64% (p=0.000 n=50) memmove/185 5.291n ± 0% 5.330n ± 0% +0.73% (p=0.000 n=50) memmove/186 5.296n ± 0% 5.332n ± 0% +0.68% (p=0.000 n=50) memmove/187 5.297n ± 0% 5.320n ± 0% +0.44% (p=0.000 n=50) memmove/188 5.286n ± 0% 5.314n ± 0% +0.53% (p=0.000 n=50) memmove/189 5.293n ± 0% 5.318n ± 0% +0.46% (p=0.000 n=50) memmove/190 5.294n ± 0% 5.318n ± 0% +0.45% (p=0.000 n=50) memmove/191 5.292n ± 0% 5.314n ± 0% +0.40% (p=0.032 n=50) memmove/192 
5.272n ± 0% 5.304n ± 0% +0.60% (p=0.000 n=50) memmove/193 5.279n ± 0% 5.310n ± 0% +0.57% (p=0.000 n=50) memmove/194 5.294n ± 0% 5.308n ± 0% +0.26% (p=0.018 n=50) memmove/195 5.302n ± 0% 5.311n ± 0% +0.18% (p=0.010 n=50) memmove/196 5.301n ± 0% 5.316n ± 0% +0.28% (p=0.023 n=50) memmove/197 5.302n ± 0% 5.327n ± 0% +0.47% (p=0.000 n=50) memmove/198 5.310n ± 0% 5.326n ± 0% +0.30% (p=0.003 n=50) memmove/199 5.303n ± 0% 5.319n ± 0% +0.30% (p=0.009 n=50) memmove/200 5.312n ± 0% 5.330n ± 0% +0.35% (p=0.001 n=50) memmove/201 5.307n ± 0% 5.333n ± 0% +0.50% (p=0.000 n=50) memmove/202 5.311n ± 0% 5.334n ± 0% +0.44% (p=0.000 n=50) memmove/203 5.313n ± 0% 5.335n ± 0% +0.41% (p=0.006 n=50) memmove/204 5.312n ± 0% 5.332n ± 0% +0.36% (p=0.002 n=50) memmove/205 5.318n ± 0% 5.345n ± 0% +0.50% (p=0.000 n=50) memmove/206 5.311n ± 0% 5.333n ± 0% +0.42% (p=0.002 n=50) memmove/207 5.310n ± 0% 5.338n ± 0% +0.52% (p=0.000 n=50) memmove/208 5.319n ± 0% 5.341n ± 0% +0.40% (p=0.004 n=50) memmove/209 5.330n ± 0% 5.346n ± 0% +0.30% (p=0.004 n=50) memmove/210 5.329n ± 0% 5.349n ± 0% +0.38% (p=0.002 n=50) memmove/211 5.318n ± 0% 5.340n ± 0% +0.41% (p=0.000 n=50) memmove/212 5.339n ± 0% 5.343n ± 0% ~ (p=0.396 n=50) memmove/213 5.329n ± 0% 5.343n ± 0% +0.25% (p=0.017 n=50) memmove/214 5.339n ± 0% 5.358n ± 0% +0.35% (p=0.035 n=50) memmove/215 5.342n ± 0% 5.346n ± 0% ~ (p=0.063 n=50) memmove/216 5.338n ± 0% 5.359n ± 0% +0.39% (p=0.002 n=50) memmove/217 5.341n ± 0% 5.362n ± 0% +0.39% (p=0.015 n=50) memmove/218 5.354n ± 0% 5.373n ± 0% +0.36% (p=0.041 n=50) memmove/219 5.352n ± 0% 5.362n ± 0% ~ (p=0.143 n=50) memmove/220 5.344n ± 0% 5.370n ± 0% +0.50% (p=0.001 n=50) memmove/221 5.345n ± 0% 5.373n ± 0% +0.53% (p=0.000 n=50) memmove/222 5.348n ± 0% 5.360n ± 0% +0.23% (p=0.014 n=50) memmove/223 5.354n ± 0% 5.377n ± 0% +0.43% (p=0.024 n=50) memmove/224 5.352n ± 0% 5.363n ± 0% ~ (p=0.052 n=50) memmove/225 5.372n ± 0% 5.380n ± 0% ~ (p=0.481 n=50) memmove/226 5.368n ± 0% 5.386n ± 0% +0.34% (p=0.004 n=50) 
memmove/227 5.386n ± 0% 5.402n ± 0% +0.29% (p=0.028 n=50) memmove/228 5.400n ± 0% 5.408n ± 0% ~ (p=0.174 n=50) memmove/229 5.423n ± 0% 5.427n ± 0% ~ (p=0.444 n=50) memmove/230 5.411n ± 0% 5.429n ± 0% +0.33% (p=0.020 n=50) memmove/231 5.420n ± 0% 5.433n ± 0% +0.24% (p=0.034 n=50) memmove/232 5.435n ± 0% 5.441n ± 0% ~ (p=0.235 n=50) memmove/233 5.446n ± 0% 5.462n ± 0% ~ (p=0.590 n=50) memmove/234 5.467n ± 0% 5.461n ± 0% ~ (p=0.921 n=50) memmove/235 5.472n ± 0% 5.478n ± 0% ~ (p=0.883 n=50) memmove/236 5.466n ± 0% 5.478n ± 0% ~ (p=0.324 n=50) memmove/237 5.471n ± 0% 5.489n ± 0% ~ (p=0.132 n=50) memmove/238 5.485n ± 0% 5.489n ± 0% ~ (p=0.460 n=50) memmove/239 5.484n ± 0% 5.488n ± 0% ~ (p=0.833 n=50) memmove/240 5.483n ± 0% 5.495n ± 0% ~ (p=0.095 n=50) memmove/241 5.498n ± 0% 5.514n ± 0% ~ (p=0.077 n=50) memmove/242 5.518n ± 0% 5.517n ± 0% ~ (p=0.481 n=50) memmove/243 5.514n ± 0% 5.511n ± 0% ~ (p=0.503 n=50) memmove/244 5.510n ± 0% 5.497n ± 0% -0.24% (p=0.038 n=50) memmove/245 5.516n ± 0% 5.505n ± 0% ~ (p=0.317 n=50) memmove/246 5.513n ± 1% 5.494n ± 0% ~ (p=0.147 n=50) memmove/247 5.518n ± 0% 5.499n ± 0% -0.36% (p=0.011 n=50) memmove/248 5.503n ± 0% 5.492n ± 0% ~ (p=0.267 n=50) memmove/249 5.498n ± 0% 5.497n ± 0% ~ (p=0.765 n=50) memmove/250 5.485n ± 0% 5.493n ± 0% ~ (p=0.348 n=50) memmove/251 5.503n ± 0% 5.482n ± 0% -0.37% (p=0.013 n=50) memmove/252 5.497n ± 0% 5.485n ± 0% ~ (p=0.077 n=50) memmove/253 5.489n ± 0% 5.496n ± 0% ~ (p=0.850 n=50) memmove/254 5.497n ± 0% 5.491n ± 0% ~ (p=0.548 n=50) memmove/255 5.484n ± 1% 5.494n ± 0% ~ (p=0.888 n=50) memmove/256 6.952n ± 0% 7.676n ± 0% +10.41% (p=0.000 n=50) geomean 4.406n 4.127n -6.33% ```	2023-10-26 13:40:25 +02:00
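The first point of the memmove commit above relies on the fact that a small move needs no `is_disjoint` check when every load happens before any store. A minimal sketch of that idea for sizes 8 to 16, assuming two possibly overlapping 8-byte blocks cover the range; the name `move_8_to_16` is hypothetical and this is not the actual `inline_memmove` code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: moves `size` bytes, 8 <= size <= 16, with no overlap check.
// Both 8-byte blocks are loaded into locals before anything is stored,
// so the result is correct even when [src, src+size) overlaps [dst, dst+size).
inline void move_8_to_16(void *dst, const void *src, std::size_t size) {
  const auto *s = static_cast<const unsigned char *>(src);
  auto *d = static_cast<unsigned char *>(dst);
  std::uint64_t head;
  std::uint64_t tail;
  std::memcpy(&head, s, 8);             // first 8 bytes of the source
  std::memcpy(&tail, s + size - 8, 8);  // last 8 bytes (may overlap head)
  std::memcpy(d, &head, 8);
  std::memcpy(d + size - 8, &tail, 8);
}
```

The fixed-size `std::memcpy` calls are only a portable way to express unaligned loads and stores; compilers typically lower each one to a single move instruction.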
Dmitry Vyukov	605fadf0ca	[libc] Add --sweep-min-size flag for benchmarks (#70302 ) We have --sweep-max-size, it's reasonable to have --sweep-min-size as well. It can be used when working on the logic for larger sizes, or to collect a profile for larger sizes only.	2023-10-26 11:06:15 +02:00
Joseph Huber	e3d2a7d0a5	[libc] Compile the GPU functions with '-fconvergent-functions' (#70229 ) Summary: This patch simply adds the `-fconvergent-functions` flag to the GPU compilation. This is in relation to the behaviour of SIMT architectures under divergence. With the flag, we assume every function is convergent by default and rely on the compiler's divergence analysis to transform it if possible. Fixes: https://github.com/llvm/llvm-project/issues/63853	2023-10-25 14:13:21 -05:00
Benjamin Kramer	c4e9a43773	[libc] Fix a constexpr violation from `b4e552999d` In msan mode this calls __msan_unpoison, which isn't constexpr.	2023-10-25 13:36:17 +02:00
michaelrj-google	2282af26ea	[libc] Disable -NaN test on float128 systems (#70146 ) Some float128 systems (specifically the ones used for aarch64 buildbots) don't respect signs for long double NaNs. This patch disables the printf test that was failing due to this.	2023-10-24 16:45:54 -07:00
michaelrj-google	b4e552999d	[libc] Fix printf long double inf, bitcast in msan (#70067 ) These bugs were found with the new printf long double fuzzing. The long double inf vs nan bug was introduced when we changed to get_explicit_exponent. The bitcast msan issue hadn't come up previously, but isn't a real bug, just a poisoning confusion.	2023-10-24 15:41:54 -07:00
Dmitry Vyukov	f364a7a8b4	[libc] Speed up memmove overlapping check (#70017 ) Use a check that requires fewer instructions and is cheaper. Current code: ``` 1b704: 48 39 f7 cmp %rsi,%rdi 1b707: 48 89 f0 mov %rsi,%rax 1b70a: 48 0f 47 c7 cmova %rdi,%rax 1b70e: 48 89 f9 mov %rdi,%rcx 1b711: 48 0f 47 ce cmova %rsi,%rcx 1b715: 48 01 d1 add %rdx,%rcx 1b718: 48 39 c1 cmp %rax,%rcx ``` New code: ``` 1b704: 48 89 f8 mov %rdi,%rax 1b707: 48 29 f0 sub %rsi,%rax 1b70a: 48 89 c1 mov %rax,%rcx 1b70d: 48 f7 d9 neg %rcx 1b710: 48 0f 48 c8 cmovs %rax,%rcx 1b714: 48 39 d1 cmp %rdx,%rcx ``` ``` │ baseline │ disjoint │ │ sec/op │ sec/op vs base │ memmove/Google_A 3.910n ± 0% 3.861n ± 1% -1.26% (p=0.000 n=50) ``` ``` │ baseline │ disjoint │ │ sec/op │ sec/op vs base │ memmove/1 2.724n ± 3% 2.441n ± 0% -10.37% (n=50) memmove/2 2.878n ± 0% 2.713n ± 0% -5.73% (n=50) memmove/3 2.835n ± 0% 2.593n ± 0% -8.54% (n=50) memmove/4 3.032n ± 0% 2.776n ± 0% -8.45% (p=0.000 n=50) memmove/5 2.833n ± 0% 2.600n ± 0% -8.20% (p=0.000 n=50) memmove/6 2.758n ± 0% 2.744n ± 0% -0.52% (p=0.000 n=50) memmove/7 2.762n ± 0% 2.744n ± 0% -0.63% (p=0.000 n=50) memmove/8 2.763n ± 0% 2.750n ± 0% -0.46% (p=0.000 n=50) memmove/9 3.182n ± 0% 3.269n ± 0% +2.75% (p=0.000 n=50) memmove/10 3.185n ± 0% 3.270n ± 0% +2.64% (p=0.000 n=50) memmove/11 3.188n ± 0% 3.277n ± 0% +2.79% (p=0.000 n=50) memmove/12 3.190n ± 0% 3.279n ± 0% +2.82% (p=0.000 n=50) memmove/13 3.194n ± 0% 3.281n ± 0% +2.73% (p=0.000 n=50) memmove/14 3.197n ± 0% 3.285n ± 0% +2.77% (p=0.000 n=50) memmove/15 3.198n ± 0% 3.282n ± 0% +2.62% (p=0.000 n=50) memmove/16 3.201n ± 0% 3.284n ± 0% +2.61% (p=0.000 n=50) memmove/17 3.564n ± 0% 3.320n ± 0% -6.86% (p=0.000 n=50) memmove/18 3.572n ± 0% 3.313n ± 0% -7.25% (p=0.000 n=50) memmove/19 3.572n ± 0% 3.325n ± 0% -6.94% (p=0.000 n=50) memmove/20 3.575n ± 0% 3.319n ± 0% -7.15% (p=0.000 n=50) memmove/21 3.578n ± 0% 3.327n ± 0% -7.03% (p=0.000 n=50) memmove/22 3.581n ± 0% 3.330n ± 0% -7.01% (p=0.000 n=50) memmove/23 3.582n ± 0% 
3.354n ± 1% -6.37% (p=0.000 n=50) memmove/24 3.587n ± 0% 3.347n ± 1% -6.71% (p=0.000 n=50) memmove/25 3.591n ± 0% 3.320n ± 0% -7.55% (p=0.000 n=50) memmove/26 3.593n ± 0% 3.348n ± 0% -6.82% (p=0.000 n=50) memmove/27 3.596n ± 0% 3.346n ± 0% -6.94% (p=0.000 n=50) memmove/28 3.597n ± 0% 3.357n ± 0% -6.67% (p=0.000 n=50) memmove/29 3.601n ± 0% 3.340n ± 0% -7.23% (p=0.000 n=50) memmove/30 3.602n ± 0% 3.345n ± 0% -7.12% (p=0.000 n=50) memmove/31 3.608n ± 0% 3.357n ± 0% -6.94% (p=0.000 n=50) memmove/32 3.605n ± 0% 3.352n ± 0% -7.01% (p=0.000 n=50) memmove/33 4.128n ± 1% 3.829n ± 0% -7.23% (p=0.000 n=50) memmove/34 4.149n ± 0% 3.836n ± 0% -7.54% (p=0.000 n=50) memmove/35 4.134n ± 0% 3.839n ± 0% -7.15% (n=50) memmove/36 4.151n ± 0% 3.842n ± 0% -7.45% (n=50) memmove/37 4.152n ± 0% 3.841n ± 0% -7.49% (p=0.000 n=50) memmove/38 4.159n ± 0% 3.844n ± 0% -7.58% (p=0.000 n=50) memmove/39 4.165n ± 0% 3.841n ± 0% -7.78% (p=0.000 n=50) memmove/40 4.162n ± 0% 3.837n ± 0% -7.81% (p=0.000 n=50) memmove/41 4.161n ± 0% 3.845n ± 0% -7.58% (p=0.000 n=50) memmove/42 4.164n ± 0% 3.851n ± 0% -7.53% (p=0.000 n=50) memmove/43 4.165n ± 0% 3.843n ± 0% -7.74% (p=0.000 n=50) memmove/44 4.175n ± 0% 3.847n ± 0% -7.83% (p=0.000 n=50) memmove/45 4.170n ± 0% 3.849n ± 0% -7.70% (p=0.000 n=50) memmove/46 4.175n ± 0% 3.850n ± 0% -7.79% (p=0.000 n=50) memmove/47 4.180n ± 0% 3.851n ± 0% -7.87% (p=0.000 n=50) memmove/48 4.178n ± 0% 3.852n ± 0% -7.81% (p=0.000 n=50) memmove/49 4.175n ± 0% 3.851n ± 0% -7.76% (n=50) memmove/50 4.178n ± 0% 3.855n ± 0% -7.73% (p=0.000 n=50) memmove/51 4.190n ± 0% 3.859n ± 0% -7.91% (p=0.000 n=50) memmove/52 4.188n ± 0% 3.859n ± 0% -7.84% (p=0.000 n=50) memmove/53 4.191n ± 0% 3.863n ± 0% -7.82% (p=0.000 n=50) memmove/54 4.192n ± 0% 3.860n ± 0% -7.91% (p=0.000 n=50) memmove/55 4.192n ± 0% 3.869n ± 0% -7.70% (p=0.000 n=50) memmove/56 4.204n ± 0% 3.866n ± 0% -8.05% (p=0.000 n=50) memmove/57 4.198n ± 0% 3.864n ± 0% -7.95% (p=0.000 n=50) memmove/58 4.202n ± 0% 3.865n ± 0% -8.02% (p=0.000 
n=50) memmove/59 4.208n ± 0% 3.868n ± 0% -8.09% (p=0.000 n=50) memmove/60 4.205n ± 0% 3.873n ± 0% -7.89% (p=0.000 n=50) memmove/61 4.212n ± 0% 3.872n ± 0% -8.08% (p=0.000 n=50) memmove/62 4.214n ± 0% 3.870n ± 0% -8.16% (p=0.000 n=50) memmove/63 4.215n ± 0% 3.877n ± 0% -8.02% (p=0.000 n=50) memmove/64 4.217n ± 0% 3.881n ± 0% -7.99% (p=0.000 n=50) memmove/65 4.990n ± 0% 4.683n ± 0% -6.15% (p=0.000 n=50) memmove/66 5.022n ± 0% 4.719n ± 0% -6.03% (p=0.000 n=50) memmove/67 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50) memmove/68 5.035n ± 0% 4.724n ± 0% -6.18% (p=0.000 n=50) memmove/69 5.030n ± 0% 4.725n ± 0% -6.07% (p=0.000 n=50) memmove/70 5.040n ± 0% 4.728n ± 0% -6.19% (p=0.000 n=50) memmove/71 5.053n ± 0% 4.728n ± 0% -6.43% (p=0.000 n=50) memmove/72 5.050n ± 0% 4.732n ± 0% -6.29% (p=0.000 n=50) memmove/73 5.049n ± 0% 4.733n ± 0% -6.24% (p=0.000 n=50) memmove/74 5.054n ± 0% 4.734n ± 0% -6.34% (p=0.000 n=50) memmove/75 5.063n ± 0% 4.736n ± 0% -6.46% (p=0.000 n=50) memmove/76 5.046n ± 0% 4.741n ± 0% -6.04% (p=0.000 n=50) memmove/77 5.057n ± 0% 4.741n ± 0% -6.25% (p=0.000 n=50) memmove/78 5.077n ± 0% 4.739n ± 0% -6.65% (p=0.000 n=50) memmove/79 5.074n ± 0% 4.746n ± 0% -6.46% (p=0.000 n=50) memmove/80 5.085n ± 0% 4.747n ± 0% -6.65% (p=0.000 n=50) memmove/81 5.077n ± 0% 4.735n ± 0% -6.74% (p=0.000 n=50) memmove/82 5.087n ± 0% 4.747n ± 0% -6.68% (p=0.000 n=50) memmove/83 5.087n ± 0% 4.754n ± 0% -6.56% (p=0.000 n=50) memmove/84 5.096n ± 0% 4.753n ± 0% -6.73% (p=0.000 n=50) memmove/85 5.082n ± 0% 4.749n ± 0% -6.55% (p=0.000 n=50) memmove/86 5.103n ± 0% 4.752n ± 0% -6.87% (p=0.000 n=50) memmove/87 5.096n ± 0% 4.760n ± 0% -6.61% (p=0.000 n=50) memmove/88 5.099n ± 0% 4.765n ± 0% -6.55% (p=0.000 n=50) memmove/89 5.104n ± 0% 4.757n ± 0% -6.79% (p=0.000 n=50) memmove/90 5.117n ± 0% 4.767n ± 0% -6.84% (p=0.000 n=50) memmove/91 5.100n ± 0% 4.766n ± 0% -6.54% (p=0.000 n=50) memmove/92 5.103n ± 0% 4.763n ± 0% -6.67% (p=0.000 n=50) memmove/93 5.115n ± 0% 4.772n ± 0% -6.71% (p=0.000 
n=50) memmove/94 5.117n ± 0% 4.769n ± 0% -6.80% (p=0.000 n=50) memmove/95 5.131n ± 0% 4.775n ± 0% -6.94% (p=0.000 n=50) memmove/96 5.129n ± 0% 4.772n ± 0% -6.97% (p=0.000 n=50) memmove/97 5.130n ± 0% 4.764n ± 0% -7.13% (p=0.000 n=50) memmove/98 5.134n ± 0% 4.780n ± 0% -6.89% (p=0.000 n=50) memmove/99 5.141n ± 0% 4.780n ± 0% -7.03% (p=0.000 n=50) memmove/100 5.141n ± 0% 4.780n ± 0% -7.02% (p=0.000 n=50) memmove/101 5.150n ± 0% 4.782n ± 0% -7.14% (p=0.000 n=50) memmove/102 5.150n ± 0% 4.790n ± 0% -6.99% (p=0.000 n=50) memmove/103 5.156n ± 0% 4.788n ± 0% -7.14% (n=50) memmove/104 5.157n ± 0% 4.793n ± 0% -7.05% (p=0.000 n=50) memmove/105 5.147n ± 0% 4.791n ± 0% -6.90% (p=0.000 n=50) memmove/106 5.167n ± 0% 4.793n ± 0% -7.23% (p=0.000 n=50) memmove/107 5.165n ± 0% 4.801n ± 0% -7.06% (p=0.000 n=50) memmove/108 5.173n ± 0% 4.800n ± 0% -7.21% (p=0.000 n=50) memmove/109 5.173n ± 0% 4.797n ± 0% -7.27% (p=0.000 n=50) memmove/110 5.171n ± 0% 4.808n ± 0% -7.01% (p=0.000 n=50) memmove/111 5.180n ± 0% 4.799n ± 0% -7.36% (p=0.000 n=50) memmove/112 5.185n ± 0% 4.812n ± 0% -7.19% (p=0.000 n=50) memmove/113 5.187n ± 0% 4.797n ± 0% -7.53% (p=0.000 n=50) memmove/114 5.183n ± 0% 4.809n ± 0% -7.21% (n=50) memmove/115 5.193n ± 0% 4.811n ± 0% -7.36% (p=0.000 n=50) memmove/116 5.196n ± 0% 4.815n ± 0% -7.32% (p=0.000 n=50) memmove/117 5.199n ± 0% 4.816n ± 0% -7.37% (p=0.000 n=50) memmove/118 5.198n ± 0% 4.811n ± 0% -7.45% (p=0.000 n=50) memmove/119 5.203n ± 0% 4.818n ± 0% -7.40% (p=0.000 n=50) memmove/120 5.195n ± 0% 4.823n ± 0% -7.16% (p=0.000 n=50) memmove/121 5.203n ± 0% 4.812n ± 0% -7.51% (p=0.000 n=50) memmove/122 5.204n ± 0% 4.818n ± 0% -7.42% (n=50) memmove/123 5.202n ± 0% 4.822n ± 0% -7.31% (p=0.000 n=50) memmove/124 5.216n ± 0% 4.823n ± 0% -7.54% (p=0.000 n=50) memmove/125 5.227n ± 0% 4.823n ± 0% -7.72% (p=0.000 n=50) memmove/126 5.235n ± 0% 4.830n ± 0% -7.74% (p=0.000 n=50) memmove/127 5.237n ± 0% 4.833n ± 0% -7.72% (p=0.000 n=50) memmove/128 5.241n ± 0% 4.832n ± 0% -7.81% (p=0.000 
n=50) memmove/129 6.460n ± 0% 5.858n ± 0% -9.31% (p=0.000 n=50) memmove/130 7.539n ± 0% 6.634n ± 0% -12.00% (p=0.000 n=50) memmove/131 7.542n ± 0% 6.623n ± 0% -12.18% (p=0.000 n=50) memmove/132 7.527n ± 0% 6.667n ± 1% -11.43% (p=0.000 n=50) memmove/133 7.521n ± 0% 6.631n ± 0% -11.83% (p=0.000 n=50) memmove/134 7.531n ± 0% 6.642n ± 0% -11.81% (p=0.000 n=50) memmove/135 7.541n ± 0% 6.692n ± 1% -11.25% (p=0.000 n=50) memmove/136 7.549n ± 0% 6.657n ± 0% -11.81% (p=0.000 n=50) memmove/137 7.544n ± 0% 6.646n ± 0% -11.90% (p=0.000 n=50) memmove/138 7.557n ± 0% 6.673n ± 1% -11.70% (p=0.000 n=50) memmove/139 7.545n ± 0% 6.654n ± 0% -11.81% (n=50) memmove/140 7.559n ± 0% 6.680n ± 1% -11.63% (p=0.000 n=50) memmove/141 7.560n ± 0% 6.664n ± 0% -11.85% (p=0.000 n=50) memmove/142 7.556n ± 0% 6.679n ± 0% -11.62% (p=0.000 n=50) memmove/143 7.570n ± 0% 6.683n ± 1% -11.71% (p=0.000 n=50) memmove/144 7.586n ± 0% 6.683n ± 0% -11.91% (p=0.000 n=50) memmove/145 7.593n ± 0% 6.665n ± 0% -12.22% (p=0.000 n=50) memmove/146 7.591n ± 0% 6.665n ± 0% -12.20% (p=0.000 n=50) memmove/147 7.598n ± 0% 6.665n ± 0% -12.27% (p=0.000 n=50) memmove/148 7.598n ± 0% 6.670n ± 0% -12.21% (p=0.000 n=50) memmove/149 7.593n ± 0% 6.691n ± 0% -11.88% (p=0.000 n=50) memmove/150 7.625n ± 0% 6.713n ± 1% -11.97% (p=0.000 n=50) memmove/151 7.603n ± 0% 6.710n ± 1% -11.74% (p=0.000 n=50) memmove/152 7.613n ± 0% 6.701n ± 1% -11.97% (p=0.000 n=50) memmove/153 7.595n ± 0% 6.710n ± 0% -11.65% (p=0.000 n=50) memmove/154 7.614n ± 0% 6.721n ± 0% -11.74% (p=0.000 n=50) memmove/155 7.615n ± 0% 6.709n ± 0% -11.89% (p=0.000 n=50) memmove/156 7.613n ± 0% 6.693n ± 0% -12.08% (p=0.000 n=50) memmove/157 7.628n ± 0% 6.708n ± 0% -12.05% (p=0.000 n=50) memmove/158 7.629n ± 0% 6.706n ± 0% -12.10% (p=0.000 n=50) memmove/159 7.639n ± 0% 6.724n ± 0% -11.98% (p=0.000 n=50) memmove/160 7.619n ± 0% 6.702n ± 0% -12.04% (p=0.000 n=50) memmove/161 7.653n ± 0% 6.698n ± 0% -12.49% (p=0.000 n=50) memmove/162 8.104n ± 0% 7.140n ± 1% -11.89% (p=0.000 
n=50) memmove/163 8.141n ± 0% 7.187n ± 1% -11.72% (p=0.000 n=50) memmove/164 8.154n ± 0% 7.107n ± 0% -12.84% (p=0.000 n=50) memmove/165 8.143n ± 0% 7.117n ± 0% -12.59% (p=0.000 n=50) memmove/166 8.176n ± 0% 7.110n ± 0% -13.04% (p=0.000 n=50) memmove/167 8.194n ± 0% 7.168n ± 1% -12.52% (p=0.000 n=50) memmove/168 8.214n ± 0% 7.188n ± 1% -12.50% (p=0.000 n=50) memmove/169 8.220n ± 0% 7.242n ± 1% -11.90% (p=0.000 n=50) memmove/170 8.228n ± 0% 7.244n ± 1% -11.96% (p=0.000 n=50) memmove/171 8.263n ± 0% 7.184n ± 0% -13.06% (p=0.000 n=50) memmove/172 8.259n ± 0% 7.325n ± 1% -11.31% (p=0.000 n=50) memmove/173 8.271n ± 0% 7.225n ± 0% -12.65% (p=0.000 n=50) memmove/174 8.284n ± 0% 7.287n ± 1% -12.04% (p=0.000 n=50) memmove/175 8.289n ± 0% 7.282n ± 1% -12.15% (p=0.000 n=50) memmove/176 8.309n ± 0% 7.328n ± 1% -11.81% (p=0.000 n=50) memmove/177 8.317n ± 0% 7.264n ± 1% -12.67% (p=0.000 n=50) memmove/178 8.302n ± 0% 7.342n ± 1% -11.57% (p=0.000 n=50) memmove/179 8.309n ± 0% 7.357n ± 1% -11.45% (p=0.000 n=50) memmove/180 8.304n ± 0% 7.318n ± 1% -11.87% (p=0.000 n=50) memmove/181 8.312n ± 0% 7.363n ± 1% -11.42% (p=0.000 n=50) memmove/182 8.315n ± 0% 7.320n ± 1% -11.96% (p=0.000 n=50) memmove/183 8.330n ± 0% 7.286n ± 1% -12.53% (p=0.000 n=50) memmove/184 8.310n ± 0% 7.324n ± 1% -11.86% (p=0.000 n=50) memmove/185 8.303n ± 0% 7.267n ± 1% -12.47% (p=0.000 n=50) memmove/186 8.287n ± 0% 7.312n ± 1% -11.76% (p=0.000 n=50) memmove/187 8.298n ± 0% 7.395n ± 2% -10.88% (p=0.000 n=50) memmove/188 8.296n ± 0% 7.339n ± 1% -11.54% (p=0.000 n=50) memmove/189 8.306n ± 0% 7.299n ± 1% -12.12% (p=0.000 n=50) memmove/190 8.281n ± 0% 7.309n ± 1% -11.74% (p=0.000 n=50) memmove/191 8.299n ± 0% 7.282n ± 1% -12.26% (p=0.000 n=50) memmove/192 8.281n ± 0% 7.335n ± 1% -11.41% (p=0.000 n=50) memmove/193 8.299n ± 0% 7.325n ± 1% -11.74% (p=0.000 n=50) memmove/194 8.641n ± 0% 8.034n ± 0% -7.02% (p=0.000 n=50) memmove/195 8.667n ± 0% 8.073n ± 0% -6.85% (p=0.000 n=50) memmove/196 8.666n ± 0% 8.030n ± 0% -7.34% 
(p=0.000 n=50) memmove/197 8.660n ± 0% 8.096n ± 1% -6.51% (p=0.000 n=50) memmove/198 8.688n ± 0% 8.047n ± 0% -7.39% (p=0.000 n=50) memmove/199 8.678n ± 0% 8.061n ± 0% -7.11% (p=0.000 n=50) memmove/200 8.669n ± 0% 8.034n ± 0% -7.32% (p=0.000 n=50) memmove/201 8.692n ± 0% 8.061n ± 0% -7.26% (p=0.000 n=50) memmove/202 8.668n ± 0% 8.060n ± 0% -7.02% (p=0.000 n=50) memmove/203 8.687n ± 0% 8.066n ± 0% -7.15% (p=0.000 n=50) memmove/204 8.699n ± 0% 8.076n ± 0% -7.16% (p=0.000 n=50) memmove/205 8.676n ± 0% 8.085n ± 0% -6.82% (p=0.000 n=50) memmove/206 8.684n ± 0% 8.101n ± 1% -6.71% (p=0.000 n=50) memmove/207 8.725n ± 0% 8.099n ± 0% -7.18% (p=0.000 n=50) memmove/208 8.674n ± 0% 8.073n ± 0% -6.92% (p=0.000 n=50) memmove/209 8.697n ± 0% 8.088n ± 0% -7.01% (p=0.000 n=50) memmove/210 8.733n ± 0% 8.076n ± 0% -7.53% (p=0.000 n=50) memmove/211 8.732n ± 0% 8.104n ± 0% -7.19% (p=0.000 n=50) memmove/212 8.730n ± 0% 8.091n ± 0% -7.32% (p=0.000 n=50) memmove/213 8.728n ± 0% 8.100n ± 0% -7.19% (p=0.000 n=50) memmove/214 8.744n ± 1% 8.081n ± 1% -7.57% (p=0.000 n=50) memmove/215 8.734n ± 0% 8.150n ± 0% -6.68% (p=0.000 n=50) memmove/216 8.748n ± 0% 8.116n ± 0% -7.23% (p=0.000 n=50) memmove/217 8.751n ± 0% 8.129n ± 1% -7.11% (p=0.000 n=50) memmove/218 8.747n ± 0% 8.114n ± 0% -7.23% (p=0.000 n=50) memmove/219 8.733n ± 0% 8.159n ± 0% -6.57% (p=0.000 n=50) memmove/220 8.764n ± 0% 8.145n ± 0% -7.06% (p=0.000 n=50) memmove/221 8.764n ± 0% 8.142n ± 0% -7.10% (p=0.000 n=50) memmove/222 8.775n ± 0% 8.152n ± 0% -7.10% (p=0.000 n=50) memmove/223 8.771n ± 0% 8.143n ± 0% -7.16% (p=0.000 n=50) memmove/224 8.778n ± 0% 8.175n ± 1% -6.87% (p=0.000 n=50) memmove/225 8.794n ± 0% 8.138n ± 0% -7.45% (p=0.000 n=50) memmove/226 10.13n ± 0% 10.06n ± 0% -0.71% (p=0.000 n=50) memmove/227 10.14n ± 0% 10.08n ± 0% -0.53% (p=0.000 n=50) memmove/228 10.13n ± 0% 10.08n ± 0% -0.56% (p=0.000 n=50) memmove/229 10.17n ± 0% 10.11n ± 0% -0.56% (p=0.000 n=50) memmove/230 10.17n ± 0% 10.13n ± 0% -0.38% (p=0.003 n=50) memmove/231 
10.16n ± 0% 10.12n ± 0% -0.41% (p=0.001 n=50) memmove/232 10.19n ± 0% 10.12n ± 0% -0.67% (p=0.000 n=50) memmove/233 10.21n ± 0% 10.14n ± 0% -0.71% (p=0.000 n=50) memmove/234 10.24n ± 0% 10.16n ± 0% -0.79% (p=0.000 n=50) memmove/235 10.24n ± 0% 10.16n ± 0% -0.76% (p=0.000 n=50) memmove/236 10.25n ± 0% 10.16n ± 0% -0.81% (p=0.000 n=50) memmove/237 10.24n ± 0% 10.17n ± 0% -0.69% (p=0.000 n=50) memmove/238 10.27n ± 0% 10.19n ± 0% -0.79% (p=0.000 n=50) memmove/239 10.29n ± 0% 10.19n ± 0% -0.90% (p=0.000 n=50) memmove/240 10.30n ± 0% 10.20n ± 0% -0.95% (p=0.000 n=50) memmove/241 10.29n ± 0% 10.20n ± 0% -0.91% (p=0.000 n=50) memmove/242 10.30n ± 0% 10.22n ± 0% -0.80% (p=0.000 n=50) memmove/243 10.32n ± 0% 10.23n ± 0% -0.87% (p=0.000 n=50) memmove/244 10.32n ± 0% 10.24n ± 0% -0.74% (p=0.000 n=50) memmove/245 10.33n ± 0% 10.23n ± 0% -0.97% (p=0.000 n=50) memmove/246 10.33n ± 0% 10.24n ± 0% -0.92% (p=0.000 n=50) memmove/247 10.31n ± 0% 10.24n ± 0% -0.69% (p=0.000 n=50) memmove/248 10.32n ± 0% 10.26n ± 0% -0.55% (p=0.000 n=50) memmove/249 10.33n ± 0% 10.28n ± 0% -0.52% (p=0.000 n=50) memmove/250 10.34n ± 0% 10.27n ± 0% -0.66% (p=0.000 n=50) memmove/251 10.32n ± 0% 10.27n ± 0% -0.45% (p=0.000 n=50) memmove/252 10.34n ± 0% 10.30n ± 0% -0.39% (p=0.005 n=50) memmove/253 10.33n ± 0% 10.27n ± 0% -0.57% (p=0.000 n=50) memmove/254 10.33n ± 0% 10.27n ± 0% -0.54% (p=0.000 n=50) memmove/255 10.34n ± 0% 10.29n ± 0% -0.50% (p=0.002 n=50) memmove/256 10.36n ± 0% 10.31n ± 0% -0.44% (p=0.006 n=50) memmove/257 10.33n ± 0% 10.29n ± 0% -0.36% (p=0.004 n=50) geomean 6.142n 5.696n -7.26% ```	2023-10-24 16:05:27 +02:00
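Reading the two instruction sequences in the commit above, the old code appears to compare `min(dst, src) + size` against `max(dst, src)`, while the new code compares `|dst - src|` against `size`. A minimal C++ sketch of both forms follows; the function names are hypothetical, and the exact comparison direction used by the libc helper is an assumption since the listings stop at the `cmp`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Older form (hypothetical name): the lower buffer must end at or before
// the start of the higher one.
inline bool disjoint_minmax(std::uintptr_t dst, std::uintptr_t src,
                            std::size_t size) {
  const std::uintptr_t lo = dst < src ? dst : src;
  const std::uintptr_t hi = dst < src ? src : dst;
  return lo + size <= hi;
}

// Newer form (hypothetical name): the distance between the two pointers
// must be at least `size`. Mirrors the sub / neg / cmovs / cmp sequence.
inline bool disjoint_absdiff(std::uintptr_t dst, std::uintptr_t src,
                             std::size_t size) {
  const std::uintptr_t diff = dst - src;  // wraps around when dst < src
  const bool negative = static_cast<std::intptr_t>(diff) < 0;
  // Unsigned negation avoids signed overflow for the most negative value.
  const std::uintptr_t abs_diff = negative ? (0 - diff) : diff;
  return abs_diff >= size;
}
```

For ordinary user-space addresses the two predicates agree; the second one needs a single subtraction and one conditional move instead of two conditional moves plus an add.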
Joseph Huber	25bf1ae99b	[libc] Enable remaining string functions on the GPU (#68346 ) Summary: We previously had to disable these string functions because they were not compatible with the definitions coming from the GNU / host environment. The GPU, when exporting its declarations, has a very difficult requirement that it be compatible with the host environment as both sides of the compilation need to agree on definitions and what's present. This patch more or less gives up and just copies the definitions as expected by `glibc` if they are provided that way, otherwise we fall back to the accepted way. This is the alternative solution to an existing PR which instead disables GCC's handling.	2023-10-23 13:16:20 -04:00
Hans Wennborg	e2fc68c3db	Typos: 'maxium', 'minium'	2023-10-23 10:42:28 +02:00
Anton Rydahl	e774482c4c	Fixed typo in GPU libm device library warning (#69752 ) Correcting a small typo in the error message when the CUDA device libraries are not detected.	2023-10-20 12:17:26 -07:00
lntue	6d53fdeab4	[libc][NFC] Attempt to deflake gettimeofday_test. (#69719 ) Only check if gettimeofday call succeeds.	2023-10-20 11:08:01 -04:00
lntue	ec10c36b07	[libc][NFC] Forcing data type in gettimeofday_test when comparing the diff. (#69652 )	2023-10-19 19:49:59 -04:00
Joseph Huber	630037ede4	[libc] Partially implement 'rand' for the GPU (#66167 ) Summary: This patch partially implements the `rand` function on the GPU. This is partial because the GPU currently doesn't support thread local storage or static initializers. To implement this on the GPU, I use 1/8th of the local / shared memory quota to treat the shared memory as thread local storage. This is done by simply allocating enough storage for each thread in the block and indexing into this based off of the thread id. The downside to this is that it does not initialize `srand` correctly to be `1` as the standard says, it is also wasteful. In the future we should figure out a way to support TLS on the GPU so that this can be completely common and less resource intensive.	2023-10-19 17:01:43 -04:00
Joseph Huber	a39215768b	[libc] Rework the 'fgets' implementation on the GPU (#69635 ) Summary: The `fgets` function as implemented is not functional currently when called with multiple threads. This is because we rely on repeatedly polling the character to detect EOF. This doesn't work when there are multiple threads that may wish to poll the characters. This patch pulls out the logic into a standalone RPC call to handle this in a single operation such that calling it from multiple threads functions as expected. It also makes it less slow because we no longer make N RPC calls for N characters.	2023-10-19 17:00:01 -04:00
Anton Rydahl	c73ad025b1	[libc][libm][GPU] Add missing vendor entrypoints to the GPU version of `libm` (#66034 ) This patch populates the GPU version of `libm` with missing vendor entrypoints. The vendor math entrypoints are disabled by default but can be enabled with the CMake option `LIBC_GPU_VENDOR_MATH=ON`.	2023-10-19 12:24:50 -07:00
Alfred Persson Forsberg	67770cbb98	[libc][NFC] Fix features.h.def file header	2023-10-19 20:00:26 +02:00
alfredfo	f350532099	[libc] Fix accidental LIBC_NAMESPACE_clock_freq (#69620 ) See-also: https://github.com/llvm/llvm-project/pull/69548	2023-10-19 19:39:02 +02:00
lntue	3fd5113cba	[libc][math][NFC] Remove global scope constants declaration in math tests (#69558 ) Clean up usage of `DECLARE_SPECIAL_CONSTANTS` in global scope.	2023-10-19 10:30:11 -04:00
alfredfo	d404130134	[libc] Fix accidental LIBC_NAMESPACE_syscall definition (#69548 ) Building helloworld.c currently errors with "undefined symbol: __llvm_libc_syscall" See: https://github.com/llvm/llvm-project/pull/67032	2023-10-19 11:22:16 +02:00
alfredfo	74b0465fe9	[libc] Add simple features.h with implementation macro (#69402 ) In the future this should probably be autogenerated so it defines library version. See: Discussion in #libc https://discord.com/channels/636084430946959380/636732994891284500/1163979080979460176	2023-10-19 04:08:13 +02:00
Joseph Huber	ddc30ff802	[libc] Implement the 'ungetc' function on the GPU (#69248 ) Summary: This function follows closely with the pattern of all the other functions. That is, making a new opcode and forwarding the call to the host. However, this also required modifying the test somewhat. It seems that not all `libc` implementations follow the same error rules as are tested here, and it is not explicit in the standard, so we simply disable these EOF checks when targeting the GPU.	2023-10-17 13:02:31 -05:00
michaelrj-google	8a47ad4b67	[libc] Add simple long double to printf float fuzz (#68449 ) Recent testing has uncovered some hard-to-find bugs in printf's long double support. This patch adds an extra long double path to the fuzzer with minimal extra effort. While a more thorough long double fuzzer would be useful, it would need to handle the non-standard cases of 80 bit long doubles such as unnormal and pseudo-denormal numbers. For that reason, a standalone long double fuzzer is left for future development.	2023-10-16 13:32:34 -07:00
Samira Bazuzi	b5c2fa14ea	[libc] Mark operator== const to avoid ambiguity in C++20. (#68805 ) C++20 will automatically generate an operator== with reversed operand order, which is ambiguous with the written operator== when one argument is marked const and the other isn't. This operator currently triggers -Wambiguous-reversed-operator at usage site libc/test/UnitTest/PrintfMatcher.cpp:28.	2023-10-11 23:59:13 -04:00
Joseph Huber	9bcf9dc98a	[libc] Fix missing warp sync for the NVPTX assert Summary: The implementation of `assert` has an if statement so that only the first thread in the warp prints the assertion. On modern NVPTX architecture, this can be printed out of order with the abort call. This would lead to only a portion of the message being printed and then exiting the program. By adding a mandatory warp sync we force the full string to be printed before we continue to the abort.	2023-10-10 12:50:37 -05:00
Joseph Huber	fa23a2396b	[libc] Fix linking of AMDGPU device runtime control constants for math (#65676 ) Summary: Currently, `libc` temporarily provides math by linking against existing vendor implementations. To use the AMDGPU DeviceRTL we need to define a handful of control constants that alter behaviour for architecture specific things. Previously these were marked `extern const` because they must be present when we link-in the vendor bitcode library. However, this causes linker errors if more than one math function was used. This patch fixes the issue by marking these functions as used and inline on top of being external. This means that they are linkable, but it gives us `linkonce_odr` semantics. The downside is that these globals won't be optimized out, but it allows us to perform constant propagation on them unlike using `weak`.	2023-10-06 21:50:35 -05:00
Joseph Huber	4cb6c1c7cb	[libc] Enable missing memory tests on the GPU (#68111 ) Summary: There were a few tests that weren't enabled on the GPU. This is because the logic caused them to be skipped as we don't use CPU features on the host. This also disables the logic making multiple versions of the memory functions.	2023-10-06 08:27:36 -05:00
tnv01	28245b4ecb	[libc] Add x86-64 stack protector support.	2023-10-04 14:18:23 -07:00
michaelrj-google	bfcfc2a6d4	[libc] Fix typo in long double negative block (#68243 ) The long double version of float to string's get_negative_block had a bug in table mode. In table mode, one of the tables is named "MIN_BLOCK_2" and it stores the number of blocks that are all zeroes before the digits start for a given index. The check for long doubles was incorrectly "block_index <= MIN_BLOCK_2[idx]" when it should be "block_index < MIN_BLOCK_2[idx]" (without the equal sign). This bug caused an off-by-one error for some long double values. This patch fixes the bug and adds tests to ensure it doesn't regress.	2023-10-04 13:00:48 -07:00
Mikhail R. Gadelha	714b4c82bb	[libc][NFC] Fix -Wdangling-else when compiling libc with gcc >= 7 (#67833 ) Explicit braces were added to fix the "suggest explicit braces to avoid ambiguous ‘else’" warning since the current solution (switch (0) case 0: default:) doesn't work since gcc 7 (see https://github.com/google/googletest/issues/1119) gcc 13 generates about 5000 of these warnings when building libc without this patch.	2023-10-04 11:44:42 -04:00
Joseph Huber	452fa6b86d	[libc] Change the GPU to use builtin memory functions (#68003 ) Summary: The GPU build is special in the sense that we always know that up-to-date `clang` is always going to be the compiler. This allows us to rely directly on builtins, which allow us to push a lot of this complexity into the backend. Backend implementations are favored on the GPU because it allows us to do a lot more target specific optimizations. This patch changes over the common memory functions to use builtin versions when building for AMDGPU or NVPTX.	2023-10-04 07:02:55 -05:00
Mikhail R. Gadelha	824b1677a4	[libc][NFC] Fix missing field 'tm_isdst' initializer warning (#67837 ) This patch fixes several warnings thrown by clang about an uninitialized member of struct tm, tm_isdst. Weirdly, gcc doesn't complain about it, probably this member is never read in the tests.	2023-10-02 19:32:55 -04:00
Mikhail R. Gadelha	8fc87f54a8	[libc][NFC] Couple of small warning fixes (#67847 ) This patch fixes a couple of warnings when compiling with gcc 13: * CPP/type_traits_test.cpp: 'apply' overrides a member function but is not marked 'override' * UnitTest/LibcTest.cpp:98: control reaches end of non-void function * MPFRWrapper/MPFRUtils.cpp:75: control reaches end of non-void function * smoke/FrexpTest.h:92: backslash-newline at end of file * __support/float_to_string.h:118: comparison of unsigned expression in ‘>= 0’ is always true * test/src/__support/CPP/bitset_test.cpp:197: comparison of unsigned expression in ‘>= 0’ is always true --------- Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>	2023-10-02 19:29:26 -04:00
Joseph Huber	f88f090a2e	[libc] Correct 'memrchr' definition and re-enable on GPU (#67850 ) Summary: This was disabled on the GPU because it conflicted with the definition in `glibc`. According to information online and in the `glibc` implementation, the first argument should be a `const void *`. Fixing this resolves the problem when exporting this to offloading languages.	2023-09-29 18:22:00 -05:00
Joseph Huber	e0b702ffc2	[libc] Fix `nanosleep` definition in the posix spec (#67855 ) Summary: The POSIX standard expects the first argument to this function to be constant, e.g. https://man7.org/linux/man-pages/man2/nanosleep.2.html. This fixes that problem and also corrects an obvious problem with enabling this for offloading.	2023-09-29 17:35:10 -05:00
Joseph Huber	ce38cbb13b	[libc][NFC] Adjust the `libc` init / fini array test Summary: The NVPTX backend is picky about the definitions of functions. Because we call these functions with these arguments it can cause some problems when it goes through the backend. This was observed in a different test for `printf` that hasn't been landed yet. Also adjust the priority.	2023-09-29 13:22:02 -05:00
Joseph Huber	22ebf1e9b7	[libc][Obvious] Do not pass 'nolibc' and other flags to the GPU build Summary: Previously this code was applied to the integration tests but did not copy the logic that stopped this from being passed to the GPU build. Copy the full line to avoid the warnings and prevent any libraries from being included.	2023-09-29 12:57:02 -05:00
Mikhail R. Gadelha	dbceb1d936	[libc] Fix unused variable in fputc test (#67830 ) This is probably a copy-and-paste error and the variable 'more' was left unused.	2023-09-29 12:31:40 -04:00
lntue	da28593d71	[libc][math] Implement double precision expm1 function correctly rounded for all rounding modes. (#67048 ) Implementing expm1 function for double precision based on exp function algorithm: - Reduced x = log2(e) * (hi + mid1 + mid2) + lo, where: * hi is an integer * mid1 * 2^-6 is an integer * mid2 * 2^-12 is an integer * \|lo\| < 2^-13 + 2^-30 - Then exp(x) - 1 = 2^hi * 2^mid1 * 2^mid2 * exp(lo) - 1 ~ 2^hi * (2^mid1 * 2^mid2 * (1 + lo * P(lo)) - 2^(-hi) ) - We evaluate fast pass with P(lo) is a degree-3 Taylor polynomial of (e^lo - 1) / lo in double precision - If the Ziv accuracy test fails, we use degree-6 Taylor polynomial of (e^lo - 1) / lo in double double precision - If the Ziv accuracy test still fails, we re-evaluate everything in 128-bit precision.	2023-09-28 16:43:15 -04:00
Joseph Huber	cc2445589d	[libc] Fix wrapper headers for some ctype macros and C++ decls Summary: These wrapper headers need to work around things in the standard headers. The existing workarounds didn't correctly handle the macros for `isascii` and `toascii`. Additionally, `memrchr` can't be used because it has a different declaration for C++ mode. Fix this so it can be compiled.	2023-09-28 10:00:34 -05:00
Joseph Huber	1a5d3b6cda	[libc] Scan the ports more fairly in the RPC server (#66680 ) Summary: Currently, we use the RPC server to respond to different ports which each contain a request from some client thread wishing to do work on the server. This scan starts at zero and continues until it's checked all ports at which point it resets. If we find an active port, we service it and then restart the search. This is bad for two reasons. First, it means that we will always bias the lower ports. If a thread grabs a high port it will be stuck for a very long time until all the other work is done. Second, it means that the `handle_server` function can technically run indefinitely as long as the client is always pushing new work. Because the OpenMP implementation uses the user thread to service the kernel, this means that it could be stalled with another asynchronous device's kernels. This patch addresses this by making the server restart at the next port over. This means we will always do a full scan of the ports before quitting.	2023-09-26 16:09:48 -05:00
Joseph Huber	6273b6d9dc	[libc] Change RPC opcode enum definition (#67439 ) Summary: This enum previously manually specified the value. This just made it unnecessarily difficult to add new ones without changing everything. This patch also makes it compatible with C by removing the `:` annotation and instead using the `LAST` method.	2023-09-26 15:24:28 -05:00
Joseph Huber	2b7227db1e	[libc] Fix RPC server global after mass replace of __llvm_libc Summary: This variable needs a reserved name starting with `__`. It was mistakenly changed with a mass replace. It happened to work because the tests still picked up the associated symbol, but it just became a bad name because it's not reserved anymore.	2023-09-26 14:28:48 -05:00
Siva Chandra	f2c9fe452f	[libc][NFC] Fix delete operator linkage names after switch to LIBC_NAMESPACE. (#67475 ) The name __llvm_libc was mass-replaced with LIBC_NAMESPACE which ended up changing the "__llvm_libc" prefix of the delete operator linkage names to "LIBC_NAMESPACE". This change corrects it by changing the namespace prefix to "__llvm_libc_<version info>".	2023-09-26 11:53:14 -07:00
Siva Chandra	425defd810	[libc][Obvious] Remove the previous ErrnoSetterMatcher target. (#67469 ) A target still depending on the old target has been updated.	2023-09-26 11:01:21 -07:00
Siva Chandra	3bfd6a7521	[libc][NFC] Add compile options only to the header libraries which use them. (#67447 ) Other libraries dependent on these libraries will automatically inherit those compile options. This change in particular affects the compile option "-DLIBC_COPT_STDIO_USE_SYSTEM_FILE".	2023-09-26 09:20:00 -07:00
Mikhail R. Gadelha	e3087c4b8c	[libc] Start to refactor riscv platform abstraction to support both 32 and 64 bits versions This patch enables the compilation of libc for rv32 by unifying the current rv64 and rv32 implementation into a single rv implementation. We updated the cmake file to match the new riscv32 arch and force LIBC_TARGET_ARCHITECTURE to be "riscv" whenever we find "riscv32" or "riscv64". This is required as LIBC_TARGET_ARCHITECTURE is used in the path for several platform specific implementations. Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D148797	2023-09-26 12:32:25 -03:00
Siva Chandra	599eadec28	[libc] Propagate printf config options from a single config header library. (#66979 ) printf_core.parser is not yet updated to use the printf config options. It does not use them currently anyway and the corresponding parser_test should be updated to respect the config options.	2023-09-26 08:16:31 -07:00
Siva Chandra	aecb58005c	[libc][NFC] Remove an inappropriate -ffreestanding arg to memory_utils test. (#67435 )	2023-09-26 08:04:08 -07:00
Joseph Huber	1b8c8155cc	[libc][Obvious] Fix incorrect filepath for ftell.h header Summary: The previous patch moved the location of this CMake line but didn't update the header. Fix it.	2023-09-26 10:02:20 -05:00
Joseph Huber	7ac8e26fc7	[libc] Implement `fseek`, `fflush`, and `ftell` on the GPU (#67160 ) Summary: This patch adds the necessary entrypoints to handle the `fseek`, `fflush`, and `ftell` functions. These are all very straightforward: we simply make RPC calls to the associated function on the other end. Implementing it this way allows us to more or less borrow the state of the stream from the server as we intentionally maintain no internal state on the GPU device. However, this does not implement the `errno` functionality so that must be ignored.	2023-09-26 09:46:46 -05:00
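
The fairer port scan described in commit 1a5d3b6cda (#66680) above can be illustrated with a minimal sketch. This is not the libc RPC server code: `PortTable` and `scan_ports_once` are hypothetical names, and a pending request is modeled as a plain boolean. The sketch only shows the idea of visiting every port once per call, starting where the previous call left off rather than at port zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the RPC port table (an assumption for this
// sketch): pending[i] is true when port i holds an unserviced request.
struct PortTable {
  std::vector<bool> pending;
};

// Visits every port exactly once, beginning at `start` and wrapping around.
// Each pending port is "serviced" by clearing its flag and recording its
// index. Returns the port after the last serviced one, so the next call
// resumes there instead of restarting at zero.
std::size_t scan_ports_once(PortTable &table, std::size_t start,
                            std::vector<std::size_t> &serviced) {
  const std::size_t n = table.pending.size();
  assert(n > 0 && start < n);
  std::size_t next = start;
  for (std::size_t step = 0; step < n; ++step) {
    const std::size_t port = (start + step) % n;
    if (table.pending[port]) {
      table.pending[port] = false;
      serviced.push_back(port);
      next = (port + 1) % n;
    }
  }
  return next;
}
```

Because the loop runs a fixed number of steps, a single call is bounded even if clients keep submitting work, and high-numbered ports are reached without waiting for the low ones to drain.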


2282 Commits
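
The C++20 issue behind commit b5c2fa14ea (#68805) in the log above can be sketched as follows. `Matcher` is a hypothetical type, not the class touched by that patch. In C++20, `a == b` also considers the reversed candidate `b == a`; if the member `operator==` is not `const`, the implicit object and the `const` reference parameter differ in qualification, so neither candidate is better and compilers such as clang report `-Wambiguous-reversed-operator`. Marking the operator `const` gives both operand orders the same signature.

```cpp
#include <cassert>

// Hypothetical type used only for illustration.
struct Matcher {
  int value;

  // The trailing `const` is the fix: the implicit object parameter and the
  // explicit parameter are now both const references, so the written
  // operator and its C++20 reversed form no longer compete ambiguously.
  bool operator==(const Matcher &other) const { return value == other.value; }
};

// Comparing through const references only compiles when operator== is const.
bool same(const Matcher &a, const Matcher &b) { return a == b; }

// Equality should give the same answer in either operand order.
void check_symmetric(const Matcher &a, const Matcher &b) {
  assert((a == b) == (b == a));
}
```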