2282 Commits

Author SHA1 Message Date
Dmitry Vyukov
0e110fb429 [libc] memmove optimizations (#70043)
1. Remove is_disjoint check for smaller sizes and reduce code bloat.

inline_memmove can handle some small sizes as efficiently
as inline_memcpy. For these sizes we can skip the is_disjoint check.
This both avoids executing extra code for the most frequent smaller sizes
and removes code bloat (we don't need the memcpy logic for small sizes).
Here we rely heavily on inlining and dead code elimination: from the
first inline_memmove we should get only the handling of small sizes, and
from the second inline_memmove and inline_memcpy we should get only the
handling of larger sizes.

2. Use the memcpy thresholds for memmove.
The memcpy thresholds were tuned more carefully.
This matters more now that memmove always handles
all small sizes.

3. Fix boundary conditions for sizes = 16/32/64.
See the added comment for explanations.

Memmove function size drops from 885 to 715 bytes
due to removed duplication.
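
The dispatch described in point 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: `kSmallMax`, `move_small`, `regions_disjoint` and `sketch_memmove` are hypothetical names, the 16-byte cutoff is arbitrary, and `std::memcpy` / `std::memmove` stand in for the larger-size paths of `inline_memcpy` and `inline_memmove`. The sketch does not show the inlining and dead code elimination the commit relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical cutoff for the "small" path; real thresholds are target-specific.
constexpr std::size_t kSmallMax = 16;

// Small sizes: read the whole source into a temporary before writing, so the
// copy is correct for overlapping buffers and needs no disjointness check.
inline bool move_small(unsigned char *dst, const unsigned char *src,
                       std::size_t n) {
  if (n > kSmallMax)
    return false;
  unsigned char tmp[kSmallMax];
  std::memcpy(tmp, src, n);
  std::memcpy(dst, tmp, n);
  return true;
}

// Two buffers of length n are disjoint when their start addresses are at
// least n bytes apart.
inline bool regions_disjoint(const void *a, const void *b, std::size_t n) {
  const auto pa = reinterpret_cast<std::uintptr_t>(a);
  const auto pb = reinterpret_cast<std::uintptr_t>(b);
  return (pa > pb ? pa - pb : pb - pa) >= n;
}

inline void *sketch_memmove(void *dst, const void *src, std::size_t n) {
  // First step: small sizes only, no is_disjoint check.
  if (move_small(static_cast<unsigned char *>(dst),
                 static_cast<const unsigned char *>(src), n))
    return dst;
  // Larger sizes: pick the memcpy-style or memmove-style path.
  if (regions_disjoint(dst, src, n))
    std::memcpy(dst, src, n);   // stand-in for inline_memcpy
  else
    std::memmove(dst, src, n);  // stand-in for the large-size inline_memmove
  return dst;
}

// Self-check: the sketch must agree with std::memmove on overlapping and
// disjoint moves, both below and above the small-size cutoff.
inline void check_against_std_memmove() {
  unsigned char a[64], b[64];
  for (int i = 0; i < 64; ++i)
    a[i] = b[i] = static_cast<unsigned char>(i);
  sketch_memmove(a + 2, a, 8);   std::memmove(b + 2, b, 8);   // small, overlapping
  sketch_memmove(a, a + 4, 40);  std::memmove(b, b + 4, 40);  // large, overlapping
  sketch_memmove(a + 32, a, 20); std::memmove(b + 32, b, 20); // large, disjoint
  assert(std::memcmp(a, b, sizeof a) == 0);
}
```

Calling `check_against_std_memmove()` exercises all three paths; the small path never evaluates `regions_disjoint`, which is the point of the change.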

```
                 │  baseline   │             small-size              │
                 │   sec/op    │   sec/op     vs base                │
memmove/Google_A   3.208n ± 0%   2.911n ± 0%   -9.25% (n=100)
memmove/Google_B   4.113n ± 1%   3.428n ± 0%  -16.65% (n=100)
memmove/Google_D   5.838n ± 0%   4.158n ± 0%  -28.78% (n=100)
memmove/Google_S   4.712n ± 1%   3.899n ± 0%  -17.25% (n=100)
memmove/Google_U   3.609n ± 0%   3.247n ± 1%  -10.02% (n=100)
memmove/0          2.982n ± 0%   2.169n ± 0%  -27.26% (n=50)
memmove/1          3.253n ± 0%   2.168n ± 0%  -33.34% (n=50)
memmove/2          3.255n ± 0%   2.169n ± 0%  -33.38% (n=50)
memmove/3          3.259n ± 2%   2.175n ± 0%  -33.27% (p=0.000 n=50)
memmove/4          3.259n ± 0%   2.168n ± 5%  -33.46% (p=0.000 n=50)
memmove/5          2.488n ± 0%   1.926n ± 0%  -22.57% (p=0.000 n=50)
memmove/6          2.490n ± 0%   1.928n ± 0%  -22.59% (p=0.000 n=50)
memmove/7          2.492n ± 0%   1.927n ± 0%  -22.65% (p=0.000 n=50)
memmove/8          2.737n ± 0%   2.711n ± 0%   -0.97% (p=0.000 n=50)
memmove/9          2.736n ± 0%   2.711n ± 0%   -0.94% (p=0.000 n=50)
memmove/10         2.739n ± 0%   2.711n ± 0%   -1.04% (p=0.000 n=50)
memmove/11         2.740n ± 0%   2.711n ± 0%   -1.07% (p=0.000 n=50)
memmove/12         2.740n ± 0%   2.711n ± 0%   -1.09% (p=0.000 n=50)
memmove/13         2.744n ± 0%   2.711n ± 0%   -1.22% (p=0.000 n=50)
memmove/14         2.742n ± 0%   2.711n ± 0%   -1.14% (p=0.000 n=50)
memmove/15         2.742n ± 0%   2.711n ± 0%   -1.15% (p=0.000 n=50)
memmove/16         2.997n ± 0%   2.981n ± 0%   -0.52% (p=0.000 n=50)
memmove/17         2.998n ± 0%   2.981n ± 0%   -0.55% (p=0.000 n=50)
memmove/18         2.998n ± 0%   2.981n ± 0%   -0.55% (p=0.000 n=50)
memmove/19         2.999n ± 0%   2.982n ± 0%   -0.59% (p=0.000 n=50)
memmove/20         2.998n ± 0%   2.981n ± 0%   -0.55% (p=0.000 n=50)
memmove/21         3.000n ± 0%   2.981n ± 0%   -0.61% (p=0.000 n=50)
memmove/22         3.002n ± 0%   2.981n ± 0%   -0.68% (p=0.000 n=50)
memmove/23         3.002n ± 0%   2.981n ± 0%   -0.67% (p=0.000 n=50)
memmove/24         3.002n ± 0%   2.981n ± 0%   -0.70% (n=50)
memmove/25         3.002n ± 0%   2.981n ± 0%   -0.68% (p=0.000 n=50)
memmove/26         3.004n ± 0%   2.982n ± 0%   -0.74% (p=0.000 n=50)
memmove/27         3.005n ± 0%   2.981n ± 0%   -0.79% (n=50)
memmove/28         3.005n ± 0%   2.982n ± 0%   -0.77% (n=50)
memmove/29         3.009n ± 0%   2.981n ± 0%   -0.92% (n=50)
memmove/30         3.008n ± 0%   2.981n ± 0%   -0.89% (n=50)
memmove/31         3.007n ± 0%   2.982n ± 0%   -0.86% (n=50)
memmove/32         3.540n ± 0%   2.998n ± 0%  -15.31% (p=0.000 n=50)
memmove/33         3.544n ± 0%   2.997n ± 0%  -15.44% (p=0.000 n=50)
memmove/34         3.546n ± 0%   2.999n ± 0%  -15.42% (n=50)
memmove/35         3.545n ± 0%   2.999n ± 0%  -15.40% (n=50)
memmove/36         3.548n ± 0%   2.998n ± 0%  -15.52% (p=0.000 n=50)
memmove/37         3.546n ± 0%   3.000n ± 0%  -15.41% (n=50)
memmove/38         3.549n ± 0%   2.999n ± 0%  -15.49% (p=0.000 n=50)
memmove/39         3.549n ± 0%   2.999n ± 0%  -15.48% (p=0.000 n=50)
memmove/40         3.549n ± 0%   3.000n ± 0%  -15.46% (p=0.000 n=50)
memmove/41         3.550n ± 0%   3.001n ± 0%  -15.47% (n=50)
memmove/42         3.549n ± 0%   3.001n ± 0%  -15.43% (n=50)
memmove/43         3.552n ± 0%   3.001n ± 0%  -15.52% (p=0.000 n=50)
memmove/44         3.552n ± 0%   3.001n ± 0%  -15.51% (n=50)
memmove/45         3.552n ± 0%   3.002n ± 0%  -15.48% (n=50)
memmove/46         3.554n ± 0%   3.001n ± 0%  -15.55% (p=0.000 n=50)
memmove/47         3.556n ± 0%   3.002n ± 0%  -15.58% (p=0.000 n=50)
memmove/48         3.555n ± 0%   3.003n ± 0%  -15.54% (n=50)
memmove/49         3.557n ± 0%   3.002n ± 0%  -15.59% (p=0.000 n=50)
memmove/50         3.557n ± 0%   3.004n ± 0%  -15.55% (p=0.000 n=50)
memmove/51         3.556n ± 0%   3.004n ± 0%  -15.53% (p=0.000 n=50)
memmove/52         3.561n ± 0%   3.004n ± 0%  -15.65% (p=0.000 n=50)
memmove/53         3.558n ± 0%   3.004n ± 0%  -15.57% (p=0.000 n=50)
memmove/54         3.561n ± 0%   3.005n ± 0%  -15.62% (n=50)
memmove/55         3.560n ± 0%   3.006n ± 0%  -15.57% (n=50)
memmove/56         3.562n ± 0%   3.006n ± 0%  -15.60% (p=0.000 n=50)
memmove/57         3.563n ± 0%   3.006n ± 0%  -15.64% (n=50)
memmove/58         3.565n ± 0%   3.007n ± 0%  -15.64% (p=0.000 n=50)
memmove/59         3.564n ± 0%   3.006n ± 0%  -15.66% (p=0.000 n=50)
memmove/60         3.570n ± 0%   3.008n ± 0%  -15.74% (p=0.000 n=50)
memmove/61         3.566n ± 0%   3.009n ± 0%  -15.63% (p=0.000 n=50)
memmove/62         3.567n ± 0%   3.007n ± 0%  -15.70% (p=0.000 n=50)
memmove/63         3.568n ± 0%   3.008n ± 0%  -15.71% (p=0.000 n=50)
memmove/64         4.104n ± 0%   3.008n ± 0%  -26.70% (p=0.000 n=50)
memmove/65         4.126n ± 0%   3.662n ± 0%  -11.26% (p=0.000 n=50)
memmove/66         4.128n ± 0%   3.662n ± 0%  -11.29% (n=50)
memmove/67         4.129n ± 0%   3.662n ± 0%  -11.31% (n=50)
memmove/68         4.129n ± 0%   3.661n ± 0%  -11.33% (p=0.000 n=50)
memmove/69         4.130n ± 0%   3.662n ± 0%  -11.34% (p=0.000 n=50)
memmove/70         4.130n ± 0%   3.662n ± 0%  -11.33% (n=50)
memmove/71         4.132n ± 0%   3.662n ± 0%  -11.38% (p=0.000 n=50)
memmove/72         4.131n ± 0%   3.661n ± 0%  -11.39% (n=50)
memmove/73         4.135n ± 0%   3.661n ± 0%  -11.45% (p=0.000 n=50)
memmove/74         4.137n ± 0%   3.662n ± 0%  -11.49% (n=50)
memmove/75         4.138n ± 0%   3.662n ± 0%  -11.51% (p=0.000 n=50)
memmove/76         4.139n ± 0%   3.661n ± 0%  -11.56% (p=0.000 n=50)
memmove/77         4.136n ± 0%   3.662n ± 0%  -11.47% (p=0.000 n=50)
memmove/78         4.143n ± 0%   3.661n ± 0%  -11.62% (p=0.000 n=50)
memmove/79         4.142n ± 0%   3.661n ± 0%  -11.60% (n=50)
memmove/80         4.142n ± 0%   3.661n ± 0%  -11.62% (p=0.000 n=50)
memmove/81         4.140n ± 0%   3.661n ± 0%  -11.57% (n=50)
memmove/82         4.146n ± 0%   3.661n ± 0%  -11.69% (n=50)
memmove/83         4.143n ± 0%   3.661n ± 0%  -11.63% (p=0.000 n=50)
memmove/84         4.143n ± 0%   3.661n ± 0%  -11.63% (n=50)
memmove/85         4.147n ± 0%   3.661n ± 0%  -11.73% (p=0.000 n=50)
memmove/86         4.142n ± 0%   3.661n ± 0%  -11.62% (p=0.000 n=50)
memmove/87         4.147n ± 0%   3.661n ± 0%  -11.72% (p=0.000 n=50)
memmove/88         4.148n ± 0%   3.661n ± 0%  -11.74% (n=50)
memmove/89         4.152n ± 0%   3.661n ± 0%  -11.84% (n=50)
memmove/90         4.151n ± 0%   3.661n ± 0%  -11.81% (n=50)
memmove/91         4.150n ± 0%   3.661n ± 0%  -11.78% (n=50)
memmove/92         4.153n ± 0%   3.661n ± 0%  -11.86% (n=50)
memmove/93         4.158n ± 0%   3.661n ± 0%  -11.95% (n=50)
memmove/94         4.157n ± 0%   3.661n ± 0%  -11.95% (p=0.000 n=50)
memmove/95         4.155n ± 0%   3.661n ± 0%  -11.90% (p=0.000 n=50)
memmove/96         4.149n ± 0%   3.660n ± 0%  -11.79% (n=50)
memmove/97         4.157n ± 0%   3.661n ± 0%  -11.94% (n=50)
memmove/98         4.157n ± 0%   3.661n ± 0%  -11.94% (n=50)
memmove/99         4.168n ± 0%   3.661n ± 0%  -12.17% (p=0.000 n=50)
memmove/100        4.159n ± 0%   3.660n ± 0%  -12.00% (p=0.000 n=50)
memmove/101        4.161n ± 0%   3.660n ± 0%  -12.03% (p=0.000 n=50)
memmove/102        4.165n ± 0%   3.660n ± 0%  -12.12% (p=0.000 n=50)
memmove/103        4.164n ± 0%   3.661n ± 0%  -12.08% (n=50)
memmove/104        4.164n ± 0%   3.660n ± 0%  -12.11% (n=50)
memmove/105        4.165n ± 0%   3.660n ± 0%  -12.12% (p=0.000 n=50)
memmove/106        4.166n ± 0%   3.660n ± 0%  -12.15% (n=50)
memmove/107        4.171n ± 0%   3.660n ± 1%  -12.26% (p=0.000 n=50)
memmove/108        4.173n ± 0%   3.660n ± 0%  -12.30% (p=0.000 n=50)
memmove/109        4.170n ± 0%   3.660n ± 0%  -12.24% (n=50)
memmove/110        4.174n ± 0%   3.660n ± 0%  -12.31% (n=50)
memmove/111        4.176n ± 0%   3.660n ± 0%  -12.35% (p=0.000 n=50)
memmove/112        4.174n ± 0%   3.659n ± 0%  -12.34% (p=0.000 n=50)
memmove/113        4.176n ± 0%   3.660n ± 0%  -12.35% (n=50)
memmove/114        4.182n ± 0%   3.660n ± 0%  -12.49% (n=50)
memmove/115        4.185n ± 0%   3.660n ± 0%  -12.55% (n=50)
memmove/116        4.184n ± 0%   3.659n ± 0%  -12.54% (n=50)
memmove/117        4.182n ± 0%   3.660n ± 0%  -12.50% (n=50)
memmove/118        4.188n ± 0%   3.660n ± 0%  -12.61% (n=50)
memmove/119        4.186n ± 0%   3.660n ± 0%  -12.57% (p=0.000 n=50)
memmove/120        4.189n ± 0%   3.659n ± 0%  -12.63% (n=50)
memmove/121        4.187n ± 0%   3.660n ± 0%  -12.60% (n=50)
memmove/122        4.186n ± 0%   3.660n ± 0%  -12.58% (n=50)
memmove/123        4.187n ± 0%   3.660n ± 0%  -12.60% (n=50)
memmove/124        4.189n ± 0%   3.659n ± 0%  -12.65% (n=50)
memmove/125        4.195n ± 0%   3.659n ± 0%  -12.78% (n=50)
memmove/126        4.197n ± 0%   3.659n ± 0%  -12.81% (n=50)
memmove/127        4.194n ± 0%   3.659n ± 0%  -12.75% (n=50)
memmove/128        5.035n ± 0%   3.659n ± 0%  -27.32% (n=50)
memmove/129        5.127n ± 0%   5.164n ± 0%   +0.73% (p=0.000 n=50)
memmove/130        5.130n ± 0%   5.176n ± 0%   +0.88% (p=0.000 n=50)
memmove/131        5.127n ± 0%   5.180n ± 0%   +1.05% (p=0.000 n=50)
memmove/132        5.131n ± 0%   5.169n ± 0%   +0.75% (p=0.000 n=50)
memmove/133        5.137n ± 0%   5.179n ± 0%   +0.81% (p=0.000 n=50)
memmove/134        5.140n ± 0%   5.178n ± 0%   +0.74% (p=0.000 n=50)
memmove/135        5.141n ± 0%   5.187n ± 0%   +0.88% (p=0.000 n=50)
memmove/136        5.133n ± 0%   5.184n ± 0%   +0.99% (p=0.000 n=50)
memmove/137        5.148n ± 0%   5.186n ± 0%   +0.73% (p=0.000 n=50)
memmove/138        5.143n ± 0%   5.189n ± 0%   +0.88% (p=0.000 n=50)
memmove/139        5.142n ± 0%   5.192n ± 0%   +0.97% (p=0.000 n=50)
memmove/140        5.141n ± 0%   5.192n ± 0%   +1.01% (p=0.000 n=50)
memmove/141        5.155n ± 0%   5.188n ± 0%   +0.64% (p=0.000 n=50)
memmove/142        5.146n ± 0%   5.192n ± 0%   +0.90% (p=0.000 n=50)
memmove/143        5.142n ± 0%   5.203n ± 0%   +1.19% (p=0.000 n=50)
memmove/144        5.146n ± 0%   5.197n ± 0%   +0.99% (p=0.000 n=50)
memmove/145        5.146n ± 0%   5.196n ± 0%   +0.97% (p=0.000 n=50)
memmove/146        5.151n ± 0%   5.207n ± 0%   +1.10% (p=0.000 n=50)
memmove/147        5.151n ± 0%   5.205n ± 0%   +1.06% (p=0.000 n=50)
memmove/148        5.156n ± 0%   5.190n ± 0%   +0.66% (p=0.000 n=50)
memmove/149        5.158n ± 0%   5.212n ± 0%   +1.04% (p=0.000 n=50)
memmove/150        5.160n ± 0%   5.203n ± 0%   +0.84% (p=0.000 n=50)
memmove/151        5.167n ± 0%   5.210n ± 0%   +0.83% (p=0.000 n=50)
memmove/152        5.157n ± 0%   5.206n ± 0%   +0.94% (p=0.000 n=50)
memmove/153        5.170n ± 0%   5.211n ± 0%   +0.80% (p=0.000 n=50)
memmove/154        5.169n ± 0%   5.222n ± 0%   +1.02% (p=0.000 n=50)
memmove/155        5.171n ± 0%   5.215n ± 0%   +0.87% (p=0.000 n=50)
memmove/156        5.174n ± 0%   5.214n ± 0%   +0.78% (p=0.000 n=50)
memmove/157        5.171n ± 0%   5.218n ± 0%   +0.92% (p=0.000 n=50)
memmove/158        5.168n ± 0%   5.224n ± 0%   +1.09% (p=0.000 n=50)
memmove/159        5.179n ± 0%   5.218n ± 0%   +0.76% (p=0.000 n=50)
memmove/160        5.170n ± 0%   5.219n ± 0%   +0.95% (p=0.000 n=50)
memmove/161        5.187n ± 0%   5.220n ± 0%   +0.64% (p=0.000 n=50)
memmove/162        5.189n ± 0%   5.234n ± 0%   +0.86% (p=0.000 n=50)
memmove/163        5.199n ± 0%   5.250n ± 0%   +0.99% (p=0.000 n=50)
memmove/164        5.205n ± 0%   5.260n ± 0%   +1.04% (p=0.000 n=50)
memmove/165        5.208n ± 0%   5.261n ± 0%   +1.01% (p=0.000 n=50)
memmove/166        5.227n ± 0%   5.275n ± 0%   +0.91% (p=0.000 n=50)
memmove/167        5.233n ± 0%   5.281n ± 0%   +0.92% (p=0.000 n=50)
memmove/168        5.236n ± 0%   5.295n ± 0%   +1.12% (p=0.000 n=50)
memmove/169        5.256n ± 0%   5.297n ± 0%   +0.79% (p=0.000 n=50)
memmove/170        5.259n ± 0%   5.302n ± 0%   +0.80% (p=0.000 n=50)
memmove/171        5.269n ± 0%   5.321n ± 0%   +0.97% (p=0.000 n=50)
memmove/172        5.266n ± 0%   5.318n ± 0%   +0.98% (p=0.000 n=50)
memmove/173        5.272n ± 0%   5.330n ± 0%   +1.09% (p=0.000 n=50)
memmove/174        5.284n ± 0%   5.331n ± 0%   +0.89% (p=0.000 n=50)
memmove/175        5.284n ± 0%   5.322n ± 0%   +0.72% (p=0.000 n=50)
memmove/176        5.298n ± 0%   5.337n ± 0%   +0.74% (p=0.000 n=50)
memmove/177        5.282n ± 0%   5.338n ± 0%   +1.04% (p=0.000 n=50)
memmove/178        5.299n ± 0%   5.337n ± 0%   +0.71% (p=0.000 n=50)
memmove/179        5.296n ± 0%   5.343n ± 0%   +0.88% (p=0.000 n=50)
memmove/180        5.292n ± 0%   5.343n ± 0%   +0.97% (p=0.000 n=50)
memmove/181        5.303n ± 0%   5.335n ± 0%   +0.60% (p=0.000 n=50)
memmove/182        5.305n ± 0%   5.338n ± 0%   +0.62% (p=0.000 n=50)
memmove/183        5.298n ± 0%   5.329n ± 0%   +0.59% (p=0.000 n=50)
memmove/184        5.299n ± 0%   5.333n ± 0%   +0.64% (p=0.000 n=50)
memmove/185        5.291n ± 0%   5.330n ± 0%   +0.73% (p=0.000 n=50)
memmove/186        5.296n ± 0%   5.332n ± 0%   +0.68% (p=0.000 n=50)
memmove/187        5.297n ± 0%   5.320n ± 0%   +0.44% (p=0.000 n=50)
memmove/188        5.286n ± 0%   5.314n ± 0%   +0.53% (p=0.000 n=50)
memmove/189        5.293n ± 0%   5.318n ± 0%   +0.46% (p=0.000 n=50)
memmove/190        5.294n ± 0%   5.318n ± 0%   +0.45% (p=0.000 n=50)
memmove/191        5.292n ± 0%   5.314n ± 0%   +0.40% (p=0.032 n=50)
memmove/192        5.272n ± 0%   5.304n ± 0%   +0.60% (p=0.000 n=50)
memmove/193        5.279n ± 0%   5.310n ± 0%   +0.57% (p=0.000 n=50)
memmove/194        5.294n ± 0%   5.308n ± 0%   +0.26% (p=0.018 n=50)
memmove/195        5.302n ± 0%   5.311n ± 0%   +0.18% (p=0.010 n=50)
memmove/196        5.301n ± 0%   5.316n ± 0%   +0.28% (p=0.023 n=50)
memmove/197        5.302n ± 0%   5.327n ± 0%   +0.47% (p=0.000 n=50)
memmove/198        5.310n ± 0%   5.326n ± 0%   +0.30% (p=0.003 n=50)
memmove/199        5.303n ± 0%   5.319n ± 0%   +0.30% (p=0.009 n=50)
memmove/200        5.312n ± 0%   5.330n ± 0%   +0.35% (p=0.001 n=50)
memmove/201        5.307n ± 0%   5.333n ± 0%   +0.50% (p=0.000 n=50)
memmove/202        5.311n ± 0%   5.334n ± 0%   +0.44% (p=0.000 n=50)
memmove/203        5.313n ± 0%   5.335n ± 0%   +0.41% (p=0.006 n=50)
memmove/204        5.312n ± 0%   5.332n ± 0%   +0.36% (p=0.002 n=50)
memmove/205        5.318n ± 0%   5.345n ± 0%   +0.50% (p=0.000 n=50)
memmove/206        5.311n ± 0%   5.333n ± 0%   +0.42% (p=0.002 n=50)
memmove/207        5.310n ± 0%   5.338n ± 0%   +0.52% (p=0.000 n=50)
memmove/208        5.319n ± 0%   5.341n ± 0%   +0.40% (p=0.004 n=50)
memmove/209        5.330n ± 0%   5.346n ± 0%   +0.30% (p=0.004 n=50)
memmove/210        5.329n ± 0%   5.349n ± 0%   +0.38% (p=0.002 n=50)
memmove/211        5.318n ± 0%   5.340n ± 0%   +0.41% (p=0.000 n=50)
memmove/212        5.339n ± 0%   5.343n ± 0%        ~ (p=0.396 n=50)
memmove/213        5.329n ± 0%   5.343n ± 0%   +0.25% (p=0.017 n=50)
memmove/214        5.339n ± 0%   5.358n ± 0%   +0.35% (p=0.035 n=50)
memmove/215        5.342n ± 0%   5.346n ± 0%        ~ (p=0.063 n=50)
memmove/216        5.338n ± 0%   5.359n ± 0%   +0.39% (p=0.002 n=50)
memmove/217        5.341n ± 0%   5.362n ± 0%   +0.39% (p=0.015 n=50)
memmove/218        5.354n ± 0%   5.373n ± 0%   +0.36% (p=0.041 n=50)
memmove/219        5.352n ± 0%   5.362n ± 0%        ~ (p=0.143 n=50)
memmove/220        5.344n ± 0%   5.370n ± 0%   +0.50% (p=0.001 n=50)
memmove/221        5.345n ± 0%   5.373n ± 0%   +0.53% (p=0.000 n=50)
memmove/222        5.348n ± 0%   5.360n ± 0%   +0.23% (p=0.014 n=50)
memmove/223        5.354n ± 0%   5.377n ± 0%   +0.43% (p=0.024 n=50)
memmove/224        5.352n ± 0%   5.363n ± 0%        ~ (p=0.052 n=50)
memmove/225        5.372n ± 0%   5.380n ± 0%        ~ (p=0.481 n=50)
memmove/226        5.368n ± 0%   5.386n ± 0%   +0.34% (p=0.004 n=50)
memmove/227        5.386n ± 0%   5.402n ± 0%   +0.29% (p=0.028 n=50)
memmove/228        5.400n ± 0%   5.408n ± 0%        ~ (p=0.174 n=50)
memmove/229        5.423n ± 0%   5.427n ± 0%        ~ (p=0.444 n=50)
memmove/230        5.411n ± 0%   5.429n ± 0%   +0.33% (p=0.020 n=50)
memmove/231        5.420n ± 0%   5.433n ± 0%   +0.24% (p=0.034 n=50)
memmove/232        5.435n ± 0%   5.441n ± 0%        ~ (p=0.235 n=50)
memmove/233        5.446n ± 0%   5.462n ± 0%        ~ (p=0.590 n=50)
memmove/234        5.467n ± 0%   5.461n ± 0%        ~ (p=0.921 n=50)
memmove/235        5.472n ± 0%   5.478n ± 0%        ~ (p=0.883 n=50)
memmove/236        5.466n ± 0%   5.478n ± 0%        ~ (p=0.324 n=50)
memmove/237        5.471n ± 0%   5.489n ± 0%        ~ (p=0.132 n=50)
memmove/238        5.485n ± 0%   5.489n ± 0%        ~ (p=0.460 n=50)
memmove/239        5.484n ± 0%   5.488n ± 0%        ~ (p=0.833 n=50)
memmove/240        5.483n ± 0%   5.495n ± 0%        ~ (p=0.095 n=50)
memmove/241        5.498n ± 0%   5.514n ± 0%        ~ (p=0.077 n=50)
memmove/242        5.518n ± 0%   5.517n ± 0%        ~ (p=0.481 n=50)
memmove/243        5.514n ± 0%   5.511n ± 0%        ~ (p=0.503 n=50)
memmove/244        5.510n ± 0%   5.497n ± 0%   -0.24% (p=0.038 n=50)
memmove/245        5.516n ± 0%   5.505n ± 0%        ~ (p=0.317 n=50)
memmove/246        5.513n ± 1%   5.494n ± 0%        ~ (p=0.147 n=50)
memmove/247        5.518n ± 0%   5.499n ± 0%   -0.36% (p=0.011 n=50)
memmove/248        5.503n ± 0%   5.492n ± 0%        ~ (p=0.267 n=50)
memmove/249        5.498n ± 0%   5.497n ± 0%        ~ (p=0.765 n=50)
memmove/250        5.485n ± 0%   5.493n ± 0%        ~ (p=0.348 n=50)
memmove/251        5.503n ± 0%   5.482n ± 0%   -0.37% (p=0.013 n=50)
memmove/252        5.497n ± 0%   5.485n ± 0%        ~ (p=0.077 n=50)
memmove/253        5.489n ± 0%   5.496n ± 0%        ~ (p=0.850 n=50)
memmove/254        5.497n ± 0%   5.491n ± 0%        ~ (p=0.548 n=50)
memmove/255        5.484n ± 1%   5.494n ± 0%        ~ (p=0.888 n=50)
memmove/256        6.952n ± 0%   7.676n ± 0%  +10.41% (p=0.000 n=50)
geomean            4.406n        4.127n        -6.33%
```
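
For reading the table: the "vs base" column is the relative change of the new time against the baseline. A minimal sketch of that arithmetic, with `percent_vs_base` as a hypothetical helper name; the checked values are taken from the `memmove/0` and `geomean` rows above, and the tolerance allows for the table printing rounded times.

```cpp
#include <cassert>
#include <cmath>

// Relative change of `candidate` versus `baseline`, in percent.
// Negative values mean the candidate is faster.
inline double percent_vs_base(double baseline, double candidate) {
  return (candidate / baseline - 1.0) * 100.0;
}

// Recompute two rows of the table from their printed sec/op values.
inline void check_table_rows() {
  assert(std::fabs(percent_vs_base(2.982, 2.169) - (-27.26)) < 0.05); // memmove/0
  assert(std::fabs(percent_vs_base(4.406, 4.127) - (-6.33)) < 0.05);  // geomean
}
```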
2023-10-26 13:40:25 +02:00
Dmitry Vyukov
605fadf0ca [libc] Add --sweep-min-size flag for benchmarks (#70302)
We have --sweep-max-size, so it's reasonable to have --sweep-min-size as
well. It can be used when working on the logic for larger sizes, or to
collect a profile for larger sizes only.
2023-10-26 11:06:15 +02:00
Joseph Huber
e3d2a7d0a5 [libc] Compile the GPU functions with '-fconvergent-functions' (#70229)
Summary:
This patch simply adds the `-fconvergent-functions` flag to the GPU
compilation. This is in relation to the behaviour of SIMT
architectures under divergence. With the flag, we assume every function
is convergent by default and rely on the compiler's divergence analysis
to transform it if possible.

Fixes: https://github.com/llvm/llvm-project/issues/63853
2023-10-25 14:13:21 -05:00
Benjamin Kramer
c4e9a43773 [libc] Fix a constexpr violation from b4e552999d
In msan mode this calls __msan_unpoison, which isn't constexpr.
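
To illustrate the kind of conflict described here (not the actual fix, which the message does not spell out): a constexpr function cannot call a runtime-only hook during constant evaluation. In the sketch below, `runtime_only_hook` is a hypothetical stand-in for a sanitizer call such as `__msan_unpoison`, and guarding it with `__builtin_is_constant_evaluated()` (a GCC/Clang builtin, assumed available) is just one common way to keep such a function usable in constant expressions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a runtime-only sanitizer hook; not constexpr.
inline void runtime_only_hook(const void *, std::size_t) {}

// Problematic shape: an unconditional call to a non-constexpr function.
//   constexpr int unguarded(int v) { runtime_only_hook(&v, sizeof(v)); return v; }
//   static_assert(unguarded(1) == 1, "");  // rejected: call to non-constexpr function

// One possible shape that stays usable in constant expressions: skip the hook
// while the compiler is evaluating the call at compile time.
constexpr int guarded(int v) {
  if (!__builtin_is_constant_evaluated())
    runtime_only_hook(&v, sizeof(v));
  return v;
}

static_assert(guarded(7) == 7, "guarded() works in a constant expression");

// At run time the hook is called and the value is passed through unchanged.
inline void check_guarded_at_runtime() {
  volatile int x = 5;
  assert(guarded(x) == 5);
}
```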
2023-10-25 13:36:17 +02:00
michaelrj-google
2282af26ea [libc] Disable -NaN test on float128 systems (#70146)
Some float128 systems (specifically the ones used for aarch64 buildbots)
don't respect signs for long double NaNs. This patch disables the printf
test that was failing due to this.
2023-10-24 16:45:54 -07:00
michaelrj-google
b4e552999d [libc] Fix printf long double inf, bitcast in msan (#70067)
These bugs were found with the new printf long double fuzzing. The long
double inf vs nan bug was introduced when we changed to
get_explicit_exponent. The bitcast msan issue hadn't come up previously,
but isn't a real bug, just a poisoning confusion.
2023-10-24 15:41:54 -07:00
Dmitry Vyukov
f364a7a8b4 [libc] Speed up memmove overlapping check (#70017)
Use a check that requires fewer instructions and is cheaper.
Current code:
```
   1b704:       48 39 f7                cmp    %rsi,%rdi
   1b707:       48 89 f0                mov    %rsi,%rax
   1b70a:       48 0f 47 c7             cmova  %rdi,%rax
   1b70e:       48 89 f9                mov    %rdi,%rcx
   1b711:       48 0f 47 ce             cmova  %rsi,%rcx
   1b715:       48 01 d1                add    %rdx,%rcx
   1b718:       48 39 c1                cmp    %rax,%rcx
```
New code:
```
   1b704:       48 89 f8                mov    %rdi,%rax
   1b707:       48 29 f0                sub    %rsi,%rax
   1b70a:       48 89 c1                mov    %rax,%rcx
   1b70d:       48 f7 d9                neg    %rcx
   1b710:       48 0f 48 c8             cmovs  %rax,%rcx
   1b714:       48 39 d1                cmp    %rdx,%rcx
```
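
Read back from the disassembly, the two checks appear to correspond to the formulations below. This is a sketch inferred from the instruction sequences, not the library source: the function names are hypothetical, and the final branch condition is not visible in the listings, so the `<=` / `>=` comparisons are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Old shape (two cmova, add, cmp): min(a, b) + size <= max(a, b).
inline bool disjoint_minmax(std::uintptr_t a, std::uintptr_t b,
                            std::size_t size) {
  const std::uintptr_t lo = a < b ? a : b;
  const std::uintptr_t hi = a < b ? b : a;
  return lo + size <= hi;
}

// New shape (sub, neg, cmovs, cmp): |a - b| >= size.
inline bool disjoint_absdiff(std::uintptr_t a, std::uintptr_t b,
                             std::size_t size) {
  const auto diff = static_cast<std::intptr_t>(a - b);
  // Negate in unsigned arithmetic to avoid signed overflow.
  const std::uintptr_t dist =
      diff < 0 ? std::uintptr_t{0} - static_cast<std::uintptr_t>(diff)
               : static_cast<std::uintptr_t>(diff);
  return dist >= size;
}

// Self-check: both formulations agree for nearby buffers in either order.
inline void check_agreement() {
  const std::uintptr_t base = 0x1000;
  for (std::uintptr_t off = 0; off < 64; ++off) {
    for (std::size_t size = 0; size < 64; ++size) {
      assert(disjoint_minmax(base, base + off, size) ==
             disjoint_absdiff(base, base + off, size));
      assert(disjoint_minmax(base + off, base, size) ==
             disjoint_absdiff(base + off, base, size));
    }
  }
}
```

The absolute-difference form needs a single subtraction and one conditional move instead of two conditional moves plus an add, which matches the shorter instruction sequence shown above.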
```
                 │  baseline   │              disjoint              │
                 │   sec/op    │   sec/op     vs base               │
memmove/Google_A   3.910n ± 0%   3.861n ± 1%  -1.26% (p=0.000 n=50)
```
```
            │  baseline   │              disjoint               │
            │   sec/op    │   sec/op     vs base                │
memmove/1     2.724n ± 3%   2.441n ± 0%  -10.37% (n=50)
memmove/2     2.878n ± 0%   2.713n ± 0%   -5.73% (n=50)
memmove/3     2.835n ± 0%   2.593n ± 0%   -8.54% (n=50)
memmove/4     3.032n ± 0%   2.776n ± 0%   -8.45% (p=0.000 n=50)
memmove/5     2.833n ± 0%   2.600n ± 0%   -8.20% (p=0.000 n=50)
memmove/6     2.758n ± 0%   2.744n ± 0%   -0.52% (p=0.000 n=50)
memmove/7     2.762n ± 0%   2.744n ± 0%   -0.63% (p=0.000 n=50)
memmove/8     2.763n ± 0%   2.750n ± 0%   -0.46% (p=0.000 n=50)
memmove/9     3.182n ± 0%   3.269n ± 0%   +2.75% (p=0.000 n=50)
memmove/10    3.185n ± 0%   3.270n ± 0%   +2.64% (p=0.000 n=50)
memmove/11    3.188n ± 0%   3.277n ± 0%   +2.79% (p=0.000 n=50)
memmove/12    3.190n ± 0%   3.279n ± 0%   +2.82% (p=0.000 n=50)
memmove/13    3.194n ± 0%   3.281n ± 0%   +2.73% (p=0.000 n=50)
memmove/14    3.197n ± 0%   3.285n ± 0%   +2.77% (p=0.000 n=50)
memmove/15    3.198n ± 0%   3.282n ± 0%   +2.62% (p=0.000 n=50)
memmove/16    3.201n ± 0%   3.284n ± 0%   +2.61% (p=0.000 n=50)
memmove/17    3.564n ± 0%   3.320n ± 0%   -6.86% (p=0.000 n=50)
memmove/18    3.572n ± 0%   3.313n ± 0%   -7.25% (p=0.000 n=50)
memmove/19    3.572n ± 0%   3.325n ± 0%   -6.94% (p=0.000 n=50)
memmove/20    3.575n ± 0%   3.319n ± 0%   -7.15% (p=0.000 n=50)
memmove/21    3.578n ± 0%   3.327n ± 0%   -7.03% (p=0.000 n=50)
memmove/22    3.581n ± 0%   3.330n ± 0%   -7.01% (p=0.000 n=50)
memmove/23    3.582n ± 0%   3.354n ± 1%   -6.37% (p=0.000 n=50)
memmove/24    3.587n ± 0%   3.347n ± 1%   -6.71% (p=0.000 n=50)
memmove/25    3.591n ± 0%   3.320n ± 0%   -7.55% (p=0.000 n=50)
memmove/26    3.593n ± 0%   3.348n ± 0%   -6.82% (p=0.000 n=50)
memmove/27    3.596n ± 0%   3.346n ± 0%   -6.94% (p=0.000 n=50)
memmove/28    3.597n ± 0%   3.357n ± 0%   -6.67% (p=0.000 n=50)
memmove/29    3.601n ± 0%   3.340n ± 0%   -7.23% (p=0.000 n=50)
memmove/30    3.602n ± 0%   3.345n ± 0%   -7.12% (p=0.000 n=50)
memmove/31    3.608n ± 0%   3.357n ± 0%   -6.94% (p=0.000 n=50)
memmove/32    3.605n ± 0%   3.352n ± 0%   -7.01% (p=0.000 n=50)
memmove/33    4.128n ± 1%   3.829n ± 0%   -7.23% (p=0.000 n=50)
memmove/34    4.149n ± 0%   3.836n ± 0%   -7.54% (p=0.000 n=50)
memmove/35    4.134n ± 0%   3.839n ± 0%   -7.15% (n=50)
memmove/36    4.151n ± 0%   3.842n ± 0%   -7.45% (n=50)
memmove/37    4.152n ± 0%   3.841n ± 0%   -7.49% (p=0.000 n=50)
memmove/38    4.159n ± 0%   3.844n ± 0%   -7.58% (p=0.000 n=50)
memmove/39    4.165n ± 0%   3.841n ± 0%   -7.78% (p=0.000 n=50)
memmove/40    4.162n ± 0%   3.837n ± 0%   -7.81% (p=0.000 n=50)
memmove/41    4.161n ± 0%   3.845n ± 0%   -7.58% (p=0.000 n=50)
memmove/42    4.164n ± 0%   3.851n ± 0%   -7.53% (p=0.000 n=50)
memmove/43    4.165n ± 0%   3.843n ± 0%   -7.74% (p=0.000 n=50)
memmove/44    4.175n ± 0%   3.847n ± 0%   -7.83% (p=0.000 n=50)
memmove/45    4.170n ± 0%   3.849n ± 0%   -7.70% (p=0.000 n=50)
memmove/46    4.175n ± 0%   3.850n ± 0%   -7.79% (p=0.000 n=50)
memmove/47    4.180n ± 0%   3.851n ± 0%   -7.87% (p=0.000 n=50)
memmove/48    4.178n ± 0%   3.852n ± 0%   -7.81% (p=0.000 n=50)
memmove/49    4.175n ± 0%   3.851n ± 0%   -7.76% (n=50)
memmove/50    4.178n ± 0%   3.855n ± 0%   -7.73% (p=0.000 n=50)
memmove/51    4.190n ± 0%   3.859n ± 0%   -7.91% (p=0.000 n=50)
memmove/52    4.188n ± 0%   3.859n ± 0%   -7.84% (p=0.000 n=50)
memmove/53    4.191n ± 0%   3.863n ± 0%   -7.82% (p=0.000 n=50)
memmove/54    4.192n ± 0%   3.860n ± 0%   -7.91% (p=0.000 n=50)
memmove/55    4.192n ± 0%   3.869n ± 0%   -7.70% (p=0.000 n=50)
memmove/56    4.204n ± 0%   3.866n ± 0%   -8.05% (p=0.000 n=50)
memmove/57    4.198n ± 0%   3.864n ± 0%   -7.95% (p=0.000 n=50)
memmove/58    4.202n ± 0%   3.865n ± 0%   -8.02% (p=0.000 n=50)
memmove/59    4.208n ± 0%   3.868n ± 0%   -8.09% (p=0.000 n=50)
memmove/60    4.205n ± 0%   3.873n ± 0%   -7.89% (p=0.000 n=50)
memmove/61    4.212n ± 0%   3.872n ± 0%   -8.08% (p=0.000 n=50)
memmove/62    4.214n ± 0%   3.870n ± 0%   -8.16% (p=0.000 n=50)
memmove/63    4.215n ± 0%   3.877n ± 0%   -8.02% (p=0.000 n=50)
memmove/64    4.217n ± 0%   3.881n ± 0%   -7.99% (p=0.000 n=50)
memmove/65    4.990n ± 0%   4.683n ± 0%   -6.15% (p=0.000 n=50)
memmove/66    5.022n ± 0%   4.719n ± 0%   -6.03% (p=0.000 n=50)
memmove/67    5.030n ± 0%   4.725n ± 0%   -6.07% (p=0.000 n=50)
memmove/68    5.035n ± 0%   4.724n ± 0%   -6.18% (p=0.000 n=50)
memmove/69    5.030n ± 0%   4.725n ± 0%   -6.07% (p=0.000 n=50)
memmove/70    5.040n ± 0%   4.728n ± 0%   -6.19% (p=0.000 n=50)
memmove/71    5.053n ± 0%   4.728n ± 0%   -6.43% (p=0.000 n=50)
memmove/72    5.050n ± 0%   4.732n ± 0%   -6.29% (p=0.000 n=50)
memmove/73    5.049n ± 0%   4.733n ± 0%   -6.24% (p=0.000 n=50)
memmove/74    5.054n ± 0%   4.734n ± 0%   -6.34% (p=0.000 n=50)
memmove/75    5.063n ± 0%   4.736n ± 0%   -6.46% (p=0.000 n=50)
memmove/76    5.046n ± 0%   4.741n ± 0%   -6.04% (p=0.000 n=50)
memmove/77    5.057n ± 0%   4.741n ± 0%   -6.25% (p=0.000 n=50)
memmove/78    5.077n ± 0%   4.739n ± 0%   -6.65% (p=0.000 n=50)
memmove/79    5.074n ± 0%   4.746n ± 0%   -6.46% (p=0.000 n=50)
memmove/80    5.085n ± 0%   4.747n ± 0%   -6.65% (p=0.000 n=50)
memmove/81    5.077n ± 0%   4.735n ± 0%   -6.74% (p=0.000 n=50)
memmove/82    5.087n ± 0%   4.747n ± 0%   -6.68% (p=0.000 n=50)
memmove/83    5.087n ± 0%   4.754n ± 0%   -6.56% (p=0.000 n=50)
memmove/84    5.096n ± 0%   4.753n ± 0%   -6.73% (p=0.000 n=50)
memmove/85    5.082n ± 0%   4.749n ± 0%   -6.55% (p=0.000 n=50)
memmove/86    5.103n ± 0%   4.752n ± 0%   -6.87% (p=0.000 n=50)
memmove/87    5.096n ± 0%   4.760n ± 0%   -6.61% (p=0.000 n=50)
memmove/88    5.099n ± 0%   4.765n ± 0%   -6.55% (p=0.000 n=50)
memmove/89    5.104n ± 0%   4.757n ± 0%   -6.79% (p=0.000 n=50)
memmove/90    5.117n ± 0%   4.767n ± 0%   -6.84% (p=0.000 n=50)
memmove/91    5.100n ± 0%   4.766n ± 0%   -6.54% (p=0.000 n=50)
memmove/92    5.103n ± 0%   4.763n ± 0%   -6.67% (p=0.000 n=50)
memmove/93    5.115n ± 0%   4.772n ± 0%   -6.71% (p=0.000 n=50)
memmove/94    5.117n ± 0%   4.769n ± 0%   -6.80% (p=0.000 n=50)
memmove/95    5.131n ± 0%   4.775n ± 0%   -6.94% (p=0.000 n=50)
memmove/96    5.129n ± 0%   4.772n ± 0%   -6.97% (p=0.000 n=50)
memmove/97    5.130n ± 0%   4.764n ± 0%   -7.13% (p=0.000 n=50)
memmove/98    5.134n ± 0%   4.780n ± 0%   -6.89% (p=0.000 n=50)
memmove/99    5.141n ± 0%   4.780n ± 0%   -7.03% (p=0.000 n=50)
memmove/100   5.141n ± 0%   4.780n ± 0%   -7.02% (p=0.000 n=50)
memmove/101   5.150n ± 0%   4.782n ± 0%   -7.14% (p=0.000 n=50)
memmove/102   5.150n ± 0%   4.790n ± 0%   -6.99% (p=0.000 n=50)
memmove/103   5.156n ± 0%   4.788n ± 0%   -7.14% (n=50)
memmove/104   5.157n ± 0%   4.793n ± 0%   -7.05% (p=0.000 n=50)
memmove/105   5.147n ± 0%   4.791n ± 0%   -6.90% (p=0.000 n=50)
memmove/106   5.167n ± 0%   4.793n ± 0%   -7.23% (p=0.000 n=50)
memmove/107   5.165n ± 0%   4.801n ± 0%   -7.06% (p=0.000 n=50)
memmove/108   5.173n ± 0%   4.800n ± 0%   -7.21% (p=0.000 n=50)
memmove/109   5.173n ± 0%   4.797n ± 0%   -7.27% (p=0.000 n=50)
memmove/110   5.171n ± 0%   4.808n ± 0%   -7.01% (p=0.000 n=50)
memmove/111   5.180n ± 0%   4.799n ± 0%   -7.36% (p=0.000 n=50)
memmove/112   5.185n ± 0%   4.812n ± 0%   -7.19% (p=0.000 n=50)
memmove/113   5.187n ± 0%   4.797n ± 0%   -7.53% (p=0.000 n=50)
memmove/114   5.183n ± 0%   4.809n ± 0%   -7.21% (n=50)
memmove/115   5.193n ± 0%   4.811n ± 0%   -7.36% (p=0.000 n=50)
memmove/116   5.196n ± 0%   4.815n ± 0%   -7.32% (p=0.000 n=50)
memmove/117   5.199n ± 0%   4.816n ± 0%   -7.37% (p=0.000 n=50)
memmove/118   5.198n ± 0%   4.811n ± 0%   -7.45% (p=0.000 n=50)
memmove/119   5.203n ± 0%   4.818n ± 0%   -7.40% (p=0.000 n=50)
memmove/120   5.195n ± 0%   4.823n ± 0%   -7.16% (p=0.000 n=50)
memmove/121   5.203n ± 0%   4.812n ± 0%   -7.51% (p=0.000 n=50)
memmove/122   5.204n ± 0%   4.818n ± 0%   -7.42% (n=50)
memmove/123   5.202n ± 0%   4.822n ± 0%   -7.31% (p=0.000 n=50)
memmove/124   5.216n ± 0%   4.823n ± 0%   -7.54% (p=0.000 n=50)
memmove/125   5.227n ± 0%   4.823n ± 0%   -7.72% (p=0.000 n=50)
memmove/126   5.235n ± 0%   4.830n ± 0%   -7.74% (p=0.000 n=50)
memmove/127   5.237n ± 0%   4.833n ± 0%   -7.72% (p=0.000 n=50)
memmove/128   5.241n ± 0%   4.832n ± 0%   -7.81% (p=0.000 n=50)
memmove/129   6.460n ± 0%   5.858n ± 0%   -9.31% (p=0.000 n=50)
memmove/130   7.539n ± 0%   6.634n ± 0%  -12.00% (p=0.000 n=50)
memmove/131   7.542n ± 0%   6.623n ± 0%  -12.18% (p=0.000 n=50)
memmove/132   7.527n ± 0%   6.667n ± 1%  -11.43% (p=0.000 n=50)
memmove/133   7.521n ± 0%   6.631n ± 0%  -11.83% (p=0.000 n=50)
memmove/134   7.531n ± 0%   6.642n ± 0%  -11.81% (p=0.000 n=50)
memmove/135   7.541n ± 0%   6.692n ± 1%  -11.25% (p=0.000 n=50)
memmove/136   7.549n ± 0%   6.657n ± 0%  -11.81% (p=0.000 n=50)
memmove/137   7.544n ± 0%   6.646n ± 0%  -11.90% (p=0.000 n=50)
memmove/138   7.557n ± 0%   6.673n ± 1%  -11.70% (p=0.000 n=50)
memmove/139   7.545n ± 0%   6.654n ± 0%  -11.81% (n=50)
memmove/140   7.559n ± 0%   6.680n ± 1%  -11.63% (p=0.000 n=50)
memmove/141   7.560n ± 0%   6.664n ± 0%  -11.85% (p=0.000 n=50)
memmove/142   7.556n ± 0%   6.679n ± 0%  -11.62% (p=0.000 n=50)
memmove/143   7.570n ± 0%   6.683n ± 1%  -11.71% (p=0.000 n=50)
memmove/144   7.586n ± 0%   6.683n ± 0%  -11.91% (p=0.000 n=50)
memmove/145   7.593n ± 0%   6.665n ± 0%  -12.22% (p=0.000 n=50)
memmove/146   7.591n ± 0%   6.665n ± 0%  -12.20% (p=0.000 n=50)
memmove/147   7.598n ± 0%   6.665n ± 0%  -12.27% (p=0.000 n=50)
memmove/148   7.598n ± 0%   6.670n ± 0%  -12.21% (p=0.000 n=50)
memmove/149   7.593n ± 0%   6.691n ± 0%  -11.88% (p=0.000 n=50)
memmove/150   7.625n ± 0%   6.713n ± 1%  -11.97% (p=0.000 n=50)
memmove/151   7.603n ± 0%   6.710n ± 1%  -11.74% (p=0.000 n=50)
memmove/152   7.613n ± 0%   6.701n ± 1%  -11.97% (p=0.000 n=50)
memmove/153   7.595n ± 0%   6.710n ± 0%  -11.65% (p=0.000 n=50)
memmove/154   7.614n ± 0%   6.721n ± 0%  -11.74% (p=0.000 n=50)
memmove/155   7.615n ± 0%   6.709n ± 0%  -11.89% (p=0.000 n=50)
memmove/156   7.613n ± 0%   6.693n ± 0%  -12.08% (p=0.000 n=50)
memmove/157   7.628n ± 0%   6.708n ± 0%  -12.05% (p=0.000 n=50)
memmove/158   7.629n ± 0%   6.706n ± 0%  -12.10% (p=0.000 n=50)
memmove/159   7.639n ± 0%   6.724n ± 0%  -11.98% (p=0.000 n=50)
memmove/160   7.619n ± 0%   6.702n ± 0%  -12.04% (p=0.000 n=50)
memmove/161   7.653n ± 0%   6.698n ± 0%  -12.49% (p=0.000 n=50)
memmove/162   8.104n ± 0%   7.140n ± 1%  -11.89% (p=0.000 n=50)
memmove/163   8.141n ± 0%   7.187n ± 1%  -11.72% (p=0.000 n=50)
memmove/164   8.154n ± 0%   7.107n ± 0%  -12.84% (p=0.000 n=50)
memmove/165   8.143n ± 0%   7.117n ± 0%  -12.59% (p=0.000 n=50)
memmove/166   8.176n ± 0%   7.110n ± 0%  -13.04% (p=0.000 n=50)
memmove/167   8.194n ± 0%   7.168n ± 1%  -12.52% (p=0.000 n=50)
memmove/168   8.214n ± 0%   7.188n ± 1%  -12.50% (p=0.000 n=50)
memmove/169   8.220n ± 0%   7.242n ± 1%  -11.90% (p=0.000 n=50)
memmove/170   8.228n ± 0%   7.244n ± 1%  -11.96% (p=0.000 n=50)
memmove/171   8.263n ± 0%   7.184n ± 0%  -13.06% (p=0.000 n=50)
memmove/172   8.259n ± 0%   7.325n ± 1%  -11.31% (p=0.000 n=50)
memmove/173   8.271n ± 0%   7.225n ± 0%  -12.65% (p=0.000 n=50)
memmove/174   8.284n ± 0%   7.287n ± 1%  -12.04% (p=0.000 n=50)
memmove/175   8.289n ± 0%   7.282n ± 1%  -12.15% (p=0.000 n=50)
memmove/176   8.309n ± 0%   7.328n ± 1%  -11.81% (p=0.000 n=50)
memmove/177   8.317n ± 0%   7.264n ± 1%  -12.67% (p=0.000 n=50)
memmove/178   8.302n ± 0%   7.342n ± 1%  -11.57% (p=0.000 n=50)
memmove/179   8.309n ± 0%   7.357n ± 1%  -11.45% (p=0.000 n=50)
memmove/180   8.304n ± 0%   7.318n ± 1%  -11.87% (p=0.000 n=50)
memmove/181   8.312n ± 0%   7.363n ± 1%  -11.42% (p=0.000 n=50)
memmove/182   8.315n ± 0%   7.320n ± 1%  -11.96% (p=0.000 n=50)
memmove/183   8.330n ± 0%   7.286n ± 1%  -12.53% (p=0.000 n=50)
memmove/184   8.310n ± 0%   7.324n ± 1%  -11.86% (p=0.000 n=50)
memmove/185   8.303n ± 0%   7.267n ± 1%  -12.47% (p=0.000 n=50)
memmove/186   8.287n ± 0%   7.312n ± 1%  -11.76% (p=0.000 n=50)
memmove/187   8.298n ± 0%   7.395n ± 2%  -10.88% (p=0.000 n=50)
memmove/188   8.296n ± 0%   7.339n ± 1%  -11.54% (p=0.000 n=50)
memmove/189   8.306n ± 0%   7.299n ± 1%  -12.12% (p=0.000 n=50)
memmove/190   8.281n ± 0%   7.309n ± 1%  -11.74% (p=0.000 n=50)
memmove/191   8.299n ± 0%   7.282n ± 1%  -12.26% (p=0.000 n=50)
memmove/192   8.281n ± 0%   7.335n ± 1%  -11.41% (p=0.000 n=50)
memmove/193   8.299n ± 0%   7.325n ± 1%  -11.74% (p=0.000 n=50)
memmove/194   8.641n ± 0%   8.034n ± 0%   -7.02% (p=0.000 n=50)
memmove/195   8.667n ± 0%   8.073n ± 0%   -6.85% (p=0.000 n=50)
memmove/196   8.666n ± 0%   8.030n ± 0%   -7.34% (p=0.000 n=50)
memmove/197   8.660n ± 0%   8.096n ± 1%   -6.51% (p=0.000 n=50)
memmove/198   8.688n ± 0%   8.047n ± 0%   -7.39% (p=0.000 n=50)
memmove/199   8.678n ± 0%   8.061n ± 0%   -7.11% (p=0.000 n=50)
memmove/200   8.669n ± 0%   8.034n ± 0%   -7.32% (p=0.000 n=50)
memmove/201   8.692n ± 0%   8.061n ± 0%   -7.26% (p=0.000 n=50)
memmove/202   8.668n ± 0%   8.060n ± 0%   -7.02% (p=0.000 n=50)
memmove/203   8.687n ± 0%   8.066n ± 0%   -7.15% (p=0.000 n=50)
memmove/204   8.699n ± 0%   8.076n ± 0%   -7.16% (p=0.000 n=50)
memmove/205   8.676n ± 0%   8.085n ± 0%   -6.82% (p=0.000 n=50)
memmove/206   8.684n ± 0%   8.101n ± 1%   -6.71% (p=0.000 n=50)
memmove/207   8.725n ± 0%   8.099n ± 0%   -7.18% (p=0.000 n=50)
memmove/208   8.674n ± 0%   8.073n ± 0%   -6.92% (p=0.000 n=50)
memmove/209   8.697n ± 0%   8.088n ± 0%   -7.01% (p=0.000 n=50)
memmove/210   8.733n ± 0%   8.076n ± 0%   -7.53% (p=0.000 n=50)
memmove/211   8.732n ± 0%   8.104n ± 0%   -7.19% (p=0.000 n=50)
memmove/212   8.730n ± 0%   8.091n ± 0%   -7.32% (p=0.000 n=50)
memmove/213   8.728n ± 0%   8.100n ± 0%   -7.19% (p=0.000 n=50)
memmove/214   8.744n ± 1%   8.081n ± 1%   -7.57% (p=0.000 n=50)
memmove/215   8.734n ± 0%   8.150n ± 0%   -6.68% (p=0.000 n=50)
memmove/216   8.748n ± 0%   8.116n ± 0%   -7.23% (p=0.000 n=50)
memmove/217   8.751n ± 0%   8.129n ± 1%   -7.11% (p=0.000 n=50)
memmove/218   8.747n ± 0%   8.114n ± 0%   -7.23% (p=0.000 n=50)
memmove/219   8.733n ± 0%   8.159n ± 0%   -6.57% (p=0.000 n=50)
memmove/220   8.764n ± 0%   8.145n ± 0%   -7.06% (p=0.000 n=50)
memmove/221   8.764n ± 0%   8.142n ± 0%   -7.10% (p=0.000 n=50)
memmove/222   8.775n ± 0%   8.152n ± 0%   -7.10% (p=0.000 n=50)
memmove/223   8.771n ± 0%   8.143n ± 0%   -7.16% (p=0.000 n=50)
memmove/224   8.778n ± 0%   8.175n ± 1%   -6.87% (p=0.000 n=50)
memmove/225   8.794n ± 0%   8.138n ± 0%   -7.45% (p=0.000 n=50)
memmove/226   10.13n ± 0%   10.06n ± 0%   -0.71% (p=0.000 n=50)
memmove/227   10.14n ± 0%   10.08n ± 0%   -0.53% (p=0.000 n=50)
memmove/228   10.13n ± 0%   10.08n ± 0%   -0.56% (p=0.000 n=50)
memmove/229   10.17n ± 0%   10.11n ± 0%   -0.56% (p=0.000 n=50)
memmove/230   10.17n ± 0%   10.13n ± 0%   -0.38% (p=0.003 n=50)
memmove/231   10.16n ± 0%   10.12n ± 0%   -0.41% (p=0.001 n=50)
memmove/232   10.19n ± 0%   10.12n ± 0%   -0.67% (p=0.000 n=50)
memmove/233   10.21n ± 0%   10.14n ± 0%   -0.71% (p=0.000 n=50)
memmove/234   10.24n ± 0%   10.16n ± 0%   -0.79% (p=0.000 n=50)
memmove/235   10.24n ± 0%   10.16n ± 0%   -0.76% (p=0.000 n=50)
memmove/236   10.25n ± 0%   10.16n ± 0%   -0.81% (p=0.000 n=50)
memmove/237   10.24n ± 0%   10.17n ± 0%   -0.69% (p=0.000 n=50)
memmove/238   10.27n ± 0%   10.19n ± 0%   -0.79% (p=0.000 n=50)
memmove/239   10.29n ± 0%   10.19n ± 0%   -0.90% (p=0.000 n=50)
memmove/240   10.30n ± 0%   10.20n ± 0%   -0.95% (p=0.000 n=50)
memmove/241   10.29n ± 0%   10.20n ± 0%   -0.91% (p=0.000 n=50)
memmove/242   10.30n ± 0%   10.22n ± 0%   -0.80% (p=0.000 n=50)
memmove/243   10.32n ± 0%   10.23n ± 0%   -0.87% (p=0.000 n=50)
memmove/244   10.32n ± 0%   10.24n ± 0%   -0.74% (p=0.000 n=50)
memmove/245   10.33n ± 0%   10.23n ± 0%   -0.97% (p=0.000 n=50)
memmove/246   10.33n ± 0%   10.24n ± 0%   -0.92% (p=0.000 n=50)
memmove/247   10.31n ± 0%   10.24n ± 0%   -0.69% (p=0.000 n=50)
memmove/248   10.32n ± 0%   10.26n ± 0%   -0.55% (p=0.000 n=50)
memmove/249   10.33n ± 0%   10.28n ± 0%   -0.52% (p=0.000 n=50)
memmove/250   10.34n ± 0%   10.27n ± 0%   -0.66% (p=0.000 n=50)
memmove/251   10.32n ± 0%   10.27n ± 0%   -0.45% (p=0.000 n=50)
memmove/252   10.34n ± 0%   10.30n ± 0%   -0.39% (p=0.005 n=50)
memmove/253   10.33n ± 0%   10.27n ± 0%   -0.57% (p=0.000 n=50)
memmove/254   10.33n ± 0%   10.27n ± 0%   -0.54% (p=0.000 n=50)
memmove/255   10.34n ± 0%   10.29n ± 0%   -0.50% (p=0.002 n=50)
memmove/256   10.36n ± 0%   10.31n ± 0%   -0.44% (p=0.006 n=50)
memmove/257   10.33n ± 0%   10.29n ± 0%   -0.36% (p=0.004 n=50)
geomean       6.142n        5.696n        -7.26%
```
2023-10-24 16:05:27 +02:00
Joseph Huber
25bf1ae99b [libc] Enable remaining string functions on the GPU (#68346)
Summary:
We previously had to disable these string functions because they were
not compatible with the definitions coming from the GNU / host
environment. The GPU, when exporting its declarations, has a very
difficult requirement that it be compatible with the host environment as
both sides of the compilation need to agree on definitions and what's
present.

This patch more or less gives up and just copies the definitions as
expected by `glibc` if they are provided that way, otherwise we fall
back to the accepted way. This is the alternative solution to an
existing PR which instead disables GCC's handling.
2023-10-23 13:16:20 -04:00
Hans Wennborg
e2fc68c3db Typos: 'maxium', 'minium' 2023-10-23 10:42:28 +02:00
Anton Rydahl
e774482c4c Fixed typo in GPU libm device library warning (#69752)
Correcting a small typo in the error message when the CUDA device libraries are not detected.
2023-10-20 12:17:26 -07:00
lntue
6d53fdeab4 [libc][NFC] Attempt to deflake gettimeofday_test. (#69719)
Only check if gettimeofday call succeeds.
2023-10-20 11:08:01 -04:00
lntue
ec10c36b07 [libc][NFC] Forcing data type in gettimeofday_test when comparing the diff. (#69652) 2023-10-19 19:49:59 -04:00
Joseph Huber
630037ede4 [libc] Partially implement 'rand' for the GPU (#66167)
Summary:
This patch partially implements the `rand` function on the GPU. This is
partial because the GPU currently doesn't support thread local storage
or static initializers. To implement this on the GPU, I use 1/8th of the
local / shared memory quota to treat the shared memory as thread local
storage. This is done by simply allocating enough storage for each
thread in the block and indexing into this based off of the thread id.
The downside to this is that it does not initialize `srand` correctly to
be `1` as the standard says, and it is also wasteful. In the future we
should figure out a way to support TLS on the GPU so that this can be
completely common and less resource intensive.
2023-10-19 17:01:43 -04:00
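A rough host-side sketch of the indexing scheme described in this commit, not the actual GPU code: one slot of generator state per thread, selected by thread id. `MAX_THREADS`, `BlockRandState`, `seed`, and `next_value` are hypothetical names, and the generator is the sample linear congruential one from the C standard rather than whatever the library uses. The constructor seeds every slot to 1, which is the step the commit says the GPU version cannot yet perform.

```cpp
#include <array>
#include <cstddef>

// Hypothetical block size; on a GPU this storage would live in shared memory.
constexpr std::size_t MAX_THREADS = 64;

struct BlockRandState {
  // One generator state per thread in the block.
  std::array<unsigned long, MAX_THREADS> state;

  // Seed every slot to 1, matching the behaviour the C standard requires
  // when srand has never been called.
  BlockRandState() { state.fill(1); }

  // Reseed only the slot that belongs to the calling thread.
  void seed(std::size_t thread_id, unsigned long value) {
    state[thread_id] = value;
  }

  // Advance only the calling thread's slot (sample generator from the C
  // standard) and return a value in [0, 32767].
  int next_value(std::size_t thread_id) {
    unsigned long &s = state[thread_id];
    s = s * 1103515245UL + 12345UL;
    return static_cast<int>((s / 65536UL) % 32768UL);
  }
};
```

Because each thread only touches its own slot, no synchronisation is needed, at the cost of reserving a slot for every thread whether or not it ever calls the function.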
Joseph Huber
a39215768b [libc] Rework the 'fgets' implementation on the GPU (#69635)
Summary:
The `fgets` function as implemented is not functional currently when
called with multiple threads. This is because we rely on repeatedly
polling the character to detect EOF. This doesn't work when there are
multiple threads that may wish to poll the characters. This patch pulls
out the logic into a standalone RPC call to handle this in a single
operation such that calling it from multiple threads functions as
expected. It also makes it less slow because we no longer make N RPC
calls for N characters.
2023-10-19 17:00:01 -04:00
Anton Rydahl
c73ad025b1 [libc][libm][GPU] Add missing vendor entrypoints to the GPU version of libm (#66034)
This patch populates the GPU version of `libm` with missing vendor entrypoints. The vendor math entrypoints are disabled by default but can be enabled with the CMake option `LIBC_GPU_VENDOR_MATH=ON`.
2023-10-19 12:24:50 -07:00
Alfred Persson Forsberg
67770cbb98 [libc][NFC] Fix features.h.def file header 2023-10-19 20:00:26 +02:00
alfredfo
f350532099 [libc] Fix accidental LIBC_NAMESPACE_clock_freq (#69620)
See-also: https://github.com/llvm/llvm-project/pull/69548
2023-10-19 19:39:02 +02:00
lntue
3fd5113cba [libc][math][NFC] Remove global scope constants declaration in math tests (#69558)
Clean up usage of `DECLARE_SPECIAL_CONSTANTS` in global scope.
2023-10-19 10:30:11 -04:00
alfredfo
d404130134 [libc] Fix accidental LIBC_NAMESPACE_syscall definition (#69548)
Building helloworld.c currently errors with "undefined symbol:
__llvm_libc_syscall"

See: https://github.com/llvm/llvm-project/pull/67032
2023-10-19 11:22:16 +02:00
alfredfo
74b0465fe9 [libc] Add simple features.h with implementation macro (#69402)
In the future this should probably be autogenerated so it defines
library version.

See: Discussion in #libc
https://discord.com/channels/636084430946959380/636732994891284500/1163979080979460176
2023-10-19 04:08:13 +02:00
Joseph Huber
ddc30ff802 [libc] Implement the 'ungetc' function on the GPU (#69248)
Summary:
This function follows closely with the pattern of all the other
functions. That is, making a new opcode and forwarding the call to the
host. However, this also required modifying the test somewhat. It seems
that not all `libc` implementations follow the same error rules as are
tested here, and it is not explicit in the standard, so we simply
disable these EOF checks when targeting the GPU.
2023-10-17 13:02:31 -05:00
michaelrj-google
8a47ad4b67 [libc] Add simple long double to printf float fuzz (#68449)
Recent testing has uncovered some hard-to-find bugs in printf's long
double support. This patch adds an extra long double path to the fuzzer
with minimal extra effort. While a more thorough long double fuzzer
would be useful, it would need to handle the non-standard cases of 80
bit long doubles such as unnormal and pseudo-denormal numbers. For that
reason, a standalone long double fuzzer is left for future development.
2023-10-16 13:32:34 -07:00
Samira Bazuzi
b5c2fa14ea [libc] Mark operator== const to avoid ambiguity in C++20. (#68805)
C++20 will automatically generate an operator== with reversed operand
order, which is ambiguous with the written operator== when one argument
is marked const and the other isn't.

This operator currently triggers -Wambiguous-reversed-operator at usage
site libc/test/UnitTest/PrintfMatcher.cpp:28.
2023-10-11 23:59:13 -04:00
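A minimal sketch of the pattern this commit describes, using a hypothetical `Token` type rather than the matcher code itself. With the trailing `const`, the written operator and the reversed candidate that C++20 synthesises treat both operands the same way, so neither is preferred on one argument and worse on the other.

```cpp
// Hypothetical type for illustration only.
struct Token {
  int value;

  // The trailing const makes the implicit object parameter const as well,
  // matching the qualification of the explicit parameter.
  bool operator==(const Token &other) const { return value == other.value; }
};

// Small helper that compares two non-const objects, the situation in which
// a non-const operator== would be flagged under C++20.
inline bool same_token(Token &lhs, Token &rhs) { return lhs == rhs; }
```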
Joseph Huber
9bcf9dc98a [libc] Fix missing warp sync for the NVPTX assert
Summary:
The implementation of `assert` has an if statement so that only the
first thread in the warp prints the assertion. On modern NVPTX
architecture, this can be printed out of order with the abort call. This
would lead to only a portion of the message being printed and then
exiting the program. By adding a mandatory warp sync we force the full
string to be printed before we continue to the abort.
2023-10-10 12:50:37 -05:00
Joseph Huber
fa23a2396b [libc] Fix linking of AMDGPU device runtime control constants for math (#65676)
Summary:
Currently, `libc` temporarily provides math by linking against existing
vendor implementations. To use the AMDGPU DeviceRTL we need to define a
handful of control constants that alter behaviour for architecture
specific things. Previously these were marked `extern const` because
they must be present when we link-in the vendor bitcode library.
However, this causes linker errors if more than one math function was
used.

This patch fixes the issue by marking these functions as used and inline
on top of being external. This means that they are linkable, but it
gives us `linkonce_odr` semantics. The downside is that these globals
won't be optimized out, but it allows us to perform constant propagation
on them unlike using `weak`.
2023-10-06 21:50:35 -05:00
Joseph Huber
4cb6c1c7cb [libc] Enable missing memory tests on the GPU (#68111)
Summary:
There were a few tests that weren't enabled on the GPU. This is because
the logic caused them to be skipped as we don't use CPU features on the
host. This also disables the logic making multiple versions of the
memory functions.
2023-10-06 08:27:36 -05:00
tnv01
28245b4ecb [libc] Add x86-64 stack protector support. 2023-10-04 14:18:23 -07:00
michaelrj-google
bfcfc2a6d4 [libc] Fix typo in long double negative block (#68243)
The long double version of float to string's get_negative_block had a
bug in table mode. In table mode, one of the tables is named
"MIN_BLOCK_2" and it stores the number of blocks that are all zeroes
before the digits start for a given index. The check for long doubles
was incorrectly "block_index <= MIN_BLOCK_2[idx]" when it should be
"block_index < MIN_BLOCK_2[idx]" (without the equal sign). This bug
caused an off-by-one error for some long double values. This patch fixes
the bug and adds tests to ensure it doesn't regress.
2023-10-04 13:00:48 -07:00
Mikhail R. Gadelha
714b4c82bb [libc][NFC] Fix -Wdangling-else when compiling libc with gcc >= 7 (#67833)
Explicit braces were added to fix the "suggest explicit braces to avoid
ambiguous ‘else’" warning since the current solution (switch (0) case 0:
default:) doesn't work since gcc 7 (see
https://github.com/google/googletest/issues/1119)

gcc 13 generates about 5000 of these warnings when building libc without
this patch.
2023-10-04 11:44:42 -04:00
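For context, the warning fires when an unbraced `if` contains another `if` followed by an `else`, so a reader cannot tell at a glance which `if` the `else` belongs to. A small illustration of the explicit-brace style, with a hypothetical function `classify` that is not taken from the patch:

```cpp
// The braces around the outer body make it unambiguous that the else
// pairs with the outer if, not with the inner one.
int classify(bool outer, bool inner) {
  if (outer) {
    if (inner)
      return 1;
  } else {
    return 2;
  }
  return 0;
}
```

Without the outer braces, the `else` would bind to `if (inner)` instead, and `classify(false, false)` would fall through to the final `return 0`.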
Joseph Huber
452fa6b86d [libc] Change the GPU to use builtin memory functions (#68003)
Summary:
The GPU build is special in the sense that we know an up-to-date
`clang` is always going to be the compiler. This allows us to
rely directly on builtins, which allow us to push a lot of this
complexity into the backend. Backend implementations are favored on
the GPU because it allows us to do a lot more target specific
optimizations. This patch changes over the common memory functions to
use builtin versions when building for AMDGPU or NVPTX.
2023-10-04 07:02:55 -05:00
Mikhail R. Gadelha
824b1677a4 [libc][NFC] Fix missing field 'tm_isdst' initializer warning (#67837)
This patch fixes several warnings thrown by clang about an uninitialized
member of struct tm, tm_isdst.

Weirdly, gcc doesn't complain about it, probably because this member is
never read in the tests.
2023-10-02 19:32:55 -04:00
Mikhail R. Gadelha
8fc87f54a8 [libc][NFC] Couple of small warning fixes (#67847)
This patch fixes a couple of warnings when compiling with gcc 13:

* CPP/type_traits_test.cpp: 'apply' overrides a member function but is
not marked 'override'
* UnitTest/LibcTest.cpp:98: control reaches end of non-void function
* MPFRWrapper/MPFRUtils.cpp:75: control reaches end of non-void function
* smoke/FrexpTest.h:92: backslash-newline at end of file
* __support/float_to_string.h:118: comparison of unsigned expression in ‘>= 0’ is always true
* test/src/__support/CPP/bitset_test.cpp:197: comparison of unsigned expression in ‘>= 0’ is always true

---------

Signed-off-by: Mikhail R. Gadelha <mikhail@igalia.com>
2023-10-02 19:29:26 -04:00
Joseph Huber
f88f090a2e [libc] Correct 'memrchr' definition and re-enable on GPU (#67850)
Summary:
This was disabled on the GPU because it conflicted with the definition
in `glibc`. According to information online and in the `glibc`
implementation, the first argument should be a `const void *`. Fixing
this resolves the problem when exporting this to offloading languages.
2023-09-29 18:22:00 -05:00
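To make the corrected signature concrete, here is a straightforward byte-wise sketch with a `const void *` first parameter. The name `sketch_memrchr` is hypothetical, chosen to avoid clashing with a system declaration, and the loop is not the library's implementation.

```cpp
#include <cstddef>

// Returns a pointer to the last occurrence of byte c in the first n bytes
// of src, or nullptr if it does not occur.
void *sketch_memrchr(const void *src, int c, std::size_t n) {
  const unsigned char *bytes = static_cast<const unsigned char *>(src);
  const unsigned char target = static_cast<unsigned char>(c);
  // Walk backwards so the first match found is the last one in memory.
  for (std::size_t i = n; i > 0; --i) {
    if (bytes[i - 1] == target)
      return const_cast<unsigned char *>(bytes + i - 1);
  }
  return nullptr;
}
```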
Joseph Huber
e0b702ffc2 [libc] Fix nanosleep definition in the posix spec (#67855)
Summary:
The POSIX standard expects the first argument to this function to be
constant, e.g. https://man7.org/linux/man-pages/man2/nanosleep.2.html.
This fixes that problem and also corrects an obvious problem with
enabling this for offloading.
2023-09-29 17:35:10 -05:00
Joseph Huber
ce38cbb13b [libc][NFC] Adjust the libc init / fini array test
Summary:
The NVPTX backend is picky about the definitions of functions. Because
we call these functions with these arguments it can cause some problems
when it goes through the backend. This was observed in a different test
for `printf` that hasn't been landed yet. Also adjust the priority.
2023-09-29 13:22:02 -05:00
Joseph Huber
22ebf1e9b7 [libc][Obvious] Do not pass 'nolibc' and other flags to the GPU build
Summary:
Previously this code was applied to the integration tests but did not
copy the logic that stopped this from being passed to the GPU build.
Copy the full line to avoid the warnings and prevent any libraries from
being included.
2023-09-29 12:57:02 -05:00
Mikhail R. Gadelha
dbceb1d936 [libc] Fix unused variable in fputc test (#67830)
This is probably a copy-and-paste error and the variable 'more' was left
unused.
2023-09-29 12:31:40 -04:00
lntue
da28593d71 [libc][math] Implement double precision expm1 function correctly rounded for all rounding modes. (#67048)
Implementing expm1 function for double precision based on exp function
algorithm:

- Reduced x = log(2) * (hi + mid1 + mid2) + lo, where:
  * hi is an integer
  * mid1 * 2^6 is an integer
  * mid2 * 2^12 is an integer
  * |lo| < 2^-13 + 2^-30
- Then exp(x) - 1 = 2^hi * 2^mid1 * 2^mid2 * exp(lo) - 1 ~ 2^hi *
(2^mid1 * 2^mid2 * (1 + lo * P(lo)) - 2^(-hi) )
- In the fast pass, P(lo) is a degree-3 Taylor polynomial of
(e^lo - 1) / lo in double precision
- If the Ziv accuracy test fails, we use degree-6 Taylor polynomial of
(e^lo - 1) / lo in double double precision
- If the Ziv accuracy test still fails, we re-evaluate everything in
128-bit precision.
2023-09-28 16:43:15 -04:00
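The reduction above can be sketched in plain double precision as follows. This is only an illustration of the decomposition, assuming x = log(2) * (hi + mid1 + mid2) + lo with mid1 and mid2 being multiples of 2^-6 and 2^-12 (which is what the identity exp(x) = 2^hi * 2^mid1 * 2^mid2 * exp(lo) requires). It is not correctly rounded, it skips the Ziv test and the higher-precision fallbacks, and it loses accuracy for x close to zero. The function name `expm1_sketch` is hypothetical, and `std::exp2` stands in for whatever the implementation uses to obtain 2^mid1 and 2^mid2.

```cpp
#include <cmath>

double expm1_sketch(double x) {
  constexpr double LOG2_E = 1.4426950408889634;  // log2(e)
  constexpr double LN_2 = 0.6931471805599453;    // log(2)

  // k = round(x * log2(e) * 2^12), so that k * 2^-12 = hi + mid1 + mid2.
  const long long k = std::llround(x * LOG2_E * 4096.0);
  // Floor division by 2^12, written out so it also works for negative k.
  const long long hi = k >= 0 ? k / 4096 : -((-k + 4095) / 4096);
  const long long frac = k - hi * 4096;       // in [0, 4095]
  const double mid1 = (frac / 64) / 64.0;     // multiple of 2^-6
  const double mid2 = (frac % 64) / 4096.0;   // multiple of 2^-12
  const double lo = x - (k / 4096.0) * LN_2;  // small remainder

  // Degree-3 Taylor polynomial of (e^lo - 1) / lo.
  const double p = 1.0 + lo * (0.5 + lo * (1.0 / 6.0 + lo * (1.0 / 24.0)));
  const double scaled = std::exp2(mid1) * std::exp2(mid2) * (1.0 + lo * p);

  // 2^hi * 2^mid1 * 2^mid2 * exp(lo) - 1
  return std::ldexp(scaled, static_cast<int>(hi)) - 1.0;
}
```

The commit's formula instead factors 2^hi out of the subtraction, writing the result as 2^hi * (... - 2^(-hi)). The sketch subtracts 1 directly for simplicity.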
Joseph Huber
cc2445589d [libc] Fix wrapper headers for some ctype macros and C++ decls
Summary:
These wrapper headers need to work around things in the standard
headers. The existing workarounds didn't correctly handle the macros for
`isascii` and `toascii`. Additionally, `memrchr` can't be used because
it has a different declaration for C++ mode. Fix this so it can be
compiled.
2023-09-28 10:00:34 -05:00
Joseph Huber
1a5d3b6cda [libc] Scan the ports more fairly in the RPC server (#66680)
Summary:
Currently, we use the RPC server to respond to different ports which
each contain a request from some client thread wishing to do work on the
server. This scan starts at zero and continues until it has checked all
ports at which point it resets. If we find an active port, we service it
and then restart the search.

This is bad for two reasons. First, it means that we will always bias
the lower ports. If a thread grabs a high port it will be stuck for a
very long time until all the other work is done. Second, it means that
the `handle_server` function can technically run indefinitely as long as
the client is always pushing new work. Because the OpenMP implementation
uses the user thread to service the kernel, this means that it could be
stalled with another asynchronous device's kernels.

This patch addresses this by making the server restart at the next port
over. This means we will always do a full scan of the ports before
quitting.
2023-09-26 16:09:48 -05:00
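A simplified model of the new scanning behaviour, under the assumption that a port can be represented by a single "pending" flag. The names `scan_ports_once`, `pending`, and `service` are hypothetical and do not correspond to the RPC server's interface.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Visits every port exactly once. After servicing a port the scan moves on
// to the next index instead of jumping back to port 0, so the call always
// terminates and high-numbered ports are reached even if low-numbered ones
// keep receiving new work. Returns the ports serviced, in order.
std::vector<std::size_t> scan_ports_once(
    std::vector<bool> &pending,
    const std::function<void(std::size_t)> &service) {
  std::vector<std::size_t> serviced;
  for (std::size_t port = 0; port < pending.size(); ++port) {
    if (!pending[port])
      continue;
    pending[port] = false;
    service(port);  // may cause new work to arrive on other ports
    serviced.push_back(port);
  }
  return serviced;
}
```

If the `service` callback re-arms port 0 every time it runs, a scan that restarts at zero would never reach the later ports, whereas this version services them and leaves port 0 for the next call.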
Joseph Huber
6273b6d9dc [libc] Change RPC opcode enum definition (#67439)
Summary:
This enum previously manually specified the value. This just made it
unnecessarily difficult to add new ones without changing everything.
This patch also makes it compatible with C by removing the `:`
annotation and instead using the `LAST` method.
2023-09-26 15:24:28 -05:00
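One reading of this change, sketched with hypothetical opcode names: enumerators take implicit consecutive values, and instead of a C++-only `: type` annotation, a trailing `LAST` entry is pinned to the largest value the opcode should hold. The commit does not spell out what the `LAST` method is, so the `0xFFFF` value here is an assumption.

```cpp
// Valid as both C and C++: no fixed underlying type is specified.
typedef enum {
  SKETCH_NOOP = 0,
  SKETCH_EXIT,   // 1, assigned implicitly
  SKETCH_WRITE,  // 2
  SKETCH_READ,   // 3
  // New opcodes can be added above without renumbering anything.
  SKETCH_LAST = 0xFFFF,  // assumed: forces the enum to span 16 bits
} sketch_opcode_t;
```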
Joseph Huber
2b7227db1e [libc] Fix RPC server global after mass replace of __llvm_libc
Summary:
This variable needs a reserved name starting with `__`. It was
mistakenly changed with a mass replace. It happened to work because the
tests still picked up the associated symbol, but it just became a bad
name because it's not reserved anymore.
2023-09-26 14:28:48 -05:00
Siva Chandra
f2c9fe452f [libc][NFC] Fix delete operator linkage names after switch to LIBC_NAMESPACE. (#67475)
The name __llvm_libc was mass-replaced with LIBC_NAMESPACE which ended
up changing the "__llvm_libc" prefix of the delete operator linkage names to
"LIBC_NAMESPACE". This change corrects it by changing the namespace prefix
to "__llvm_libc_<version info>".
2023-09-26 11:53:14 -07:00
Siva Chandra
425defd810 [libc][Obvious] Remove the previous ErrnoSetterMatcher target. (#67469)
A target still depending on the old target has been updated.
2023-09-26 11:01:21 -07:00
Siva Chandra
3bfd6a7521 [libc][NFC] Add compile options only to the header libraries which use them. (#67447)
Other libraries dependent on these libraries will automatically inherit
those compile options. This change in particular affects the compile
option "-DLIBC_COPT_STDIO_USE_SYSTEM_FILE".
2023-09-26 09:20:00 -07:00
Mikhail R. Gadelha
e3087c4b8c [libc] Start to refactor riscv platform abstraction to support both 32 and 64 bits versions
This patch enables the compilation of libc for rv32 by unifying the
current rv64 and rv32 implementation into a single rv implementation.

We updated the cmake file to match the new riscv32 arch and force
LIBC_TARGET_ARCHITECTURE to be "riscv" whenever we find "riscv32" or
"riscv64". This is required as LIBC_TARGET_ARCHITECTURE is used in the
path for several platform specific implementations.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D148797
2023-09-26 12:32:25 -03:00
Siva Chandra
599eadec28 [libc] Propagate printf config options from a single config header library. (#66979)
printf_core.parser is not yet updated to use the printf config options. It
does not use them currently anyway and the corresponding parser_test
should be updated to respect the config options.
2023-09-26 08:16:31 -07:00
Siva Chandra
aecb58005c [libc][NFC] Remove an inappropriate -ffreestanding arg to memory_utils test. (#67435) 2023-09-26 08:04:08 -07:00
Joseph Huber
1b8c8155cc [libc][Obvious] Fix incorrect filepath for ftell.h header
Summary:
The previous patch moved the location of this CMake line but didn't
update the header. Fix it.
2023-09-26 10:02:20 -05:00
Joseph Huber
7ac8e26fc7 [libc] Implement fseek, fflush, and ftell on the GPU (#67160)
Summary:
This patch adds the necessary entrypoints to handle the `fseek`,
`fflush`, and `ftell` functions. These are all very straightforward, we
simply make RPC calls to the associated function on the other end.
Implementing it this way allows us to more or less borrow the state of
the stream from the server as we intentionally maintain no internal
state on the GPU device. However, this does not implement the `errno`
functionality so that must be ignored.
2023-09-26 09:46:46 -05:00