[v5,1/1] x86-64: Add vector acos/acosf implementation to libmvec (original) (raw)

Message ID 20211219171809.2912282-2-skpgkp2@gmail.com (mailing list archive)
State Superseded
Headers Return-Path: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org X-Original-To: patchwork@sourceware.org Delivered-To: patchwork@sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 96D123858430 for patchwork@sourceware.org; Sun, 19 Dec 2021 17:19:20 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 96D123858430 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1639934360; bh=lkhhQEdhlTj66dE2e9LIAhq372P9MVcHjzicZb4qQ7s=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=FWw0z3eflIZHmbPJuEFOEWXgLRais/OO293tfEHJ1sztVAT7fkR61vz/cdim0GA+v woh8LoL9PK5PeSa7wMsNL5q5X334Hs6U8Kb4w+UzRRbBtuKFun2sIO0WJu9iHeMxUp jAFYK/hoBXvVbevF7foRGxirn7eZepfiPjY1U+70= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by sourceware.org (Postfix) with ESMTPS id 1DA393858401 for libc-alpha@sourceware.org; Sun, 19 Dec 2021 17🔞10 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1DA393858401 X-IronPort-AV: E=McAfee;i="6200,9189,10203"; a="220051090" X-IronPort-AV: E=Sophos;i="5.88,218,1635231600"; d="scan'208";a="220051090" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Dec 2021 09🔞09 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.88,218,1635231600"; d="scan'208";a="569604264" Received: from scymds02.sc.intel.com ([10.82.73.244]) by fmsmga008.fm.intel.com with ESMTP; 19 Dec 2021 09🔞09 -0800 Received: from gskx-1.sc.intel.com (gskx-1.sc.intel.com [172.25.149.211]) by scymds02.sc.intel.com with ESMTP id 1BJHI9kZ004270; Sun, 19 Dec 2021 09🔞09 -0800 To: libc-alpha@sourceware.org Subject: [PATCH v5 1/1] x86-64: Add vector acos/acosf implementation to libmvec Date: Sun, 19 Dec 2021 09🔞09 -0800 Message-Id: 20211219171809.2912282-2-skpgkp2@gmail.com X-Mailer: git-send-email 2.31.1 In-Reply-To: 20211219171809.2912282-1-skpgkp2@gmail.com References: CAFUsyfLR9v+cAG1Xmx6rTKE9M4j+8WL=d3oHyi5k-8UwD+hjFw@mail.gmail.com 20211219171809.2912282-1-skpgkp2@gmail.com MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, FORGED_GMAIL_RCVD, FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM, GIT_PATCH_0, HK_RANDOM_ENVFROM, HK_RANDOM_FROM, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_SHORT, KAM_STOCKGEN, NML_ADSP_CUSTOM_MED, SPF_HELO_PASS, SPF_SOFTFAIL, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org> List-Unsubscribe: https://sourceware.org/mailman/options/libc-alpha, mailto:libc-alpha-request@sourceware.org?subject=unsubscribe List-Archive: https://sourceware.org/pipermail/libc-alpha/ List-Post: mailto:libc-alpha@sourceware.org List-Help: mailto:libc-alpha-request@sourceware.org?subject=help List-Subscribe: https://sourceware.org/mailman/listinfo/libc-alpha, mailto:libc-alpha-request@sourceware.org?subject=subscribe From: Sunil K Pandey via Libc-alpha libc-alpha@sourceware.org Reply-To: Sunil K Pandey skpgkp2@gmail.com Cc: andrey.kolesov@intel.com, marius.cornea@intel.com Errors-To: libc-alpha-bounces+patchwork=sourceware.org@sourceware.org Sender: "Libc-alpha" libc-alpha-bounces+patchwork=sourceware.org@sourceware.org
Series Add vector math function acos/acosf to libmvec | [v5,0/1] Add vector math function acos/acosf to libmvec [v5,1/1] x86-64: Add vector acos/acosf implementation to libmvec

Checks

Context Check Description
dj/TryBot-apply_patch success Patch applied to master at the time it was sent
dj/TryBot-32bit success Build for i686

Commit Message

Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps.

bits/libm-simd-decl-stubs.h | 11 + math/bits/mathcalls.h | 2 +- .../unix/sysv/linux/x86_64/libmvec.abilist | 8 + sysdeps/x86/fpu/bits/math-vector.h | 4 + .../x86/fpu/finclude/math-vector-fortran.h | 4 + sysdeps/x86_64/fpu/Makeconfig | 1 + sysdeps/x86_64/fpu/Versions | 4 + sysdeps/x86_64/fpu/libm-test-ulps | 20 ++ .../fpu/multiarch/ifunc-mathvec-avx512-skx.h | 39 +++ .../fpu/multiarch/svml_d_acos2_core-sse2.S | 20 ++ .../x86_64/fpu/multiarch/svml_d_acos2_core.c | 27 ++ .../fpu/multiarch/svml_d_acos2_core_sse4.S | 293 +++++++++++++++++ .../fpu/multiarch/svml_d_acos4_core-sse.S | 20 ++ .../x86_64/fpu/multiarch/svml_d_acos4_core.c | 27 ++ .../fpu/multiarch/svml_d_acos4_core_avx2.S | 273 ++++++++++++++++ .../fpu/multiarch/svml_d_acos8_core-avx2.S | 20 ++ .../x86_64/fpu/multiarch/svml_d_acos8_core.c | 27 ++ .../fpu/multiarch/svml_d_acos8_core_avx512.S | 298 ++++++++++++++++++ .../fpu/multiarch/svml_s_acosf16_core-avx2.S | 20 ++ .../fpu/multiarch/svml_s_acosf16_core.c | 28 ++ .../multiarch/svml_s_acosf16_core_avx512.S | 262 +++++++++++++++ .../fpu/multiarch/svml_s_acosf4_core-sse2.S | 20 ++ .../x86_64/fpu/multiarch/svml_s_acosf4_core.c | 28 ++ .../fpu/multiarch/svml_s_acosf4_core_sse4.S | 260 +++++++++++++++ .../fpu/multiarch/svml_s_acosf8_core-sse.S | 20 ++ .../x86_64/fpu/multiarch/svml_s_acosf8_core.c | 28 ++ .../fpu/multiarch/svml_s_acosf8_core_avx2.S | 252 +++++++++++++++ sysdeps/x86_64/fpu/svml_d_acos2_core.S | 29 ++ sysdeps/x86_64/fpu/svml_d_acos4_core.S | 29 ++ sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S | 25 ++ sysdeps/x86_64/fpu/svml_d_acos8_core.S | 25 ++ sysdeps/x86_64/fpu/svml_s_acosf16_core.S | 25 ++ sysdeps/x86_64/fpu/svml_s_acosf4_core.S | 29 ++ sysdeps/x86_64/fpu/svml_s_acosf8_core.S | 29 ++ sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S | 25 ++ .../x86_64/fpu/test-double-libmvec-acos-avx.c | 1 + .../fpu/test-double-libmvec-acos-avx2.c | 1 + .../fpu/test-double-libmvec-acos-avx512f.c | 1 + sysdeps/x86_64/fpu/test-double-libmvec-acos.c | 3 + .../x86_64/fpu/test-double-vlen2-wrappers.c | 1 + .../fpu/test-double-vlen4-avx2-wrappers.c | 1 + .../x86_64/fpu/test-double-vlen4-wrappers.c | 1 + .../x86_64/fpu/test-double-vlen8-wrappers.c | 1 + .../x86_64/fpu/test-float-libmvec-acosf-avx.c | 1 + .../fpu/test-float-libmvec-acosf-avx2.c | 1 + .../fpu/test-float-libmvec-acosf-avx512f.c | 1 + sysdeps/x86_64/fpu/test-float-libmvec-acosf.c | 3 + .../x86_64/fpu/test-float-vlen16-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen4-wrappers.c | 1 + .../fpu/test-float-vlen8-avx2-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen8-wrappers.c | 1 + 51 files changed, 2251 insertions(+), 1 deletion(-) create mode 100644 sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512-skx.h create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos2_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf16_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf.c

Comments

On Sun, Dec 19, 2021 at 11:19 AM Sunil K Pandey via Libc-alpha libc-alpha@sourceware.org wrote:

Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps.

Have a few small comments but generally okay with a patch like this one going out in 2.35.

bits/libm-simd-decl-stubs.h | 11 + math/bits/mathcalls.h | 2 +- .../unix/sysv/linux/x86_64/libmvec.abilist | 8 + sysdeps/x86/fpu/bits/math-vector.h | 4 + .../x86/fpu/finclude/math-vector-fortran.h | 4 + sysdeps/x86_64/fpu/Makeconfig | 1 + sysdeps/x86_64/fpu/Versions | 4 + sysdeps/x86_64/fpu/libm-test-ulps | 20 ++ .../fpu/multiarch/ifunc-mathvec-avx512-skx.h | 39 +++ .../fpu/multiarch/svml_d_acos2_core-sse2.S | 20 ++ .../x86_64/fpu/multiarch/svml_d_acos2_core.c | 27 ++ .../fpu/multiarch/svml_d_acos2_core_sse4.S | 293 +++++++++++++++++ .../fpu/multiarch/svml_d_acos4_core-sse.S | 20 ++ .../x86_64/fpu/multiarch/svml_d_acos4_core.c | 27 ++ .../fpu/multiarch/svml_d_acos4_core_avx2.S | 273 ++++++++++++++++ .../fpu/multiarch/svml_d_acos8_core-avx2.S | 20 ++ .../x86_64/fpu/multiarch/svml_d_acos8_core.c | 27 ++ .../fpu/multiarch/svml_d_acos8_core_avx512.S | 298 ++++++++++++++++++ .../fpu/multiarch/svml_s_acosf16_core-avx2.S | 20 ++ .../fpu/multiarch/svml_s_acosf16_core.c | 28 ++ .../multiarch/svml_s_acosf16_core_avx512.S | 262 +++++++++++++++ .../fpu/multiarch/svml_s_acosf4_core-sse2.S | 20 ++ .../x86_64/fpu/multiarch/svml_s_acosf4_core.c | 28 ++ .../fpu/multiarch/svml_s_acosf4_core_sse4.S | 260 +++++++++++++++ .../fpu/multiarch/svml_s_acosf8_core-sse.S | 20 ++ .../x86_64/fpu/multiarch/svml_s_acosf8_core.c | 28 ++ .../fpu/multiarch/svml_s_acosf8_core_avx2.S | 252 +++++++++++++++ sysdeps/x86_64/fpu/svml_d_acos2_core.S | 29 ++ sysdeps/x86_64/fpu/svml_d_acos4_core.S | 29 ++ sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S | 25 ++ sysdeps/x86_64/fpu/svml_d_acos8_core.S | 25 ++ sysdeps/x86_64/fpu/svml_s_acosf16_core.S | 25 ++ sysdeps/x86_64/fpu/svml_s_acosf4_core.S | 29 ++ sysdeps/x86_64/fpu/svml_s_acosf8_core.S | 29 ++ sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S | 25 ++ .../x86_64/fpu/test-double-libmvec-acos-avx.c | 1 + .../fpu/test-double-libmvec-acos-avx2.c | 1 + .../fpu/test-double-libmvec-acos-avx512f.c | 1 + sysdeps/x86_64/fpu/test-double-libmvec-acos.c | 3 + .../x86_64/fpu/test-double-vlen2-wrappers.c | 1 + .../fpu/test-double-vlen4-avx2-wrappers.c | 1 + .../x86_64/fpu/test-double-vlen4-wrappers.c | 1 + .../x86_64/fpu/test-double-vlen8-wrappers.c | 1 + .../x86_64/fpu/test-float-libmvec-acosf-avx.c | 1 + .../fpu/test-float-libmvec-acosf-avx2.c | 1 + .../fpu/test-float-libmvec-acosf-avx512f.c | 1 + sysdeps/x86_64/fpu/test-float-libmvec-acosf.c | 3 + .../x86_64/fpu/test-float-vlen16-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen4-wrappers.c | 1 + .../fpu/test-float-vlen8-avx2-wrappers.c | 1 + .../x86_64/fpu/test-float-vlen8-wrappers.c | 1 + 51 files changed, 2251 insertions(+), 1 deletion(-) create mode 100644 sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512-skx.h create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core-avx2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core_avx512.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core-sse2.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core_sse4.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core-sse.S create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core.c create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core_avx2.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos2_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos4_core_avx.S create mode 100644 sysdeps/x86_64/fpu/svml_d_acos8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf16_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf4_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core.S create mode 100644 sysdeps/x86_64/fpu/svml_s_acosf8_core_avx.S create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-double-libmvec-acos.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx2.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx512f.c create mode 100644 sysdeps/x86_64/fpu/test-float-libmvec-acosf.c

diff --git a/bits/libm-simd-decl-stubs.h b/bits/libm-simd-decl-stubs.h index b80ff332a0..2ccdd1fc53 100644 --- a/bits/libm-simd-decl-stubs.h +++ b/bits/libm-simd-decl-stubs.h @@ -98,4 +98,15 @@ #define __DECL_SIMD_powf32x #define __DECL_SIMD_powf64x #define __DECL_SIMD_powf128x + +#define __DECL_SIMD_acos +#define __DECL_SIMD_acosf +#define __DECL_SIMD_acosl +#define __DECL_SIMD_acosf16 +#define __DECL_SIMD_acosf32 +#define __DECL_SIMD_acosf64 +#define __DECL_SIMD_acosf128 +#define __DECL_SIMD_acosf32x +#define __DECL_SIMD_acosf64x +#define __DECL_SIMD_acosf128x #endif diff --git a/math/bits/mathcalls.h b/math/bits/mathcalls.h index da4cf4e10c..2cc6654208 100644 --- a/math/bits/mathcalls.h +++ b/math/bits/mathcalls.h @@ -50,7 +50,7 @@ /* Trigonometric functions. */

/* Arc cosine of X. / -__MATHCALL (acos,, (Mdouble __x)); +__MATHCALL_VEC (acos,, (Mdouble __x)); / Arc sine of X. / __MATHCALL (asin,, (Mdouble __x)); / Arc tangent of X. */ diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist index 363d4ace1e..b37b55777e 100644 --- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist @@ -46,3 +46,11 @@ GLIBC_2.22 _ZGVeN8v_log F GLIBC_2.22 _ZGVeN8v_sin F GLIBC_2.22 _ZGVeN8vv_pow F GLIBC_2.22 _ZGVeN8vvv_sincos F +GLIBC_2.35 _ZGVbN2v_acos F +GLIBC_2.35 _ZGVbN4v_acosf F +GLIBC_2.35 _ZGVcN4v_acos F +GLIBC_2.35 _ZGVcN8v_acosf F +GLIBC_2.35 _ZGVdN4v_acos F +GLIBC_2.35 _ZGVdN8v_acosf F +GLIBC_2.35 _ZGVeN16v_acosf F +GLIBC_2.35 _ZGVeN8v_acos F diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h index dc0bfb3705..dabb74cbb9 100644 --- a/sysdeps/x86/fpu/bits/math-vector.h +++ b/sysdeps/x86/fpu/bits/math-vector.h @@ -58,6 +58,10 @@

define __DECL_SIMD_pow __DECL_SIMD_x86_64

undef __DECL_SIMD_powf

define __DECL_SIMD_powf __DECL_SIMD_x86_64

+# undef __DECL_SIMD_acos +# define __DECL_SIMD_acos __DECL_SIMD_x86_64 +# undef __DECL_SIMD_acosf +# define __DECL_SIMD_acosf __DECL_SIMD_x86_64

endif

#endif diff --git a/sysdeps/x86/fpu/finclude/math-vector-fortran.h b/sysdeps/x86/fpu/finclude/math-vector-fortran.h index 311bb4e391..4bcbd1fbce 100644 --- a/sysdeps/x86/fpu/finclude/math-vector-fortran.h +++ b/sysdeps/x86/fpu/finclude/math-vector-fortran.h @@ -28,6 +28,8 @@ !GCC$ builtin (expf) attributes simd (notinbranch) if('x86_64') !GCC$ builtin (pow) attributes simd (notinbranch) if('x86_64') !GCC$ builtin (powf) attributes simd (notinbranch) if('x86_64') +!GCC$ builtin (acos) attributes simd (notinbranch) if('x86_64') +!GCC$ builtin (acosf) attributes simd (notinbranch) if('x86_64')

!GCC$ builtin (cos) attributes simd (notinbranch) if('x32') !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32') @@ -41,3 +43,5 @@ !GCC$ builtin (expf) attributes simd (notinbranch) if('x32') !GCC$ builtin (pow) attributes simd (notinbranch) if('x32') !GCC$ builtin (powf) attributes simd (notinbranch) if('x32') +!GCC$ builtin (acos) attributes simd (notinbranch) if('x32') +!GCC$ builtin (acosf) attributes simd (notinbranch) if('x32') diff --git a/sysdeps/x86_64/fpu/Makeconfig b/sysdeps/x86_64/fpu/Makeconfig index b0e3bf7887..7acf1f306c 100644 --- a/sysdeps/x86_64/fpu/Makeconfig +++ b/sysdeps/x86_64/fpu/Makeconfig @@ -22,6 +22,7 @@ postclean-generated += libmvec.mk

Define for both math and mathvec directories.

libmvec-funcs = \

+Function: "acos_vlen16": +float: 1 + +Function: "acos_vlen2": +double: 1 + +Function: "acos_vlen4": +double: 1 +float: 2 + +Function: "acos_vlen4_avx2": +double: 1 + +Function: "acos_vlen8": +double: 1 +float: 2 + +Function: "acos_vlen8_avx2": +float: 1 + Function: "acosh": double: 2 float: 2 diff --git a/sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512-skx.h b/sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512-skx.h new file mode 100644 index 0000000000..3aed563dde --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512-skx.h @@ -0,0 +1,39 @@ +/* Common definition for libmathvec ifunc selections optimized with

+#define PASTER2(x,y) x##_##y + +extern void REDIRECT_NAME (void); +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_wrapper) attribute_hidden; +extern __typeof (REDIRECT_NAME) OPTIMIZE (skx) attribute_hidden; + +static inline void * +IFUNC_SELECTOR (void) +{

+#include "../svml_d_acos2_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core.c new file mode 100644 index 0000000000..5ba5d6fac2 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acos2_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized acos, vector length is 2.

+#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN2v_acos, __GI__ZGVbN2v_acos, __redirect__ZGVbN2v_acos)

+ENTRY(_ZGVbN2v_acos_sse4)

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_dacos_data_internal; +#endif +__svml_dacos_data_internal:

diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core-sse.S new file mode 100644 index 0000000000..750f71c81c --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized acos, vector length is 4.

+#include "../svml_d_acos4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core.c new file mode 100644 index 0000000000..6453e7ebe2 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acos4_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized acos, vector length is 4.

+#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN4v_acos, __GI__ZGVdN4v_acos, __redirect__ZGVdN4v_acos)

+ENTRY(_ZGVdN4v_acos_avx2)

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_dacos_data_internal; +#endif +__svml_dacos_data_internal:

diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core-avx2.S new file mode 100644 index 0000000000..4d64fd1c00 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized acos, vector length is 8.

+#include "../svml_d_acos8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core.c new file mode 100644 index 0000000000..1e7d1865fb --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_d_acos8_core.c @@ -0,0 +1,27 @@ +/* Multiple versions of vectorized acos, vector length is 8.

+#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN8v_acos, __GI__ZGVeN8v_acos, __redirect__ZGVeN8v_acos)

There is enough memory here it may pay to make the accesses sequential in memory.

+ENTRY(_ZGVeN8v_acos_skx)

drop I think

kandw %k4, %k2, %k3

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_dacos_data_internal; +#endif +__svml_dacos_data_internal:

diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core-avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core-avx2.S new file mode 100644 index 0000000000..1ff0cfc8d5 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core-avx2.S @@ -0,0 +1,20 @@ +/* AVX2 version of vectorized acosf.

+#include "../svml_s_acosf16_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core.c new file mode 100644 index 0000000000..fcf05782c5 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf16_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized acosf, vector length is 16.

+#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN16v_acosf, __GI__ZGVeN16v_acosf,

+ENTRY(_ZGVeN16v_acosf_skx)

drop I think

kandw %k4, %k2, %k3

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_sacos_data_internal; +#endif +__svml_sacos_data_internal:

diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core-sse2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core-sse2.S new file mode 100644 index 0000000000..f94b3eb01a --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core-sse2.S @@ -0,0 +1,20 @@ +/* SSE2 version of vectorized acosf, vector length is 4.

+#include "../svml_s_acosf4_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core.c new file mode 100644 index 0000000000..6f9a5c1082 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf4_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized acosf, vector length is 4.

+#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN4v_acosf, __GI__ZGVbN4v_acosf,

+ENTRY(_ZGVbN4v_acosf_sse4)

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_sacos_data_internal; +#endif +__svml_sacos_data_internal:

diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core-sse.S b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core-sse.S new file mode 100644 index 0000000000..583ef54fee --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core-sse.S @@ -0,0 +1,20 @@ +/* SSE version of vectorized acosf, vector length is 8.

+#include "../svml_s_acosf8_core.S" diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core.c b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core.c new file mode 100644 index 0000000000..dd360a9479 --- /dev/null +++ b/sysdeps/x86_64/fpu/multiarch/svml_s_acosf8_core.c @@ -0,0 +1,28 @@ +/* Multiple versions of vectorized acosf, vector length is 8.

+#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN8v_acosf, __GI__ZGVdN8v_acosf,

+ENTRY(_ZGVdN8v_acosf_avx2)

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_sacos_data_internal; +#endif +__svml_sacos_data_internal:

diff --git a/sysdeps/x86_64/fpu/svml_d_acos2_core.S b/sysdeps/x86_64/fpu/svml_d_acos2_core.S new file mode 100644 index 0000000000..9656478b2d --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_acos2_core.S @@ -0,0 +1,29 @@ +/* Function acos vectorized with SSE2.

+#include "svml_d_wrapper_impl.h" +

+ENTRY (_ZGVbN2v_acos) +WRAPPER_IMPL_SSE2 acos +END (_ZGVbN2v_acos) + +#ifndef USE_MULTIARCH

+#include "svml_d_wrapper_impl.h" +

+ENTRY (_ZGVdN4v_acos) +WRAPPER_IMPL_AVX _ZGVbN2v_acos +END (_ZGVdN4v_acos) + +#ifndef USE_MULTIARCH

+#include "svml_d_wrapper_impl.h" +

+ENTRY (_ZGVcN4v_acos) +WRAPPER_IMPL_AVX _ZGVbN2v_acos +END (_ZGVcN4v_acos) diff --git a/sysdeps/x86_64/fpu/svml_d_acos8_core.S b/sysdeps/x86_64/fpu/svml_d_acos8_core.S new file mode 100644 index 0000000000..e26b30d81a --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_d_acos8_core.S @@ -0,0 +1,25 @@ +/* Function acos vectorized with AVX-512, wrapper to AVX2.

+#include "svml_d_wrapper_impl.h" +

+ENTRY (_ZGVeN8v_acos) +WRAPPER_IMPL_AVX512 _ZGVdN4v_acos +END (_ZGVeN8v_acos) diff --git a/sysdeps/x86_64/fpu/svml_s_acosf16_core.S b/sysdeps/x86_64/fpu/svml_s_acosf16_core.S new file mode 100644 index 0000000000..70e046d492 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_acosf16_core.S @@ -0,0 +1,25 @@ +/* Function acosf vectorized with AVX-512. Wrapper to AVX2 version.

+#include "svml_s_wrapper_impl.h" +

+ENTRY (_ZGVeN16v_acosf) +WRAPPER_IMPL_AVX512 _ZGVdN8v_acosf +END (_ZGVeN16v_acosf) diff --git a/sysdeps/x86_64/fpu/svml_s_acosf4_core.S b/sysdeps/x86_64/fpu/svml_s_acosf4_core.S new file mode 100644 index 0000000000..36354b32b5 --- /dev/null +++ b/sysdeps/x86_64/fpu/svml_s_acosf4_core.S @@ -0,0 +1,29 @@ +/* Function acosf vectorized with SSE2, wrapper version.

+#include "svml_s_wrapper_impl.h" +

+ENTRY (_ZGVbN4v_acosf) +WRAPPER_IMPL_SSE2 acosf +END (_ZGVbN4v_acosf) + +#ifndef USE_MULTIARCH

+#include "svml_s_wrapper_impl.h" +

+ENTRY (_ZGVdN8v_acosf) +WRAPPER_IMPL_AVX _ZGVbN4v_acosf +END (_ZGVdN8v_acosf) + +#ifndef USE_MULTIARCH

+#include "svml_s_wrapper_impl.h" +

+ENTRY (_ZGVcN8v_acosf) +WRAPPER_IMPL_AVX _ZGVbN4v_acosf +END (_ZGVcN8v_acosf) diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acos-avx.c b/sysdeps/x86_64/fpu/test-double-libmvec-acos-avx.c new file mode 100644 index 0000000000..4f74b4260a --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-acos-avx.c @@ -0,0 +1 @@ +#include "test-double-libmvec-acos.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acos-avx2.c b/sysdeps/x86_64/fpu/test-double-libmvec-acos-avx2.c new file mode 100644 index 0000000000..4f74b4260a --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-acos-avx2.c @@ -0,0 +1 @@ +#include "test-double-libmvec-acos.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acos-avx512f.c b/sysdeps/x86_64/fpu/test-double-libmvec-acos-avx512f.c new file mode 100644 index 0000000000..4f74b4260a --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-acos-avx512f.c @@ -0,0 +1 @@ +#include "test-double-libmvec-acos.c" diff --git a/sysdeps/x86_64/fpu/test-double-libmvec-acos.c b/sysdeps/x86_64/fpu/test-double-libmvec-acos.c new file mode 100644 index 0000000000..e38b8ce821 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-double-libmvec-acos.c @@ -0,0 +1,3 @@ +#define LIBMVEC_TYPE double +#define LIBMVEC_FUNC acos +#include "test-vector-abi-arg1.h" diff --git a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c index ed932fc98d..0abc7d2021 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c @@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVbN2v_sin) VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVbN2v_log) VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVbN2v_exp) VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVbN2vv_pow) +VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVbN2v_acos)

#define VEC_INT_TYPE __m128i

diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c index 3a6e37044f..dda093b914 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c @@ -30,6 +30,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVdN4v_sin) VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVdN4v_log) VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVdN4v_exp) VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVdN4vv_pow) +VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVdN4v_acos)

#ifndef ILP32

define VEC_INT_TYPE __m256i

diff --git a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c index 99db4e7616..f3230463bb 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c @@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVcN4v_sin) VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVcN4v_log) VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVcN4v_exp) VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVcN4vv_pow) +VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVcN4v_acos)

#define VEC_INT_TYPE __m128i

diff --git a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c index 251d429ac0..cf9f52faf0 100644 --- a/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c +++ b/sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c @@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVeN8v_sin) VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVeN8v_log) VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVeN8v_exp) VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVeN8vv_pow) +VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVeN8v_acos)

#ifndef ILP32

define VEC_INT_TYPE __m512i

diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx.c b/sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx.c new file mode 100644 index 0000000000..1e6474dfa2 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx.c @@ -0,0 +1 @@ +#include "test-float-libmvec-acosf.c" diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx2.c b/sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx2.c new file mode 100644 index 0000000000..1e6474dfa2 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx2.c @@ -0,0 +1 @@ +#include "test-float-libmvec-acosf.c" diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx512f.c b/sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx512f.c new file mode 100644 index 0000000000..1e6474dfa2 --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-acosf-avx512f.c @@ -0,0 +1 @@ +#include "test-float-libmvec-acosf.c" diff --git a/sysdeps/x86_64/fpu/test-float-libmvec-acosf.c b/sysdeps/x86_64/fpu/test-float-libmvec-acosf.c new file mode 100644 index 0000000000..fb47f974fd --- /dev/null +++ b/sysdeps/x86_64/fpu/test-float-libmvec-acosf.c @@ -0,0 +1,3 @@ +#define LIBMVEC_TYPE float +#define LIBMVEC_FUNC acosf +#include "test-vector-abi-arg1.h" diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c index c1d14cd79e..abbd3ed870 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c @@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVeN16v_sinf) VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVeN16v_logf) VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVeN16v_expf) VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVeN16vv_powf) +VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVeN16v_acosf)

#define VEC_INT_TYPE __m512i

diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c index d23c372060..8a24027952 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c @@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVbN4v_sinf) VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVbN4v_logf) VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf) VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVbN4vv_powf) +VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVbN4v_acosf)

#define VEC_INT_TYPE __m128i

diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c index 3152cffb0c..aff0442606 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c @@ -30,6 +30,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVdN8v_sinf) VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVdN8v_logf) VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVdN8v_expf) VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVdN8vv_powf) +VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVdN8v_acosf)

/* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf. */ #undef VECTOR_WRAPPER_fFF diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c index a8492abfef..913584d111 100644 --- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c +++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c @@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVcN8v_sinf) VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVcN8v_logf) VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVcN8v_expf) VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVcN8vv_powf) +VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVcN8v_acosf)

#define VEC_INT_TYPE __m128i

-- 2.31.1

On Sun, Dec 19, 2021 at 12:29:07PM -0600, GNU C Library wrote:

On Sun, Dec 19, 2021 at 11:19 AM Sunil K Pandey via Libc-alpha libc-alpha@sourceware.org wrote:

Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps.

Have a few small comments but generally okay with a patch like this one going out in 2.35.

...

+#define poly_coeff_6 896 +#define poly_coeff_7 960 +#define poly_coeff_8 1024 +#define poly_coeff_9 1088 +#define poly_coeff_10 1152 +#define poly_coeff_11 1216 +#define poly_coeff_12 1280 +#define PiH 1344 +#define Pi2H 1408

There is enough memory here it may pay to make the accesses

Did you enough registers?

sequential in memory.

This is based on Intel compiler generated codes. We will evaluate Intel compiler changes.

...

  • +#include <sysdep.h>
  •    vfmadd231pd {rn-sae}, %zmm3, %zmm11, %zmm10
  •    andl      %eax, %ecx

drop I think

  •    vmovups   poly_coeff_12+__svml_dacos_data_internal(%rip), %zmm11
  •    kmovw     %ecx, %k3

kandw %k4, %k2, %k3

This may not be faster since mask register can only go to port 0. We will evaluate register allocation in Intel compiler.

Thanks.

H.J.

On Sun, Dec 19, 2021 at 2:26 PM H.J. Lu hjl.tools@gmail.com wrote:

On Sun, Dec 19, 2021 at 12:29:07PM -0600, GNU C Library wrote:

On Sun, Dec 19, 2021 at 11:19 AM Sunil K Pandey via Libc-alpha libc-alpha@sourceware.org wrote:

Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps.

Have a few small comments but generally okay with a patch like this one going out in 2.35.

...

+#define poly_coeff_6 896 +#define poly_coeff_7 960 +#define poly_coeff_8 1024 +#define poly_coeff_9 1088 +#define poly_coeff_10 1152 +#define poly_coeff_11 1216 +#define poly_coeff_12 1280 +#define PiH 1344 +#define Pi2H 1408

There is enough memory here it may pay to make the accesses

Did you enough registers?

This shouldn't affect register allocation. It's just if in the program we access: poly_coeff_11 -> poly_coeff_6 -> poly_coeff_8

it might be beneficial to organize the addresses of 11/6/8 s.t its sequential memory accesses from the table i.e #define poly_coeff_11 896 #define poly_coeff_6 960 #define poly_coeff_8 1024 ...

Random example and just a thought. Figure if coming in cold it might save a cache miss or two because it has an easy to recognize pattern for the HW prefetcher. Don't think it's make or break.

sequential in memory.

This is based on Intel compiler generated codes. We will evaluate Intel compiler changes.

...

  • +#include <sysdep.h>
  •    vfmadd231pd {rn-sae}, %zmm3, %zmm11, %zmm10
  •    andl      %eax, %ecx

drop I think

  •    vmovups   poly_coeff_12+__svml_dacos_data_internal(%rip), %zmm11
  •    kmovw     %ecx, %k3

kandw %k4, %k2, %k3

This may not be faster since mask register can only go to port 0. We will evaluate register allocation in Intel compiler.

kmovw and kandw are both 1uop port0.

andl + kmovw is 2 uops and has 4c latency vs kandw is 1 uop and 1c latency.

Thanks.

H.J.

On Sun, Dec 19, 2021 at 12:42 PM Noah Goldstein goldstein.w.n@gmail.com wrote:

On Sun, Dec 19, 2021 at 2:26 PM H.J. Lu hjl.tools@gmail.com wrote:

On Sun, Dec 19, 2021 at 12:29:07PM -0600, GNU C Library wrote:

On Sun, Dec 19, 2021 at 11:19 AM Sunil K Pandey via Libc-alpha libc-alpha@sourceware.org wrote:

Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps.

Have a few small comments but generally okay with a patch like this one going out in 2.35.

...

+#define poly_coeff_6 896 +#define poly_coeff_7 960 +#define poly_coeff_8 1024 +#define poly_coeff_9 1088 +#define poly_coeff_10 1152 +#define poly_coeff_11 1216 +#define poly_coeff_12 1280 +#define PiH 1344 +#define Pi2H 1408

There is enough memory here it may pay to make the accesses

Did you enough registers?

This shouldn't affect register allocation. It's just if in the program we access: poly_coeff_11 -> poly_coeff_6 -> poly_coeff_8

it might be beneficial to organize the addresses of 11/6/8 s.t its sequential memory accesses from the table i.e #define poly_coeff_11 896 #define poly_coeff_6 960 #define poly_coeff_8 1024 ...

Random example and just a thought. Figure if coming in cold it might save a cache miss or two because it has an easy to recognize pattern for the HW prefetcher. Don't think it's make or break.

Good suggestion. It's difficult to hand modify. Will let compiler team know about this optimization.

sequential in memory.

This is based on Intel compiler generated codes. We will evaluate Intel compiler changes.

...

  • +#include <sysdep.h>
  •    vfmadd231pd {rn-sae}, %zmm3, %zmm11, %zmm10
  •    andl      %eax, %ecx

drop I think

  •    vmovups   poly_coeff_12+__svml_dacos_data_internal(%rip), %zmm11
  •    kmovw     %ecx, %k3

kandw %k4, %k2, %k3

This may not be faster since mask register can only go to port 0. We will evaluate register allocation in Intel compiler.

kmovw and kandw are both 1uop port0.

andl + kmovw is 2 uops and has 4c latency vs kandw is 1 uop and 1c latency.

Will be fixed in v6.

Thanks.

H.J.

On Mon, Dec 20, 2021 at 10:08 AM Sunil Pandey skpgkp2@gmail.com wrote:

On Sun, Dec 19, 2021 at 12:42 PM Noah Goldstein goldstein.w.n@gmail.com wrote:

On Sun, Dec 19, 2021 at 2:26 PM H.J. Lu hjl.tools@gmail.com wrote:

On Sun, Dec 19, 2021 at 12:29:07PM -0600, GNU C Library wrote:

On Sun, Dec 19, 2021 at 11:19 AM Sunil K Pandey via Libc-alpha libc-alpha@sourceware.org wrote:

Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps.

Have a few small comments but generally okay with a patch like this one going out in 2.35.

...

+#define poly_coeff_6 896 +#define poly_coeff_7 960 +#define poly_coeff_8 1024 +#define poly_coeff_9 1088 +#define poly_coeff_10 1152 +#define poly_coeff_11 1216 +#define poly_coeff_12 1280 +#define PiH 1344 +#define Pi2H 1408

There is enough memory here it may pay to make the accesses

Did you enough registers?

This shouldn't affect register allocation. It's just if in the program we access: poly_coeff_11 -> poly_coeff_6 -> poly_coeff_8

it might be beneficial to organize the addresses of 11/6/8 s.t its sequential memory accesses from the table i.e #define poly_coeff_11 896 #define poly_coeff_6 960 #define poly_coeff_8 1024 ...

Random example and just a thought. Figure if coming in cold it might save a cache miss or two because it has an easy to recognize pattern for the HW prefetcher. Don't think it's make or break.

Good suggestion. It's difficult to hand modify. Will let compiler team know about this optimization.

Like I said, can live with/without this optimization in the first version (mostly because I think its unclear what the actual best schema is), but this patch is being submitted as asm and meant to be maintained as asm. If the only feasible way to make future changes/optimizations is to update the compiler and recompile some higher level language, that's an issue.

sequential in memory.

This is based on Intel compiler generated codes. We will evaluate Intel compiler changes.

...

  • +#include <sysdep.h>
  •    vfmadd231pd {rn-sae}, %zmm3, %zmm11, %zmm10
  •    andl      %eax, %ecx

drop I think

  •    vmovups   poly_coeff_12+__svml_dacos_data_internal(%rip), %zmm11
  •    kmovw     %ecx, %k3

kandw %k4, %k2, %k3

This may not be faster since mask register can only go to port 0. We will evaluate register allocation in Intel compiler.

kmovw and kandw are both 1uop port0.

andl + kmovw is 2 uops and has 4c latency vs kandw is 1 uop and 1c latency.

Will be fixed in v6.

Thanks.

H.J.

On Mon, Dec 20, 2021 at 1:20 PM Noah Goldstein goldstein.w.n@gmail.com wrote:

On Mon, Dec 20, 2021 at 10:08 AM Sunil Pandey skpgkp2@gmail.com wrote:

On Sun, Dec 19, 2021 at 12:42 PM Noah Goldstein goldstein.w.n@gmail.com wrote:

On Sun, Dec 19, 2021 at 2:26 PM H.J. Lu hjl.tools@gmail.com wrote:

On Sun, Dec 19, 2021 at 12:29:07PM -0600, GNU C Library wrote:

On Sun, Dec 19, 2021 at 11:19 AM Sunil K Pandey via Libc-alpha libc-alpha@sourceware.org wrote:

Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps.

Have a few small comments but generally okay with a patch like this one going out in 2.35.

...

+#define poly_coeff_6 896 +#define poly_coeff_7 960 +#define poly_coeff_8 1024 +#define poly_coeff_9 1088 +#define poly_coeff_10 1152 +#define poly_coeff_11 1216 +#define poly_coeff_12 1280 +#define PiH 1344 +#define Pi2H 1408

There is enough memory here it may pay to make the accesses

Did you enough registers?

This shouldn't affect register allocation. It's just if in the program we access: poly_coeff_11 -> poly_coeff_6 -> poly_coeff_8

it might be beneficial to organize the addresses of 11/6/8 s.t its sequential memory accesses from the table i.e #define poly_coeff_11 896 #define poly_coeff_6 960 #define poly_coeff_8 1024 ...

Random example and just a thought. Figure if coming in cold it might save a cache miss or two because it has an easy to recognize pattern for the HW prefetcher. Don't think it's make or break.

Good suggestion. It's difficult to hand modify. Will let compiler team know about this optimization.

Like I said, can live with/without this optimization in the first version (mostly because I think its unclear what the actual best schema is), but this patch is being submitted as asm and meant to be maintained as asm. If the only feasible way to make future changes/optimizations is to update the compiler and recompile some higher level language, that's an issue.

sequential in memory.

This is based on Intel compiler generated codes. We will evaluate Intel compiler changes.

...

  • +#include <sysdep.h>
  •    vfmadd231pd {rn-sae}, %zmm3, %zmm11, %zmm10
  •    andl      %eax, %ecx

drop I think

  •    vmovups   poly_coeff_12+__svml_dacos_data_internal(%rip), %zmm11
  •    kmovw     %ecx, %k3

kandw %k4, %k2, %k3

This may not be faster since mask register can only go to port 0. We will evaluate register allocation in Intel compiler.

kmovw and kandw are both 1uop port0.

andl + kmovw is 2 uops and has 4c latency vs kandw is 1 uop and 1c latency.

Will be fixed in v6.

In the other patches (for other functions, this one is fine) can you have the compiler printout (maybe just a comment at the end of the line) the the live-intervals for each register assignment. Looking at this code there is a perception of extreme register pressure but a lot of that seems forced by suspect instruction scheduling. It would be easier to notice that for future maintenance with the the comment.

Thanks.

H.J.

On Mon, Dec 20, 2021 at 11:20 AM Noah Goldstein goldstein.w.n@gmail.com wrote:

On Mon, Dec 20, 2021 at 10:08 AM Sunil Pandey skpgkp2@gmail.com wrote:

On Sun, Dec 19, 2021 at 12:42 PM Noah Goldstein goldstein.w.n@gmail.com wrote:

On Sun, Dec 19, 2021 at 2:26 PM H.J. Lu hjl.tools@gmail.com wrote:

On Sun, Dec 19, 2021 at 12:29:07PM -0600, GNU C Library wrote:

On Sun, Dec 19, 2021 at 11:19 AM Sunil K Pandey via Libc-alpha libc-alpha@sourceware.org wrote:

Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps.

Have a few small comments but generally okay with a patch like this one going out in 2.35.

...

+#define poly_coeff_6 896 +#define poly_coeff_7 960 +#define poly_coeff_8 1024 +#define poly_coeff_9 1088 +#define poly_coeff_10 1152 +#define poly_coeff_11 1216 +#define poly_coeff_12 1280 +#define PiH 1344 +#define Pi2H 1408

There is enough memory here it may pay to make the accesses

Did you enough registers?

This shouldn't affect register allocation. It's just if in the program we access: poly_coeff_11 -> poly_coeff_6 -> poly_coeff_8

it might be beneficial to organize the addresses of 11/6/8 s.t its sequential memory accesses from the table i.e #define poly_coeff_11 896 #define poly_coeff_6 960 #define poly_coeff_8 1024 ...

Random example and just a thought. Figure if coming in cold it might save a cache miss or two because it has an easy to recognize pattern for the HW prefetcher. Don't think it's make or break.

Good suggestion. It's difficult to hand modify. Will let compiler team know about this optimization.

Like I said, can live with/without this optimization in the first version (mostly because I think its unclear what the actual best schema is), but this patch is being submitted as asm and meant to be maintained as asm. If the only feasible way to make future changes/optimizations is to update the compiler and recompile some higher level language, that's an issue.

We prefer to generate compiler optimized code. We can certainly hand optimize just like we did for other cases. For this version we want to leave as is.

sequential in memory.

This is based on Intel compiler generated codes. We will evaluate Intel compiler changes.

...

  • +#include <sysdep.h>
  •    vfmadd231pd {rn-sae}, %zmm3, %zmm11, %zmm10
  •    andl      %eax, %ecx

drop I think

  •    vmovups   poly_coeff_12+__svml_dacos_data_internal(%rip), %zmm11
  •    kmovw     %ecx, %k3

kandw %k4, %k2, %k3

This may not be faster since mask register can only go to port 0. We will evaluate register allocation in Intel compiler.

kmovw and kandw are both 1uop port0.

andl + kmovw is 2 uops and has 4c latency vs kandw is 1 uop and 1c latency.

Will be fixed in v6.

Thanks.

H.J.

Patch

@@ -98,4 +98,15 @@ #define __DECL_SIMD_powf32x #define __DECL_SIMD_powf64x #define __DECL_SIMD_powf128x + +#define __DECL_SIMD_acos +#define __DECL_SIMD_acosf +#define __DECL_SIMD_acosl +#define __DECL_SIMD_acosf16 +#define __DECL_SIMD_acosf32 +#define __DECL_SIMD_acosf64 +#define __DECL_SIMD_acosf128 +#define __DECL_SIMD_acosf32x +#define __DECL_SIMD_acosf64x +#define __DECL_SIMD_acosf128x #endif

@@ -50,7 +50,7 @@ /* Trigonometric functions. */

/* Arc cosine of X. / -__MATHCALL (acos,, (Mdouble __x)); +__MATHCALL_VEC (acos,, (Mdouble __x)); / Arc sine of X. / __MATHCALL (asin,, (Mdouble __x)); / Arc tangent of X. */

@@ -46,3 +46,11 @@ GLIBC_2.22 _ZGVeN8v_log F GLIBC_2.22 _ZGVeN8v_sin F GLIBC_2.22 _ZGVeN8vv_pow F GLIBC_2.22 _ZGVeN8vvv_sincos F +GLIBC_2.35 _ZGVbN2v_acos F +GLIBC_2.35 _ZGVbN4v_acosf F +GLIBC_2.35 _ZGVcN4v_acos F +GLIBC_2.35 _ZGVcN8v_acosf F +GLIBC_2.35 _ZGVdN4v_acos F +GLIBC_2.35 _ZGVdN8v_acosf F +GLIBC_2.35 _ZGVeN16v_acosf F +GLIBC_2.35 _ZGVeN8v_acos F

@@ -58,6 +58,10 @@

define __DECL_SIMD_pow __DECL_SIMD_x86_64

undef __DECL_SIMD_powf

define __DECL_SIMD_powf __DECL_SIMD_x86_64

+# undef __DECL_SIMD_acos +# define __DECL_SIMD_acos __DECL_SIMD_x86_64 +# undef __DECL_SIMD_acosf +# define __DECL_SIMD_acosf __DECL_SIMD_x86_64

endif

#endif

@@ -28,6 +28,8 @@ !GCC$ builtin (expf) attributes simd (notinbranch) if('x86_64') !GCC$ builtin (pow) attributes simd (notinbranch) if('x86_64') !GCC$ builtin (powf) attributes simd (notinbranch) if('x86_64') +!GCC$ builtin (acos) attributes simd (notinbranch) if('x86_64') +!GCC$ builtin (acosf) attributes simd (notinbranch) if('x86_64')

!GCC$ builtin (cos) attributes simd (notinbranch) if('x32') !GCC$ builtin (cosf) attributes simd (notinbranch) if('x32') @@ -41,3 +43,5 @@ !GCC$ builtin (expf) attributes simd (notinbranch) if('x32') !GCC$ builtin (pow) attributes simd (notinbranch) if('x32') !GCC$ builtin (powf) attributes simd (notinbranch) if('x32') +!GCC$ builtin (acos) attributes simd (notinbranch) if('x32') +!GCC$ builtin (acosf) attributes simd (notinbranch) if('x32')

@@ -22,6 +22,7 @@ postclean-generated += libmvec.mk

Define for both math and mathvec directories.

libmvec-funcs = \

@@ -13,4 +13,8 @@ libmvec { _ZGVbN4vv_powf; _ZGVcN8vv_powf; _ZGVdN8vv_powf; _ZGVeN16vv_powf; _ZGVbN4vvv_sincosf; _ZGVcN8vvv_sincosf; _ZGVdN8vvv_sincosf; _ZGVeN16vvv_sincosf; }

@@ -25,6 +25,26 @@ float: 1 float128: 1 ldouble: 2

+Function: "acos_vlen16": +float: 1 + +Function: "acos_vlen2": +double: 1 + +Function: "acos_vlen4": +double: 1 +float: 2 + +Function: "acos_vlen4_avx2": +double: 1 + +Function: "acos_vlen8": +double: 1 +float: 2 + +Function: "acos_vlen8_avx2": +float: 1 + Function: "acosh": double: 2 float: 2

new file mode 100644

@@ -0,0 +1,39 @@ +/* Common definition for libmathvec ifunc selections optimized with

+#define PASTER2(x,y) x##_##y + +extern void REDIRECT_NAME (void); +extern __typeof (REDIRECT_NAME) OPTIMIZE (avx2_wrapper) attribute_hidden; +extern __typeof (REDIRECT_NAME) OPTIMIZE (skx) attribute_hidden; + +static inline void * +IFUNC_SELECTOR (void) +{

new file mode 100644

@@ -0,0 +1,20 @@ +/* SSE2 version of vectorized acos, vector length is 2.

+#include "../svml_d_acos2_core.S"

new file mode 100644

@@ -0,0 +1,27 @@ +/* Multiple versions of vectorized acos, vector length is 2.

+#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN2v_acos, __GI__ZGVbN2v_acos, __redirect__ZGVbN2v_acos)

new file mode 100644

@@ -0,0 +1,293 @@ +/* Function acos vectorized with SSE4.

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_dacos_data_internal; +#endif +__svml_dacos_data_internal:

new file mode 100644

@@ -0,0 +1,20 @@ +/* SSE version of vectorized acos, vector length is 4.

+#include "../svml_d_acos4_core.S"

new file mode 100644

@@ -0,0 +1,27 @@ +/* Multiple versions of vectorized acos, vector length is 4.

+#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN4v_acos, __GI__ZGVdN4v_acos, __redirect__ZGVdN4v_acos)

new file mode 100644

@@ -0,0 +1,273 @@ +/* Function acos vectorized with AVX2.

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_dacos_data_internal; +#endif +__svml_dacos_data_internal:

new file mode 100644

@@ -0,0 +1,20 @@ +/* AVX2 version of vectorized acos, vector length is 8.

+#include "../svml_d_acos8_core.S"

new file mode 100644

@@ -0,0 +1,27 @@ +/* Multiple versions of vectorized acos, vector length is 8.

+#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN8v_acos, __GI__ZGVeN8v_acos, __redirect__ZGVeN8v_acos)

new file mode 100644

@@ -0,0 +1,298 @@ +/* Function acos vectorized with AVX-512.

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_dacos_data_internal; +#endif +__svml_dacos_data_internal:

new file mode 100644

@@ -0,0 +1,20 @@ +/* AVX2 version of vectorized acosf.

+#include "../svml_s_acosf16_core.S"

new file mode 100644

@@ -0,0 +1,28 @@ +/* Multiple versions of vectorized acosf, vector length is 16.

+#include "ifunc-mathvec-avx512-skx.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVeN16v_acosf, __GI__ZGVeN16v_acosf, + __redirect__ZGVeN16v_acosf)

new file mode 100644

@@ -0,0 +1,262 @@ +/* Function acosf vectorized with AVX-512.

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_sacos_data_internal; +#endif +__svml_sacos_data_internal:

new file mode 100644

@@ -0,0 +1,20 @@ +/* SSE2 version of vectorized acosf, vector length is 4.

+#include "../svml_s_acosf4_core.S"

new file mode 100644

@@ -0,0 +1,28 @@ +/* Multiple versions of vectorized acosf, vector length is 4.

+#include "ifunc-mathvec-sse4_1.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVbN4v_acosf, __GI__ZGVbN4v_acosf, + __redirect__ZGVbN4v_acosf)

new file mode 100644

@@ -0,0 +1,260 @@ +/* Function acosf vectorized with SSE4.

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_sacos_data_internal; +#endif +__svml_sacos_data_internal:

new file mode 100644

@@ -0,0 +1,20 @@ +/* SSE version of vectorized acosf, vector length is 8.

+#include "../svml_s_acosf8_core.S"

new file mode 100644

@@ -0,0 +1,28 @@ +/* Multiple versions of vectorized acosf, vector length is 8.

+#include "ifunc-mathvec-avx2.h" + +libc_ifunc_redirected (REDIRECT_NAME, SYMBOL_NAME, IFUNC_SELECTOR ()); + +#ifdef SHARED +__hidden_ver1 (_ZGVdN8v_acosf, __GI__ZGVdN8v_acosf, + __redirect__ZGVdN8v_acosf)

new file mode 100644

@@ -0,0 +1,252 @@ +/* Function acosf vectorized with AVX2.

+typedef unsigned int VUINT32; +typedef struct {

+} __svml_sacos_data_internal; +#endif +__svml_sacos_data_internal:

new file mode 100644

@@ -0,0 +1,29 @@ +/* Function acos vectorized with SSE2.

+#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVbN2v_acos) +WRAPPER_IMPL_SSE2 acos +END (_ZGVbN2v_acos) + +#ifndef USE_MULTIARCH

new file mode 100644

@@ -0,0 +1,29 @@ +/* Function acos vectorized with AVX2, wrapper version.

+#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVdN4v_acos) +WRAPPER_IMPL_AVX _ZGVbN2v_acos +END (_ZGVdN4v_acos) + +#ifndef USE_MULTIARCH

new file mode 100644

@@ -0,0 +1,25 @@ +/* Function acos vectorized in AVX ISA as wrapper to SSE4 ISA version.

+#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVcN4v_acos) +WRAPPER_IMPL_AVX _ZGVbN2v_acos +END (_ZGVcN4v_acos)

new file mode 100644

@@ -0,0 +1,25 @@ +/* Function acos vectorized with AVX-512, wrapper to AVX2.

+#include "svml_d_wrapper_impl.h" + + .text +ENTRY (_ZGVeN8v_acos) +WRAPPER_IMPL_AVX512 _ZGVdN4v_acos +END (_ZGVeN8v_acos)

new file mode 100644

@@ -0,0 +1,25 @@ +/* Function acosf vectorized with AVX-512. Wrapper to AVX2 version.

+#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVeN16v_acosf) +WRAPPER_IMPL_AVX512 _ZGVdN8v_acosf +END (_ZGVeN16v_acosf)

new file mode 100644

@@ -0,0 +1,29 @@ +/* Function acosf vectorized with SSE2, wrapper version.

+#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVbN4v_acosf) +WRAPPER_IMPL_SSE2 acosf +END (_ZGVbN4v_acosf) + +#ifndef USE_MULTIARCH

new file mode 100644

@@ -0,0 +1,29 @@ +/* Function acosf vectorized with AVX2, wrapper version.

+#include "svml_s_wrapper_impl.h" + + .text +ENTRY (_ZGVdN8v_acosf) +WRAPPER_IMPL_AVX _ZGVbN4v_acosf +END (_ZGVdN8v_acosf) + +#ifndef USE_MULTIARCH

new file mode 100644

@@ -0,0 +1,25 @@ +/* Function acosf vectorized in AVX ISA as wrapper to SSE4 ISA version.

+#include "svml_s_wrapper_impl.h" +

+ENTRY (_ZGVcN8v_acosf) +WRAPPER_IMPL_AVX _ZGVbN4v_acosf +END (_ZGVcN8v_acosf)

new file mode 100644

@@ -0,0 +1 @@ +#include "test-double-libmvec-acos.c"

new file mode 100644

@@ -0,0 +1 @@ +#include "test-double-libmvec-acos.c"

new file mode 100644

@@ -0,0 +1 @@ +#include "test-double-libmvec-acos.c"

new file mode 100644

@@ -0,0 +1,3 @@ +#define LIBMVEC_TYPE double +#define LIBMVEC_FUNC acos +#include "test-vector-abi-arg1.h"

@@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVbN2v_sin) VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVbN2v_log) VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVbN2v_exp) VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVbN2vv_pow) +VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVbN2v_acos)

#define VEC_INT_TYPE __m128i

@@ -30,6 +30,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVdN4v_sin) VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVdN4v_log) VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVdN4v_exp) VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVdN4vv_pow) +VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVdN4v_acos)

#ifndef ILP32

define VEC_INT_TYPE __m256i

@@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVcN4v_sin) VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVcN4v_log) VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVcN4v_exp) VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVcN4vv_pow) +VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVcN4v_acos)

#define VEC_INT_TYPE __m128i

@@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sin), _ZGVeN8v_sin) VECTOR_WRAPPER (WRAPPER_NAME (log), _ZGVeN8v_log) VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVeN8v_exp) VECTOR_WRAPPER_ff (WRAPPER_NAME (pow), _ZGVeN8vv_pow) +VECTOR_WRAPPER (WRAPPER_NAME (acos), _ZGVeN8v_acos)

#ifndef ILP32

define VEC_INT_TYPE __m512i

new file mode 100644

@@ -0,0 +1 @@ +#include "test-float-libmvec-acosf.c"

new file mode 100644

@@ -0,0 +1 @@ +#include "test-float-libmvec-acosf.c"

new file mode 100644

@@ -0,0 +1 @@ +#include "test-float-libmvec-acosf.c"

new file mode 100644

@@ -0,0 +1,3 @@ +#define LIBMVEC_TYPE float +#define LIBMVEC_FUNC acosf +#include "test-vector-abi-arg1.h"

@@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVeN16v_sinf) VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVeN16v_logf) VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVeN16v_expf) VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVeN16vv_powf) +VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVeN16v_acosf)

#define VEC_INT_TYPE __m512i

@@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVbN4v_sinf) VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVbN4v_logf) VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf) VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVbN4vv_powf) +VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVbN4v_acosf)

#define VEC_INT_TYPE __m128i

@@ -30,6 +30,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVdN8v_sinf) VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVdN8v_logf) VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVdN8v_expf) VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVdN8vv_powf) +VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVdN8v_acosf)

/* Redefinition of wrapper to be compatible with _ZGVdN8vvv_sincosf. */ #undef VECTOR_WRAPPER_fFF

@@ -27,6 +27,7 @@ VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVcN8v_sinf) VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVcN8v_logf) VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVcN8v_expf) VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVcN8vv_powf) +VECTOR_WRAPPER (WRAPPER_NAME (acosf), _ZGVcN8v_acosf)

#define VEC_INT_TYPE __m128i