Floating-point arithmetic | Acta Numerica | Cambridge Core (original) (raw)

References

Agrawal, A., Mueller, S. M., Fleischer, B. M., Sun, X., Wang, N., Choi, J. and Gopalakrishnan, K. (2019), DLFloat: A 16-b floating point format designed for deep learning training and inference, in 26th IEEE Symposium on Computer Arithmetic, IEEE, pp. 92–95.Google Scholar

Anderson, C. S., Zhang, J. and Cornea, M. (2018), Enhanced vector math support on the Intel®AVX-512 architecture, in 25th IEEE Symposium on Computer Arithmetic, pp. 120–124.Google Scholar

Babuška, I. (1969), Numerical stability in mathematical analysis, in Proceedings of the 1968 IFIP Congress , Vol. 1, pp. 11–23.Google Scholar

Barnes, R. C. M., Cooke-Yarborough, E. H. and Thomas, D. G. A. (1951), An electronic digital computor using cold cathode counting tubes for storage (Part 1), Electron. Engng 23, 286–291.Google Scholar

Bertaccini, L., Paulin, G., Fischer, T., Mach, S. and Benini, L. (2022), MiniFloat-NN and ExSdotp: An ISA extension and a modular open hardware unit for low-precision training on RISC-V cores, in 29th IEEE Symposium on Computer Arithmetic.CrossRefGoogle Scholar

Blanchard, P., Higham, N. J. and Mary, T. (2020), A class of fast and accurate summation algorithms, SIAM J. Sci. Comput. 42, A1541–A1557.CrossRefGoogle Scholar

Bohlender, G., Walter, W., Kornerup, P. and Matula, D. (1991), Semantics for exact floating point operations, in 10th IEEE Symposium on Computer Arithmetic, pp. 22–26.Google Scholar

Boldo, S. (2006), Pitfalls of a full floating-point proof: Example on the formal proof of the Veltkamp/Dekker algorithms, in 3rd International Joint Conference on Automated Reasoning (Furbach, U. and Shankar, N., eds), Vol. 4130 of Lecture Notes in Computer Science, Springer, pp. 52–66.Google Scholar

Boldo, S. (2009), Kahan’s algorithm for a correct discriminant computation at last formally proven, IEEE Trans. Comput. 58, 220–225.CrossRefGoogle Scholar

Boldo, S. and Daumas, M. (2003), Representable correcting terms for possibly underflowing floating point operations, in 16th IEEE Symposium on Computer Arithmetic (Bajard, J.-C. and Schulte, M., eds), pp. 79–86.Google Scholar

Boldo, S. and Melquiond, G. (2008), Emulation of a FMA and correctly rounded sums: Proved algorithms using rounding to odd, IEEE Trans. Comput. 57, 462–471.CrossRefGoogle Scholar

Boldo, S. and Melquiond, G. (2017), Computer Arithmetic and Formal Proofs , ISTE Press / Elsevier.Google Scholar

Boldo, S. and Muller, J.-M. (2005), Some functions computable with a fused-mac, in 17th IEEE Symposium on Computer Arithmetic, pp. 52–58.Google Scholar

Boldo, S. and Muller, J.-M. (2011), Exact and approximated error of the FMA, IEEE Trans. Comput. 60, 157–164.CrossRefGoogle Scholar

Boldo, S., Graillat, S. and Muller, J.-M. (2017), On the robustness of the 2Sum and Fast2Sum algorithms, ACM Trans. Math. Softw . 44, 4:1–4:14.CrossRefGoogle Scholar

Boldo, S., Lauter, C. and Muller, J.-M. (2021), Emulating round-to-nearest ties-to-zero ‘augmented’ floating-point operations using round-to-nearest ties-to-even arithmetic, IEEE Trans. Comput. 70, 1046–1058.CrossRefGoogle Scholar

Borges, C. F. (2021), Algorithm 1014: An improved algorithm for Hypot left(x,yright)\left(x,y\right)left(x,yright) , ACM Trans. Math. Softw. 47, 1–12.CrossRefGoogle Scholar

Borges, C. F., Jeannerod, C.-P. and Muller, J.-M. (2022), High-level algorithms for correctly-rounded reciprocal square roots, in 29th IEEE Symposium on Computer Arithmetic, pp. 18–25.Google Scholar

Brent, R. P. (1973), On the precision attainable with various floating-point number systems, IEEE Trans. Comput. C-22, 601–607.Google Scholar

Brent, R. P. (1978), Algorithm 524: MP, a Fortran multiple-precision arithmetic package [A1], ACM Trans. Math. Softw. 4, 71–81.CrossRefGoogle Scholar

Brent, R., Percival, C. and Zimmermann, P. (2007), Error bounds on complex floating-point multiplication, Math. Comp. 76, 1469–1481.CrossRefGoogle Scholar

Brisebarre, N. and Chevillard, S. (2007), Efficient polynomial L-approximations, in 18th IEEE Symposium on Computer Arithmetic, pp. 169–176.Google Scholar

Brisebarre, N. and Muller, J.-M. (2008), Correctly rounded multiplication by arbitrary precision constants, IEEE Trans. Comput. 57, 165–174.CrossRefGoogle Scholar

Brisebarre, N., Hanrot, G. and Robert, O. (2017), Exponential sums and correctly-rounded functions, IEEE Trans. Comput. 66, 2044–2057.CrossRefGoogle Scholar

Brisebarre, N., Joldeş, M., Muller, J.-M., Nanes, A.-M. and Picot, J. (2020), Error analysis of some operations involved in the Cooley–Tukey fast Fourier transform, ACM Trans. Math. Softw . 46, 11:1–11:27.CrossRefGoogle Scholar

Brunie, N., de Dinechin, F., Kupriianova, O. and Lauter, C. (2015), Code generators for mathematical functions, in 22nd IEEE Symposium on Computer Arithmetic, pp. 66–73.Google Scholar

Cameron, T. R. and Graillat, S. (2022), On a compensated Ehrlich–Aberth method for the accurate computation of all polynomial roots, Electron . Trans. Numer. Anal. 55, 401–423.Google Scholar

Castaldo, A. M., Whaley, R. C. and Chronopoulos, A. T. (2009), Reducing floating point error in dot product using the superblock family of algorithms, SIAM J. Sci. Comput. 31, 1156–1174.CrossRefGoogle Scholar

Ceruzzi, P. E. (1981), The early computers of Konrad Zuse, 1935 to 1945, Ann. Hist. Comput. 3, 241–262.CrossRefGoogle Scholar

Champagne, W. P. (1964), On finding roots of polynomials by hook or by crook. MSc thesis, University of Texas, Austin, TX.Google Scholar

Chevillard, S., Harrison, J., Joldeş, M. and Lauter, C. (2011), Efficient and accurate computation of upper bounds of approximation errors, Theoret . Comput. Sci. 412, 1523–1543.Google Scholar

Chevillard, S., Joldeş, M. and Lauter, C. (2010), Sollya: An environment for the development of numerical codes, in International Conference on Mathematical Software (Fukuda, K. et al., eds), Vol. 6327 of Lecture Notes in Computer Science, Springer, pp. 28–31.Google Scholar

Chung, E., Fowers, J., Ovtcharov, K., Papamichael, M., Caulfield, A., Massengill, T., Liu, M., Lo, D., Alkalay, S., Haselman, M., Abeydeera, M., Adams, L., Angepat, H., Boehn, C., Chiou, D., Firestein, O., Forin, A., Gatlin, K. S., Ghandi, M., Heil, S., Holohan, K., Husseini, A. El, Juhasz, T., Kagi, K., Kovvuri, R. K., Lanka, S., van Megen, F., Mukhortov, D., Patel, P., Perez, B., Rapsang, A., Reinhardt, S., Rouhani, B., Sapek, A., Seera, R., Shekar, S., Sridharan, B., Weisz, G., Woods, L., Xiao, P. Yi, Zhang, D., Zhao, R. and Burger, D. (2018), Serving DNNs in real time at datacenter scale with project brainwave, IEEE Micro 38, 8–20.CrossRefGoogle Scholar

Cocke, J. and Markstein, V. (1990), The evolution of RISC technology at IBM, IBM J. Res. Dev. 34, 4–11.CrossRefGoogle Scholar

Cococcioni, M., Rossi, F., Ruffaldi, E. and Saponara, S. (2022), Small reals representations for deep learning at the edge: A comparison, in Next Generation Arithmetic (Gustafson, J. and Dimitrov, V., eds), Springer, pp. 117–133.Google Scholar

Cody, W. J. and Waite, W. (1980), Software Manual for the Elementary Functions , Prentice-Hall.Google Scholar

Collange, C., Defour, D., Graillat, S. and Iakymchuk, R. (2015), Numerical reproducibility for the parallel reduction on multi- and many-core architectures, Parallel Comput. 49, 83–97.CrossRefGoogle Scholar

Connolly, M. P. and Higham, N. J. (2022), Probabilistic rounding error analysis of Householder QR factorization. MIMS EPrint 2022.5, Manchester Institute for Mathematical Sciences, The University of Manchester, UK. Available at http://eprints.maths. manchester.ac.uk/2865/.Google Scholar

Connolly, M. P., Higham, N. J. and Mary, T. (2021), Stochastic rounding and its probabilistic backward error analysis, SIAM J. Sci. Comput. 43, A566–A585.CrossRefGoogle Scholar

Connolly, M. P., Higham, N. J. and Pranesh, S. (2022), Randomized low rank matrix approximation: Rounding error analysis and a mixed precision algorithm. MIMS EPrint 2022.10, Manchester Institute for Mathematical Sciences, The University of Manchester, UK. Available at http://eprints.maths.manchester.ac.uk/2863/.Google Scholar

Cornea-Hasegan, M. A., Golliver, R. A. and Markstein, P. (1999), Correctness proofs outline for Newton–Raphson based floating-point divide and square root algorithms, in 14th IEEE Symposium on Computer Arithmetic, pp. 96–105.Google Scholar

Cornea, M., Harrison, J. and Tang, P. T. P. (2002), Scientific Computing on Itanium-based Systems, Intel Press.Google Scholar

Croci, M., Fasi, M., Higham, N. J., Mary, T. and Mikaitis, M. (2022), Stochastic rounding: implementation, error analysis and applications, Royal Soc . Open Sci. 9, 1–25.Google Scholar

Darcy, J. (2017), Restore always-strict floating-point semantics. Technical report JEP 306.Google Scholar

Daumas, M. (1999), Multiplications of floating point expansions, in 14th IEEE Symposium on Computer Arithmetic, pp. 250–257.Google Scholar

Daumas, M., Rideau, L. and Théry, L. (2001), A generic library of floating-point numbers and its application to exact computing, in 14th International Conference on Theorem Proving in Higher Order Logics (Boulton, R. J. and Jackson, P. B., eds), Vol. 2152 of Lecture Notes in Computer Science, Springer, pp. 169–184.Google Scholar

de Dinechin, F., Forget, L., Muller, J.-M. and Uguen, Y. (2019), Posits: The good, the bad and the ugly, in Conference on Next-Generation Arithmetic, ACM Press, pp. 1–10.Google Scholar

de Dinechin, F., Lauter, C. and Melquiond, G. (2011), Certifying the floating-point implementation of an elementary function using Gappa, IEEE Trans. Comput. 60, 242–253.CrossRefGoogle Scholar

Dekker, T. J. (1971), A floating-point technique for extending the available precision, Numer . Math. 18, 224–242.Google Scholar

Demmel, J. (1984), Underflow and the reliability of numerical software, SIAM J. Sci. Statist. Comput. 5, 887–919.CrossRefGoogle Scholar

Demmel, J., Ahrens, P. and Nguyen, H. D. (2016), Efficient reproducible floating point summation and BLAS. Technical report UCB/EECS-2016-121, EECS Department, University of California, Berkeley.Google Scholar

Demmel, J. and Hida, Y. (2004), Fast and accurate floating point summation with application to computational geometry, Numer . Algorithms 37, 101–112.CrossRefGoogle Scholar

Demmel, J. and Nguyen, H. D. (2015), Parallel reproducible summation, IEEE Trans. Comput. 64, 2060–2070.CrossRefGoogle Scholar

Demmel, J. and Riedy, J. (2021), A new IEEE 754 standard for floating-point arithmetic in an ever-changing world, SIAM News 54, 9.Google Scholar

Demmel, J., Dongarra, J., Gates, M., Henry, G., Langou, J., Li, X., Luszczek, P., Pereira, W., Riedy, J. and Rubio-González, C. (2022), Proposed consistent exception handling for the BLAS and LAPACK. Available at arXiv:2207.09281.CrossRefGoogle Scholar

El Arar, E.-M., Sohier, D., de Oliveira Castro, P. and Petit, E. (2022), The positive effects of stochastic rounding in numerical algorithms, in 29th IEEE Symposium on Computer Arithmetic, pp. 58–65.Google Scholar

Fabiano, N., Muller, J.-M. and Picot, J. (2019), Algorithms for triple-word arithmetic, IEEE Trans. Comput. 68, 1573–1583.CrossRefGoogle Scholar

Fasi, M., Higham, N. J., Mikaitis, M. and Pranesh, S. (2021), Numerical behavior of NVIDIA tensor cores, PeerJ Comput. Sci. 7, e330.CrossRefGoogle ScholarPubMed

Figueroa, S. A. (1995), When is double rounding innocuous?, ACM SIGNUM Newsletter 30, 21–26.CrossRefGoogle Scholar

Flegg, G., Hay, C. and Moss, B. (1985), Nicolas Chuquet, Renaissance Mathematician: A Study With Extensive Translation of Chuquet’s Mathematical Manuscript Completed in 1484 , Springer.Google Scholar

Fortune, S. and Van Wyk, C. J. (1993), Efficient exact arithmetic for computational geometry, in 9th Annual Symposium on Computational Geometry, ACM, pp. 163–172.Google Scholar

Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P. and Zimmermann, P. (2007), MPFR: A multiple-precision binary floating-point library with correct rounding, ACM Trans. Math. Softw. 33, 13–es.CrossRefGoogle Scholar

Friedland, P. (1967), Algorithm 312: Absolute value and square root of a complex number, Commun . Assoc. Comput. Mach. 10, 665.Google Scholar

Gill, S. (1951), A process for the step-by-step integration of differential equations in an automatic digital computing machine, Math. Proc. Cambridge Philos. Soc. 47, 96–108.CrossRefGoogle Scholar

Goldberg, I. B. (1967), 27 bits are not enough for 8-digit accuracy, Commun . Assoc. Comput. Mach. 10, 105–106.Google Scholar

Goualard, F. (2014), How do you compute the midpoint of an interval?, ACM Trans. Math. Softw. 40, 11:1–11:25.CrossRefGoogle Scholar

Goualard, F. (2022), Drawing random floating-point numbers from an interval, ACM Trans. Model. Comput. Simul. 32, 16:1–16:24.CrossRefGoogle Scholar

Graillat, S. and Ménissier-Morain, V. (2007), Error-free transformations in real and complex floating-point arithmetic, in 2007 International Symposium on Nonlinear Theory and its Applications, pp. 341–344.Google Scholar

Graillat, S. and Ménissier-Morain, V. (2008), Compensated Horner scheme in complex floating point arithmetic, in 8th Conference on Real Numbers and Computer, pp. 133–146.Google Scholar

Graillat, S. and Ménissier-Morain, V. (2012), Accurate summation, dot product and polynomial evaluation in complex floating-point arithmetic, Inform. Comput. 216, 57–71.CrossRefGoogle Scholar

Graillat, S., Lefèvre, V. and Muller, J.-M. (2020), Alternative split functions and Dekker’s product, in 27th IEEE Symposium on Computer Arithmetic, pp. 41–47.Google Scholar

Gregory, R. T. and Raney, J. L. (1964), Floating-point arithmetic with 84-bit numbers, Commun . Assoc. Comput. Mach. 7, 10–13.Google Scholar

Gustafson, J. L. (2015), The End of Error: Unum Computing , Chapman & Hall / CRC.Google Scholar

Hallman, E. and Ipsen, I. C. F. (2022), Precision-aware deterministic and probabilistic error bounds for floating point summation. Available at arXiv:2203.15928.Google Scholar

Harrison, J. (1999), A machine-checked theory of floating point arithmetic, in 12th International Conference in Theorem Proving in Higher Order Logics (Bertot, Y. et al., eds), Vol. 1690 of Lecture Notes in Computer Science, Springer, pp. 113–130.Google Scholar

Hauser, J. R. (1996), Handling floating-point exceptions in numeric programs, ACM Trans. Program. Lang. Syst. 18, 139–174.CrossRefGoogle Scholar

He, Y. and Ding, C. H. Q. (2000), Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications, in 14th International Conference on Supercomputing, ACM, pp. 225–234.Google Scholar

Hennessy, J. L. and Patterson, D. A. (2012), Computer Architecture: A Quantitative Approach , fifth edition, Morgan Kaufman.Google Scholar

Henry, G., Tang, P. T. P. and Heinecke, A. (2019), Leveraging the bfloat16 artificial intelligence datatype for higher-precision computations, in 26th IEEE Symposium on Computer Arithmetic, pp. 69–76.Google Scholar

Hida, Y., Li, X. S. and Bailey, D. H. (2001), Algorithms for quad-double precision floating-point arithmetic, in 15th IEEE Symposium on Computer Arithmetic, pp. 155–162.Google Scholar

Higham, N. J. (1993), The accuracy of floating point summation, SIAM J. Sci. Comput. 14, 783–799.CrossRefGoogle Scholar

Higham, N. J. (2002), Accuracy and Stability of Numerical Algorithms, second edition, SIAM.CrossRefGoogle Scholar

Higham, N. J. (2021a), The mathematics of floating-point arithmetic, LMS Newsletter 493, 35–41.Google Scholar

Higham, N. J. (2021b), Numerical stability of algorithms at extreme scale and low precisions. MIMS EPrint 2021.14, Manchester Institute for Mathematical Sciences, The University of Manchester, UK. Available at http://eprints.maths.manchester.ac.uk/id/ eprint/2833.Google Scholar

Higham, N. J. and Mary, T. (2019), A new approach to probabilistic rounding error analysis, SIAM J. Sci. Comput. 41, A2815–A2835.CrossRefGoogle Scholar

Higham, N. J. and Mary, T. (2020), Sharper probabilistic backward error analysis for basic linear algebra kernels with random data, SIAM J. Sci. Comput. 42, A3427–A3446.CrossRefGoogle Scholar

Higham, N. J. and Mary, T. (2022), Mixed precision algorithms in numerical linear algebra, Acta Numer. 31, 347–414.CrossRefGoogle Scholar

Higham, N. J. and Pranesh, S. (2019), Simulating low precision floating-point arithmetic, SIAM J. Sci. Comput. 41, C585–C602.CrossRefGoogle Scholar

Hirshfeld, A. (2009), Eureka Man: The Life and Legacy of Archimedes, Walker & Company.Google Scholar

Hull, T. E., Fairgrieve, T. F. and Tang, P. T. P. (1994), Implementing complex elementary functions using exception handling, ACM Trans. Math. Softw. 20, 215–244.CrossRefGoogle Scholar

IEEE (2015), IEEE Standard for Interval Arithmetic (IEEE Std 1788-2015), IEEE.Google Scholar

IEEE (2019), IEEE Standard for Floating-Point Arithmetic (IEEE Std 754-2019), IEEE.Google Scholar

Iffrah, G. (1999), The Universal History of Numbers: From Prehistory to the Invention of the Computer, Wiley.Google Scholar

Ikebe, Y. (1965), Note on triple-precision floating-point arithmetic with 132-bit numbers, Commun . Assoc. Comput. Mach. 8, 175–177.Google Scholar

Innocente, V. and Zimmermann, P. (2022), Accuracy of mathematical functions in single, double, extended double and quadruple precision. Available at hal-03141101.Google Scholar

International Organization for Standardization (2010), Programming Languages – Fortran – Part 1: Base language, International Standard ISO/IEC 1539-1:2010.Google Scholar

International Organization for Standardization, Geneva, Switzerland (2011), Programming Languages – C, International Standard ISO/IEC 9899:2011.Google Scholar

Ipsen, I. C. F. and Zhou, H. (2020), Probabilistic error analysis for inner products, SIAM J. Matrix Anal. Appl. 41, 1726–1741.CrossRefGoogle ScholarPubMed

Jeannerod, C.-P. (2016), A radix-independent error analysis of the Cornea–Harrison–Tang method, ACM Trans. Math. Softw. 42, 19:1–19:20.CrossRefGoogle Scholar

Jeannerod, C.-P. (2020), The relative accuracy of (x+y)*(x-y), J. Comput. Appl. Math. 369, 112613.CrossRefGoogle Scholar

Jeannerod, C.-P. and Muller, J.-M. (2017), On the relative error of computing complex square roots in floating-point arithmetic, in 51st Asilomar Conference on Signals, Systems, and Computers, IEEE, pp. 737–740.Google Scholar

Jeannerod, C.-P. and Rump, S. M. (2018), On relative errors of floating-point operations: optimal bounds and applications, Math. Comp. 87, 803–819.CrossRefGoogle Scholar

Jeannerod, C.-P., Kornerup, P., Louvet, N. and Muller, J.-M. (2017a), Error bounds on complex floating-point multiplication with an FMA, Math. Comp. 86, 881–898.CrossRefGoogle Scholar

Jeannerod, C.-P., Louvet, N. and Muller, J.-M. (2013a), Further analysis of Kahan’s algorithm for the accurate computation of 2times22\times 22times2 determinants, Math. Comp. 82, 2245–2264.CrossRefGoogle Scholar

Jeannerod, C.-P., Louvet, N. and Muller, J.-M. (2013b), On the componentwise accuracy of complex floating-point division with an FMA, in 21st IEEE Symposium on Computer Arithmetic (A. Nannarelli et al., eds), pp. 83–90.CrossRefGoogle Scholar

Jeannerod, C.-P., Louvet, N., Muller, J.-M. and Plet, A. (2016), Sharp error bounds for complex floating-point inversion, Numer . Algorithms 73, 735–760.CrossRefGoogle Scholar

Jeannerod, C.-P., Monat, C. and Thévenoux, L. (2017b), More accurate complex multiplication for embedded processors, in 12th IEEE International Symposium on Industrial Embedded Systems, pp. 1–4.CrossRefGoogle Scholar

Jeannerod, C.-P., Muller, J.-M. and Zimmermann, P. (2018), On various ways to split a floating-point number, in 25th IEEE Symposium on Computer Arithmetic, IEEE, pp. 53–60.Google Scholar

Jiang, H., Graillat, S., Barrio, R. and Yang, C. (2016), Accurate, validated and fast evaluation of elementary symmetric functions and its application, Appl. Math. Comput. 273, 1160–1178.Google Scholar

Johansson, F. (2013), Arb: A C library for ball arithmetic, ACM Commun . Comput. Algebra 47, 166–169.Google Scholar

Joldeş, M., Muller, J.-M. and Popescu, V. (2017), Tight and rigorous error bounds for basic building blocks of double-word arithmetic, ACM Trans. Math. Softw. 44, 1–27.CrossRefGoogle Scholar

Joldeş, M., Muller, J.-M., Popescu, V. and Tucker, W. (2016), CAMPARY: Cuda multiple precision arithmetic library and applications, in 5th International Congress on Mathematical Software (Greuel, G. M. et al., eds), Vol. 9725 of Lecture Notes in Computer Science, Springer, pp. 232–240.Google Scholar

Kahan, W. (1965), Pracniques: Further remarks on reducing truncation errors, Commun . Assoc. Comput. Mach. 8, 40.Google Scholar

Kahan, W. (1987), Branch cuts for complex elementary functions or much ado about nothing’s sign bit, in The State of the Art in Numerical Analysis (Iserles, A. and Powell, M. J. D., eds), Oxford University Press, pp. 165–211.Google Scholar

Kahan, W. and Thomas, J. W. (1991), Augmenting a programming language with complex arithmetic. Technical report UCB/CSD-92-667, EECS Department, University of California, Berkeley.Google Scholar

Karpinsky, R. (1985), PARANOIA: A floating-point benchmark, BYTE 10, 223.Google Scholar

Knuth, D. E. (1998), The Art of Computer Programming, Vol. 2, third edition, Addison-Wesley.Google Scholar

Kouya, T. (2019), Performance evaluation of an efficient double-double BLAS1 function with error-free transformation and its application to explicit extrapolation methods, in 26th IEEE Symposium on Computer Arithmetic, pp. 120–123.Google Scholar

Kuki, H. and Cody, W. J. (1973), A statistical study of the accuracy of floating point number systems, Commun . Assoc. Comput. Mach. 16, 223–230.Google Scholar

Kulisch, U. (1971), An axiomatic approach to rounded computations, Numer . Math. 18, 1–17.Google Scholar

Kulisch, U. (2013), Computer Arithmetic and Validity: Theory, Implementation, and Applications, Vol. 33 of Studies in Mathematics, De Gruyter.CrossRefGoogle Scholar

La Porte, M. and Vignes, J. (1974), Error analysis in computing, in Information Processing 74, North-Holland.Google Scholar

Lange, M. and Oishi, S. (2020), A note on Dekker’s FastTwoSum algorithm, Numer . Math. 145, 383–403.Google Scholar

Lange, M. and Rump, S. M. (2017), Error estimates for the summation of real numbers with application to floating-point summation, BIT Numer. Math. 57, 927–941.CrossRefGoogle Scholar

Lange, M. and Rump, S. M. (2019), Sharp estimates for perturbation errors in summations, Math. Comp. 88, 349–368.CrossRefGoogle Scholar

Lange, M. and Rump, S. M. (2020), Faithfully rounded floating-point computations, ACM Trans. Math. Softw. 46, 1–20.CrossRefGoogle Scholar

Langlois, P. and Louvet, N. (2007), How to ensure a faithful polynomial evaluation with the compensated Horner algorithm, in 18th IEEE Symposium on Computer Arithmetic, pp. 141–149.Google Scholar

Lawlor, O., Govind, H., Dooley, I., Breitenfeld, M. and Kale, L. (2005), Performance degradation in the presence of subnormal floating-point values, in International Workshop on Operating System Interference in High Performance Application.Google Scholar

Lefèvre, V. (2013), SIPE: Small Integer Plus Exponent, in 21th IEEE Symposium on Computer Arithmetic, pp. 99–106.Google Scholar

Lefèvre, V. and Muller, J.-M. (2001), Worst cases for correct rounding of the elementary functions in double precision, in 15th IEEE Symposium on Computer Arithmetic, pp. 111–118.Google Scholar

Lefèvre, V., Louvet, N., Muller, J.-M., Picot, J. and Rideau, L. (2022), Accurate calculation of Euclidean norms using double-word arithmetic, ACM Trans. Math. Softw. https://doi.org/10.1145/3568672.CrossRefGoogle Scholar

Li, X., Demmel, J., Bailey, D. H., Henry, G., Hida, Y., Iskandar, J., Kahan, W., Kapur, A., Martin, M., Tung, T. and Yoo, D. J. (2000), Design, implementation and testing of extended and mixed precision BLAS. Technical report 45991, Lawrence Berkeley National Laboratory. Available at https://netlib.org/lapack/lawnspdf/lawn149.pdf.Google Scholar

Lichtenau, C., Buyuktosunoglu, A., Bertran, R., Figuli, P., Jacobi, C., Papandreou, N., Pozidis, H., Saporito, A., Sica, A. and Tzortzatos, E. (2022), AI accelerator on IBM Telum processor: Industrial product, in 49th ACM International Symposium on Computer Architecture, ACM, pp. 1012–1028.Google Scholar

Lohner, R. J. (2001), On the ubiquity of the wrapping effect in the computation of error bounds, in Perspectives on Enclosure Methods (Kulisch, U. et al., eds), Springer, pp. 201–216.Google Scholar

Lynch, T. and Swartzlander, E. (1992), A formalization for computer arithmetic, in Computer Arithmetic and Enclosure Methods (Atanassova, L. and Hertzberger, J., eds), Elsevier Science, pp. 137–145.Google Scholar

Malcolm, M. A. (1971), On accurate floating-point summation, Commun . Assoc. Comput. Mach. 14, 731–736.Google Scholar

Markstein, P. (1990), Computation of elementary functions on the IBM RISC System/6000 processor, IBM J. Res. Dev. 34, 111–119.CrossRefGoogle Scholar

Matula, D. W. (1968), In-and-out conversions, Commun . Assoc. Comput. Mach. 11, 47–50.Google Scholar

Melquiond, G. (2019), Formal verification for numerical computations, and the other way around. Habilitation à Diriger des Recherches, Université Paris Sud, Orsay.Google Scholar

Mezzarobba, M. (2010), NumGfun: A package for numerical and analytic computation with D-finite functions, in Proceedings of the 2010 International Symposium on Symbolic and Algebraic Computation, ACM, pp. 139–145.Google Scholar

Micikevicius, P., Stosic, D., Burgess, N., Cornea, M., Dubey, P., Grisenthwaite, R., Ha, S., Heinecke, A., Judd, P., Kamalu, J., Mellempudi, N., Oberman, S., Shoeybi, M., Siu, M. and H, W (2022), FP8 formats for deep learning. Available at https://paperswithcode.com/ paper/fp8-formats-for-deep-learning.Google Scholar

Monniaux, D. (2008), The pitfalls of verifying floating-point computations, ACM Trans. Program. Lang. Syst. 30, 1–41.CrossRefGoogle Scholar

Moore, J. S., Lynch, T. and Kaufmann, M. (1998), A mechanically checked proof of the correctness of the kernel of the AMD5K86 floating point division algorithm, IEEE Trans. Comput. 47, 913–926.CrossRefGoogle Scholar

Moore, R. E. (1979), Methods and Applications of Interval Analysis, SIAM Studies in Applied Mathematics, SIAM.Google Scholar

Moore, R. E., Kearfott, R. B. and Cloud, M. J. (2009), Introduction to Interval Analysis, SIAM.CrossRefGoogle Scholar

Muller, J.-M. (2015), On the error of computing ab+cdab+ cdab+cd using Cornea, Harrison and Tang’s method, ACM Trans. Math. Softw. 41, 7:1–7:8.CrossRefGoogle Scholar

Muller, J.-M. (2016), Elementary Functions, Algorithms and Implementation, third edition, Birkhäuser.Google Scholar

Muller, J.-M. and Rideau, L. (2022), Formalization of double-word arithmetic, and comments on ‘Tight and rigorous error bounds for basic building blocks of double-word arithmetic’, ACM Trans. Math. Softw. 48, 1–24.CrossRefGoogle Scholar

Muller, J.-M., Brunie, N., de Dinechin, F., Jeannerod, C.-P., Joldeş, M., Lefèvre, V., Melquiond, G., Revol, N. and Torres, S. (2018), Handbook of Floating-Point Arithmetic, second edition, Birkhäuser.CrossRefGoogle Scholar

Neumaier, A. (1974), Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen, ZAMM 54, 39–51. In German.Google Scholar

Neumaier, A. (1990), Interval Methods for Systems of Equations, Cambridge University Press.Google Scholar

Nievergelt, Y. (2003), Scalar fused multiply-add instructions produce floating-point matrix arithmetic provably accurate to the penultimate digit, ACM Trans. Math. Softw. 29, 27–48.CrossRefGoogle Scholar

Noune, B., Jones, P., Justus, D., Masters, D. and Luschi, C. (2022), 8-bit numerical formats for deep neural networks. Available at arXiv:2206.02915.Google Scholar

Ogita, T., Rump, S. M. and Oishi, S. (2005), Accurate sum and dot product, SIAM J. Sci. Comput. 26, 1955–1988.CrossRefGoogle Scholar

Olver, F. W. J. (1983), Error analysis of complex arithmetic, in Computational Aspects of Complex Analysis, Vol. 102 of NATO Science Series C, D. Reidel, pp. 279–292.Google Scholar

Osorio, J., Armejach, A., Petit, E., Henry, G. and Casas, M. (2022), A BF16 FMA is all you need for DNN training, IEEE Trans. Emerg. Topics Comput. 10, 1302–1314.CrossRefGoogle Scholar

Ozaki, K., Ogita, T. and Mukunoki, D. (2021), Interval matrix multiplication using fast low-precision arithmetic on GPU, in 9th International Workshop on Reliable Engineering Computing, pp. 419–434.Google Scholar

Ozaki, K., Ogita, T., Oishi, S. and Rump, S. M. (2012), Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications, Numer . Algorithms 59, 95–118.CrossRefGoogle Scholar

Parker, D. S., Pierce, B. and Eggert, P. R. (2000), Monte Carlo arithmetic: How to gamble with floating point and win, Comput. Sci. Engng 2, 58–68.CrossRefGoogle Scholar

Pichat, M. (1972), Correction d’une somme en arithmétique à virgule flottante, Numer . Math. 19, 400–406. In French .CrossRefGoogle Scholar

Pichat, M. (1976), Contribution à l’étude des erreurs d’arrondi en arithmétique à virgule flottante. PhD thesis, Université Scientifique et Médicale de Grenoble & Institut National Polytechnique de Grenoble.Google Scholar

Pion, S. (1999), De la géométrie algorithmique au calcul géométrique. PhD dissertation, Université Nice Sophia Antipolis.Google Scholar

Popescu, V. (2017), Towards fast and certified multiple-precision librairies. PhD dissertation, Université de Lyon, no. 2017LYSEN036.Google Scholar

Priest, D. M. (1991), Algorithms for arbitrary precision floating point arithmetic, in 10th IEEE Symposium on Computer Arithmetic, pp. 132–143.Google Scholar

Priest, D. M. (1992), On properties of floating-point arithmetics: Numerical stability and the cost of accurate computations. PhD thesis, University of California at Berkeley.Google Scholar

Priest, D. M. (2004), Efficient scaling for complex division, ACM Trans. Math. Softw. 30, 389–401.CrossRefGoogle Scholar

Revol, N. and Rouillier, F. (2005), Motivations for an arbitrary precision interval arithmetic and the MPFI library, Reliable Computing 11, 275–290.CrossRefGoogle Scholar

Riedy, E. J. and Demmel, J. (2018), Augmented arithmetic operations proposed for IEEE-754 2018, in 25th IEEE Symposium on Computer Arithmetic, pp. 45–52.Google Scholar

Roux, P. (2014), Innocuous double rounding of basic arithmetic operations, J. Formal. Reasoning 7, 131–142.Google Scholar

Rump, S. M. (2009), Ultimately fast accurate summation, SIAM J. Sci. Comput. 31, 3466–3502.CrossRefGoogle Scholar

Rump, S. M. (2010), Verification methods: Rigorous results using floating-point arithmetic, Acta Numer. 19, 287–449.CrossRefGoogle Scholar

Rump, S. M. (2012), Error estimation of floating-point summation and dot product, BIT Numer. Math. 52, 201–220.CrossRefGoogle Scholar

Rump, S. M. (2015), Computable backward error bounds for basic algorithms in linear algebra, Nonlinear Theory Appl . IEICE 6, 360–363.Google Scholar

Rump, S. M. (2017), IEEE754 precision-k base-β arithmetic inherited by precision-m base-β arithmetic for k<mk<mk<m , ACM Trans. Math. Softw. 43, 20:1–20:15.Google Scholar

Rump, S. M. (2019), Error bounds for computer arithmetics, in 26th IEEE Symposium on Computer Arithmetic, pp. 1–14.Google Scholar

Rump, S. M., Ogita, T. and Oishi, S. (2008), Accurate floating-point summation, I: Faithful rounding, SIAM J. Sci. Comput. 31, 189–224.CrossRefGoogle Scholar

Rump, S. M., Zimmermann, P., Boldo, S. and Melquiond, G. (2009), Computing predecessor and successor in rounding to nearest, BIT Numer. Math. 49, 419–431.CrossRefGoogle Scholar

Shewchuk, J. R. (1997), Adaptive precision floating-point arithmetic and fast robust geometric predicates, Discrete Comput. Geom. 18, 305–363.CrossRefGoogle Scholar

Shibata, N. and Petrogalli, F. (2020), SLEEF: A portable vectorized library of C standard mathematical functions, IEEE Trans. Parallel Distrib. Syst. 31, 1316–1327.CrossRefGoogle Scholar

Sibidanov, A., Zimmermann, P. and Glondu, S. (2022), The CORE-MATH project, in 29th IEEE Symposium on Computer Arithmetic, pp. 26–34.Google Scholar

Smith, R. L. (1962), Algorithm 116: Complex division, Commun . Assoc. Comput. Mach. 5, 435.Google Scholar

Steele, G. L. Jr and White, J. L. (2004), Retrospective: How to print floating-point numbers accurately, ACM SIGPLAN Notices 39, 372–389.CrossRefGoogle Scholar

Sterbenz, P. H. (1974), Floating-Point Computation, Prentice-Hall.Google Scholar

Sun, X., Wang, N., Chen, C.-Y., Ni, J., Agrawal, A., Cui, X., Venkataramani, S., Maghraoui, K. El, Srinivasan, V. V. and Gopalakrishnan, K. (2020), Ultra-low precision 4-bit training of deep neural networks, in Advances in Neural Information Processing Systems 33 (Larochelle, H. et al., eds), Curran Associates, pp. 1796–1807.Google Scholar

Swartzlander, E. E. and Alexpoulos, A. G. (1975), The sign-logarithm number system, IEEE Trans. Comput. Reprinted in E. E. Swartzlander, Computer Arithmetic, Vol. 1, IEEE, 1990.Google Scholar

Veltkamp, G. W. (1968), ALGOL procedures voor het berekenen van een inwendig product in dubbele precisie. Technical report 22, RC-Informatie, Technische Hogeschool Eindhoven.Google Scholar

Veltkamp, G. W. (1969), ALGOL procedures voor het rekenen in dubbele lengte. Technical report 21, RC-Informatie, Technische Hogeschool Eindhoven.Google Scholar

Whaley, R. C., Petitet, A. and Dongarra, J. J. (2001), Automated empirical optimizations of software and the ATLAS project, Parallel Comput. 27, 3–35.CrossRefGoogle Scholar

Wilkinson, J. H. (1960), Error analysis of floating-point computation, Numer . Math. 2, 319–340.Google Scholar

Wilkinson, J. H. (1961), Error analysis of direct methods of matrix inversion, J. Assoc. Comput. Mach. 8, 281–330.CrossRefGoogle Scholar

Wilkinson, J. H. (1963), Rounding Errors in Algebraic Processes, Notes on Applied Science no. 32, HMSO. Also published by Prentice-Hall. Reprinted by Dover, 1994.Google Scholar

Wilkinson, J. H. (1965), The Algebraic Eigenvalue Problem, Oxford University Press.Google Scholar

Wolfe, J. M. (1964), Reducing truncation errors by programming, Commun . Assoc. Comput. Mach. 7, 355–356.Google Scholar

Yamazaki, I., Tomov, S. and Dongarra, J. (2015), Mixed-precision Cholesky QR factorization and its case studies on multicore CPU with multiple GPUs, SIAM J. Sci. Comput. 37, C307–C330.CrossRefGoogle Scholar

Ziv, A. (1999), Sharp ULP rounding error bound for the hypotenuse function, Math. Comp. 68, 1143–1148.CrossRefGoogle Scholar