diff --git a/COPYING.txt b/COPYING.txt
index e4c3cc9..75ac327 100644
--- a/COPYING.txt
+++ b/COPYING.txt
@@ -2,7 +2,7 @@
License for Berkeley SoftFloat Release 3
John R. Hauser
-2014 ________
+2014 Dec 17
The following applies to the whole of SoftFloat Release 3 as well as to each source file individually.
diff --git a/README.html b/README.html
index 6f9eadd..c0d5eb1 100644
--- a/README.html
+++ b/README.html
@@ -11,7 +11,7 @@
John R. Hauser
-2014 ________
+2014 Dec 17
@@ -19,7 +19,8 @@ Berkeley SoftFloat is a software implementation of binary floating-point that
conforms to the IEEE Standard for Floating-Point Arithmetic.
SoftFloat is distributed in the form of C source code.
Building the SoftFloat sources generates a library file (typically
-"softfloat.a"
) containing the floating-point subroutines.
+softfloat.a
or libsoftfloat.a
) containing the
+floating-point subroutines.
diff --git a/README.txt b/README.txt
index beafe32..3deb817 100644
--- a/README.txt
+++ b/README.txt
@@ -2,13 +2,13 @@
Package Overview for Berkeley SoftFloat Release 3
John R. Hauser
-2014 ________
+2014 Dec 17
Berkeley SoftFloat is a software implementation of binary floating-point that conforms to the IEEE Standard for Floating-Point Arithmetic. SoftFloat is distributed in the form of C source code. Building the SoftFloat sources
-generates a library file (typically "softfloat.a") containing the floating-
-point subroutines.
+generates a library file (typically "softfloat.a" or "libsoftfloat.a")
+containing the floating-point subroutines.
The SoftFloat package is documented in the following files in the "doc" subdirectory:
diff --git a/doc/SoftFloat-history.html b/doc/SoftFloat-history.html
index b78f658..65e3ca2 100644
--- a/doc/SoftFloat-history.html
+++ b/doc/SoftFloat-history.html
@@ -11,11 +11,7 @@
John R. Hauser
-2014 _____
-
-*** CONTENT DONE.
+2014 Dec 17
John R. Hauser
-2014 _____
-
-*** REPLACE QUOTATION MARKS.
+2014 Dec 17
-*** CHECK.
-*** FIX FORMATTING.
-
- Introduction - Limitations - Acknowledgments and License - SoftFloat Package Directory Structure - Issues for Porting SoftFloat to a New Target - Standard Headers <stdbool.h> and <stdint.h> - Specializing Floating-Point Behavior - Macros for Build Options - Adapting a Template Target Directory - Target-Specific Optimization of Primitive Functions - Testing SoftFloat - Providing SoftFloat as a Common Library for Applications - Contact Information -+
++
++ + + 1. Introduction + 2. Limitations + 3. Acknowledgments and License + 4. SoftFloat Package Directory Structure + 5. Issues for Porting SoftFloat to a New Target + ++ 5.1. Standard Headers +<stdbool.h>
and +<stdint.h>
+ 5.2. Specializing Floating-Point Behavior + 5.3. Macros for Build Options + 5.4. Adapting a Template Target Directory + +5.5. Target-Specific Optimization of Primitive Functions ++ 6. Testing SoftFloat + +7. Providing SoftFloat as a Common Library for Applications ++ 8. Contact Information
+
+ + - + Par Lab: + Par Lab: Microsoft (Award #024263), Intel (Award #024894), and U.C. Discovery (Award #DIG07-10227), with additional support from Par Lab affiliates Nokia, @@ -126,7 +134,8 @@ NVIDIA, Oracle, and Samsung. - + ASPIRE Lab: + ASPIRE Lab: DARPA PERFECT program (Award #HR0011-12-2-0016), with additional support from ASPIRE industrial sponsor Intel and ASPIRE affiliates Google, Nokia, NVIDIA, @@ -185,27 +194,40 @@ Because SoftFloat is targeted to multiple platforms, its source code is slightly scattered between target-specific and target-independent directories and files. The supplied directory structure is as follows: + The majority of the SoftFloat sources are provided in the- doc - source - include - 8086 - build - template-FAST_INT64 - template-not-FAST_INT64 - Linux-386-GCC - Linux-x86_64-GCC - Win32-MinGW - Win64-MinGW-w64 +doc +source + include + 8086 + 8086-SSE +build + template-FAST_INT64 + template-not-FAST_INT64 + Linux-386-GCC + Linux-386-SSE2-GCC + Linux-x86_64-GCC + Win32-MinGW + Win32-SSE2-MinGW + Win64-MinGW-w64+source
directory. Theinclude
subdirectory ofsource
contains several -header files (unsurprisingly), while the8086
subdirectory -contains source files that specialize the floating-point behavior to match the -Intel x86 line of processors. +header files (unsurprisingly), while the8086
and +subdirectories contain source files that +specialize the floating-point behavior to match the Intel x86 line of +processors. +The files in directory 8086-SSE
8086
give floating-point behavior +consistent solely with Intel’s older, 8087-derived floating-point, while +those inupdate the behavior of the +non-extended formats ( 8086-SSE
float32_t
,float64_t
, and +float128_t
) to mirror Intel’s more recent Streaming SIMD +Extensions (SSE) and other compatible extensions. If other specializations are attempted, these would be expected to be other -subdirectories ofsource
alongside8086
. +subdirectories ofsource
alongside8086
and +. Specialization is covered later, in 8086-SSE
section 5.2 , Specializing Floating-Point Behavior. @@ -213,9 +235,9 @@ Floating-Point Behavior.The
build
directory is intended to contain a subdirectory for each target platform for which a build of the SoftFloat library may be created. -For each build target, the target's subdirectory is where all derived object -files and the completed SoftFloat library (typicallysoftfloat.a
-orlibsoftfloat.a
) are created. +For each build target, the target’s subdirectory is where all derived +object files and the completed SoftFloat library (typically +softfloat.a
orlibsoftfloat.a
) are created. The twotemplate
subdirectories are not actual build targets but contain sample files for creating new target directories. (The meaning ofFAST_INT64
will be explained later.) @@ -227,18 +249,21 @@ are intended to follow a naming system of. For the example targets, <execution-environment>-<compiler>
is - <execution-environment>
, Linux-386
, - Linux-x86_64
Win32
, orWin64
, and +, Linux-386
, + Linux-386-SSE2
, Linux-x86_64
Win32
, +, or Win32-SSE2
Win64
, andis <compiler>
GCC
,MinGW
, or. MinGW-w64
As supplied, each target directory contains two files: +
The provided- Makefile - platform.h +Makefile +platform.h+Makefile
is written for GNUmake
. A build of SoftFloat for the specific target is begun by executing themake
command with the target directory as the current directory. @@ -258,10 +283,10 @@ desirable to include in headerplatform.h
(directly or via#include
) declarations for numerous target-specific optimizations. Such possibilities are discussed in the next section, Issues for Porting SoftFloat to a New Target. -If the target's compiler or library has bugs or other shortcomings, workarounds -for these issues may also be possible with target-specific declarations in -platform.h
, avoiding the need to modify the main SoftFloat -sources. +If the target’s compiler or library has bugs or other shortcomings, +workarounds for these issues may also be possible with target-specific +declarations inplatform.h
, avoiding the need to modify the main +SoftFloat sources. @@ -280,30 +305,34 @@ For older or nonstandard compilers, substitutes for<stdbool.h>
and<stdint.h>
may need to be created. SoftFloat depends on these names from<stdbool.h>
: +and on these names from- bool - true - false +bool +true +false+<stdint.h>
: +@@ -312,12 +341,12 @@ and on these names from- uint16_t - uint32_t - uint64_t - int32_t - int64_t - UINT64_C - INT64_C - uint_least8_t - uint_fast8_t - uint_fast16_t - uint_fast32_t - uint_fast64_t - int_fast8_t - int_fast16_t - int_fast32_t - int_fast64_t +uint16_t +uint32_t +uint64_t +int32_t +int64_t +UINT64_C +INT64_C +uint_least8_t +uint_fast8_t +uint_fast16_t +uint_fast32_t +uint_fast64_t +int_fast8_t +int_fast16_t +int_fast32_t +int_fast64_t+<stdint.h>
:The IEEE Floating-Point Standard allows for some flexibility in a conforming implementation, particularly concerning NaNs. -The SoftFloat
source
directory is supplied with one or more +The SoftFloatsource
directory is supplied with some specialization subdirectories containing possible definitions for this implementation-specific behavior. -For example, the8086
subdirectory has source files that -specialize SoftFloat's behavior to match that of Intel's x86 line of -processors. +For example, the8086
and+subdirectories have source files that specialize SoftFloat’s behavior to +match that of Intel’s x86 line of processors. The files in a specialization subdirectory must determine: 8086-SSE
- @@ -343,8 +372,9 @@ source files are needed to complete the specialization.
-A new build target may use an existing specialization, such as the one provided -by the
8086
subdirectory. +A new build target may use an existing specialization, such as the ones +provided by the8086
and+subdirectories. If a build target needs a new specialization, different from any existing ones, it is recommended that a new specialization subdirectory be created in the 8086-SSE
source
directory for this purpose. @@ -367,18 +397,18 @@ Must be defined for little-endian machines; must not be defined for big-endian machines.SOFTFLOAT_FAST_INT64
- -Can be defined to indicate that the build target's implementation of -
64-bit
arithmetic is efficient. -For newer64-bit
processors, this macro should usually be defined. -For very small microprocessors whose buses and registers are8-bit
-or16-bit
in size, this macro should usually not be defined. -Whether this macro should be defined for a32-bit
processor may +Can be defined to indicate that the build target’s implementation of +64-bit arithmetic is efficient. +For newer64-bit processors, this macro should usually be defined. +For very small microprocessors whose buses and registers are8-bit +or16-bit in size, this macro should usually not be defined. +Whether this macro should be defined for a32-bit processor may depend on the target machine and the applications that will use SoftFloat.SOFTFLOAT_FAST_DIV64TO32
- -Can be defined to indicate that the target's division operator +Can be defined to indicate that the target’s division operator
in C (written as/
) is reasonably efficient for -dividing a64-bit
unsigned integer by a32-bit
+dividing a64-bit unsigned integer by a32-bit unsigned integer. Setting this macro may affect the performance of division, remainder, and square root operations. @@ -411,16 +441,16 @@ defined toextern
inline
. Following the usual customfor C , for the first three macros (all exceptINLINE_LEVEL
andINLINE
), the content of any definition is irrelevant; -what matters is a macro's effect on#ifdef
directives. +what matters is a macro’s effect on#ifdef
directives.It is recommended that any definitions of macros
@@ -433,8 +463,9 @@ Two different templates exist because different functions are needed in the SoftFloat library depending on whether macroLITTLEENDIAN
and -INLINE
be made in a build target'splatform.h
header -file, because these macros are expected to be determined inflexibly by the -target machine and compiler. +INLINE
be made in a build target’splatform.h
+header file, because these macros are expected to be determined inflexibly by +the target machine and compiler. The other three macros control optimization and might be better located in the -target's Makefile (or its equivalent). +target’s Makefile (or its equivalent).SOFTFLOAT_FAST_INT64
is defined. If macroSOFTFLOAT_FAST_INT64
will be defined, -template-FAST_INT64
is the template to use; -otherwise,template-not-FAST_INT64
is the appropriate template. +is the template to use; +otherwise, template-FAST_INT64
is the appropriate +template. A new target directory can be created by copying the correct template directory and editing the files inside. To avoid confusion, it would be wise to refrain from editing the files within a @@ -447,12 +478,12 @@ template directory directly. template-not-FAST_INT64
Header file
@@ -461,7 +492,7 @@ declared inprimitives.h
(in directorysource/include
) declares macros and functions for numerous -underlying arithmetic operations upon which many of SoftFloat's floating-point -functions are ultimately built. +underlying arithmetic operations upon which many of SoftFloat’s +floating-point functions are ultimately built. The SoftFloat sources include implementations of all of these functions/macros, written as standard C code, so a complete and correct SoftFloat library can be built using only the supplied code for all functions. -However, for many targets, SoftFloat's performance can be improved by +However, for many targets, SoftFloat’s performance can be improved by substituting target-specific implementations of some of the functions/macros declared inprimitives.h
.primitives.h
. For example,primitives.h
declares a function calledsoftfloat_countLeadingZeros32
that takes an unsigned32-bit integer as an argument and returns the maximal number of -the integer's most-significant bits that are all zeros. +the integer’s most-significant bits that are all zeros. While the SoftFloat sources include an implementation of this function written instandard C , many processors can perform this same function directly in only one or two machine instructions. @@ -473,19 +504,22 @@ package.A build target can replace the supplied version of any function or macro of
primitives.h
by defining a macro with the same name in the -target'splatform.h
header file. +target’splatform.h
header file. For this purpose, it may be helpful forplatform.h
to#include
header fileprimitiveTypes.h
, which defines types used for arguments and results of functions declared inprimitives.h
. When a desired replacement implementation is a function, not a macro, it is sufficient forplatform.h
to include the line ++where- #define <function-name> <function-name> +#define <function-name> <function-name>-where<function-name>
is the name of the function. -This technically defines<function-name>
as a macro, but one -that resolves to the same name, which may then be a function. +is the name of the +function. +This technically defines <function-name>
as a +macro, but one that resolves to the same name, which may then be a function. (A preprocessor conforming to the C Standard must limit recursive macro expansion from being applied more than once.) @@ -500,46 +534,34 @@ This program is part of the Berkeley TestFloat package available at the Web page <function-name>
http://www.jhauser.us/arithmetic/TestFloat.html
. The TestFloat package also has a program calledtimesoftfloat
that -measures the speed of SoftFloat's floating-point functions. +measures the speed of SoftFloat’s floating-point functions.7. Providing SoftFloat as a Common Library for Applications
-Supplied
softfloat.h
depends onsoftfloat_types.h
. +Header filesoftfloat.h
defines the SoftFloat interface as seen by +clients. +If the SoftFloat library will be made a common library for programs on a +particular system, the suppliedsoftfloat.h
has a couple of +deficiencies for this purpose: ++
+In the situation that new programs may regularly- +As supplied,
softfloat.h
depends on another header, +softfloat_types.h
, that is not intended for public use but which +must also be visible to the programmer’s compiler. +- +More troubling, at the time
softfloat.h
is included in a C +source file, macroSOFTFLOAT_FAST_INT64
must be defined, or not +defined, consistent with whether this macro was defined when the SoftFloat +library was built. +#include
header +filesoftfloat.h
, it is recommended that a custom, self-contained +version of this header file be created that eliminates these issues. --The target-specific `softfloat.h' header file defines the SoftFloat -interface as seen by clients. - -Unlike the actual function definitions in `softfloat.c', the declarations -in `softfloat.h' do not use any of the types defined by the `processors' -header file. This is done so that clients will not have to include the -`processors' header file in order to use SoftFloat. Nevertheless, the -target-specific declarations in `softfloat.h' must match what `softfloat.c' -expects. For example, if `int32' is defined as `int' in the `processors' -header file, then in `softfloat.h' the output of `float32_to_int32' should -be stated as `int', although in `softfloat.c' it is given in target- -independent form as `int32'. -- --*** HERE - -Porting and/or compiling SoftFloat involves the following steps: - -4. In the target-specific subdirectory, edit the files `softfloat-specialize' - and `softfloat.h' to define the desired exception handling functions - and mode control values. In the `softfloat.h' header file, ensure also - that all declarations give the proper target-specific type (such as - `int' or `long') corresponding to the target-independent type used in - `softfloat.c' (such as `int32'). None of the type names declared in the - `processors' header file should appear in `softfloat.h'. - --8. Contact Information
diff --git a/doc/SoftFloat.html b/doc/SoftFloat.html
index fa3919a..d406d91 100644
--- a/doc/SoftFloat.html
+++ b/doc/SoftFloat.html
@@ -11,66 +11,59 @@
John R. Hauser
- -
-2014 ______
--*** CONTENT DONE. -
- --*** REPLACE QUOTATION MARKS. -
-*** REPLACE APOSTROPHES. -
-*** REPLACE EM DASH. +2014 Dec 17
Contents
--*** CHECK.
- -
-*** FIX FORMATTING. -- Introduction - Limitations - Acknowledgments and License - Types and Functions - Boolean and Integer Types - Floating-Point Types - Supported Floating-Point Functions - Non-canonical Representations in extFloat80_t - Conventions for Passing Arguments and Results - Reserved Names - Mode Variables - Rounding Mode - Underflow Detection - Rounding Precision for 80-Bit Extended Format - Exceptions and Exception Flags - Function Details - Conversions from Integer to Floating-Point - Conversions from Floating-Point to Integer - Conversions Among Floating-Point Types - Basic Arithmetic Functions - Fused Multiply-Add Functions - Remainder Functions - Round-to-Integer Functions - Comparison Functions - Signaling NaN Test Functions - Raise-Exception Function - Changes from SoftFloat Release 2 - Name Changes - Changes to Function Arguments - Added Capabilities - Better Compatibility with the C Language - New Organization as a Library - Optimization Gains (and Losses) - Future Directions - Contact Information -+++
++ + + 1. Introduction + 2. Limitations + 3. Acknowledgments and License + 4. Types and Functions + 4.1. Boolean and Integer Types + 4.2. Floating-Point Types + 4.3. Supported Floating-Point Functions + ++ 4.4. Non-canonical Representations in +extFloat80_t
+ 4.5. Conventions for Passing Arguments and Results + 5. Reserved Names + 6. Mode Variables + 6.1. Rounding Mode + 6.2. Underflow Detection + ++ 6.3. Rounding Precision for the +80-Bit Extended Format+ 7. Exceptions and Exception Flags + 8. Function Details + 8.1. Conversions from Integer to Floating-Point + 8.2. Conversions from Floating-Point to Integer + 8.3. Conversions Among Floating-Point Types + 8.4. Basic Arithmetic Functions + 8.5. Fused Multiply-Add Functions + 8.6. Remainder Functions + 8.7. Round-to-Integer Functions + 8.8. Comparison Functions + 8.9. Signaling NaN Test Functions + 8.10. Raise-Exception Function + 9. Changes from SoftFloat Release 2 + 9.1. Name Changes + 9.2. Changes to Function Arguments + 9.3. Added Capabilities + 9.4. Better Compatibility with the C Language + 9.5. New Organization as a Library + 9.6. Optimization Gains (and Losses) + 10. Future Directions + 11. Contact Information 1. Introduction
@@ -156,15 +149,20 @@ SoftFloatRelease 3 . The SoftFloat package was written by me,John R. Hauser.Release 3 of SoftFloat is a completely new implementation supplanting earlier releases. -This project was done in the employ of the University of California, Berkeley, -within the Department of Electrical Engineering and Computer Sciences, first -for the Parallel Computing Laboratory (Par Lab) and then for the ASPIRE Lab. +This project (Release 3 only, not earlier releases) was done in +the employ of the University of California, Berkeley, within the Department of +Electrical Engineering and Computer Sciences, first for the Parallel Computing +Laboratory (Par Lab) and then for the ASPIRE Lab. The work was officially overseen by Prof. Krste Asanovic, with funding provided by these sources:+
+ + - + Par Lab: + Par Lab: Microsoft (Award #024263), Intel (Award #024894), and U.C. Discovery (Award #DIG07-10227), with additional support from Par Lab affiliates Nokia, @@ -172,7 +170,8 @@ NVIDIA, Oracle, and Samsung. - + ASPIRE Lab: + ASPIRE Lab: DARPA PERFECT program (Award #HR0011-12-2-0016), with additional support from ASPIRE industrial sponsor Intel and ASPIRE affiliates Google, Nokia, NVIDIA, @@ -245,16 +244,18 @@ for these headers. Header softfloat.h
depends only on the namebool
from<stdbool.h>
and on these type names from<stdint.h>
: +@@ -263,26 +264,22 @@ Header- uint16_t - uint32_t - uint64_t - int32_t - int64_t - uint_fast8_t - uint_fast32_t - uint_fast64_t +uint16_t +uint32_t +uint64_t +int32_t +int64_t +uint_fast8_t +uint_fast32_t +uint_fast64_t+softfloat.h
depends only on the namebool
fromThe
softfloat.h
header defines four floating-point types:-+
@@ -304,10 +301,10 @@ Header file
- float32_t
32-bit single-precision binary format- float64_t
64-bit double-precision binary format- - extFloat80_t
+ extFloat80_t
80-bit double-extended-precision binary format (old Intel or Motorola format)- float128_t
128-bit quadruple-precision binary formatsoftfloat.h
also defines a structure, This structure is the same size as typeextFloat80_t
and contains at least these two fields (not necessarily in this order):-Field-
+- uint16_t signExp;
- uint64_t signif;
+uint16_t signExp; +uint64_t signif; +signExp
contains the sign and exponent of the floating-point value, with the sign in the most significant bit (bit 15 ) and the @@ -339,8 +336,8 @@ operation defined by the IEEE Standard; for each format, the floating-point remainder operation defined by the IEEE Standard;- -for each format, a ``round to integer'' operation that rounds to the nearest -integer value in the same format; and +for each format, a “round to integer” operation that rounds to the +nearest integer value in the same format; and
- comparisons between two values in the same floating-point format. @@ -357,12 +354,12 @@ not supported in SoftFloat
Release 3 : conversions between floating-point formats and decimal or hexadecimal character sequences;- -all ``quiet-computation'' operations (copy, negate, abs, -and copySign, which all involve only simple copying and/or manipulation -of the floating-point sign bit); and +all “quiet-computation” operations (copy, negate, +abs, and copySign, which all involve only simple copying and/or +manipulation of the floating-point sign bit); and
- -all ``non-computational'' operations other than isSignaling (which is -supported). +all “non-computational” operations other than isSignaling +(which is supported). @@ -393,9 +390,9 @@ leading significand bit must
be 1 unless it is required to ForRelease 3 of SoftFloat, functions are not guaranteed to operate as expected when inputs of typeextFloat80_t
are non-canonical. -Assuming all of a function'sextFloat80_t
inputs (if any) are -canonical, function outputs of typeextFloat80_t
will always be -canonical. +Assuming all of a function’sextFloat80_t
inputs (if any) +are canonical, function outputs of typeextFloat80_t
will always +be canonical.4.5. Conventions for Passing Arguments and Results
@@ -426,8 +423,8 @@ SoftFloat supplies this function: The first two arguments point to the values to be added, and the last argument points to the location where the sum will be stored. TheM
in the namef128M_add
is mnemonic for the fact -that the128-bit inputs and outputs are ``in memory'', pointed to -by pointer arguments. +that the128-bit inputs and outputs are “in memory”, +pointed to by pointer arguments.@@ -464,10 +461,11 @@ platforms of interest, programmers can use whichever version they prefer.
In addition to the variables and functions documented here, SoftFloat defines some symbol names for its own private use. -These private names always begin with the prefix `
@@ -477,7 +475,7 @@ prefix, and should reference only such names as are documented.softfloat_
'. +These private names always begin with the prefix +‘softfloat_
’. When a program includes headersoftfloat.h
or links with the -SoftFloat library, all names with prefix `softfloat_
' are reserved -for possible use by SoftFloat. +SoftFloat library, all names with prefix ‘softfloat_
’ +are reserved for possible use by SoftFloat. Applications that use SoftFloat should not define their own names with this prefix, and should reference only such names as are documented.The following variables control rounding mode, underflow detection, and the -
80-bit extended format's rounding precision: +80-bit extended format’s rounding precision:This variable may be set to one of the valuessoftfloat_roundingMode
softfloat_detectTininess
@@ -497,30 +495,25 @@ The rounding mode is selected by the global variable-+
@@ -550,7 +543,7 @@ Like most systems (and as required by the newer 2008 IEEE Standard), SoftFloat always detects loss of accuracy for underflow as an inexact result. -
- softfloat_round_near_even
round to nearest, with ties to even - - softfloat_round_near_maxMag
+ softfloat_round_near_maxMag
round to nearest, with ties to maximum magnitude (away from zero) - softfloat_round_minMag
round to minimum magnitude (toward zero) - softfloat_round_min
round to minimum (down) - softfloat_round_max
round to maximum (up) 6.3. Rounding Precision for 80-Bit Extended Format
+6.3. Rounding Precision for the
80-Bit Extended FormatFor
extFloat80_t
only, the rounding precision of the basic @@ -639,7 +632,7 @@ It does always raise the inexact exception flag as required. In this section,<float>
appears in function names as a substitute for one of these abbreviations:-+
@@ -1296,7 +1316,7 @@ argument
f32
indicates @@ -696,11 +689,14 @@ Each conversion function takes one input of the appropriate type and generates one output. The following illustrates the signatures of these functions in cases when the floating-point result is passed either by value or via pointers: +float32_t
, passed by value- float64_t i32_to_f64( int32_t a ); - - void i32_to_f128M( int32_t a, float128_t *destPtr ); +float64_t i32_to_f64( int32_t a );++void i32_to_f128M( int32_t a, float128_t *destPtr ); ++8.2. Conversions from Floating-Point to Integer
@@ -717,12 +713,15 @@ functions: The functions have signatures as follows, depending on whether the floating-point input is passed by value or via pointers: +The- int32_t f64_to_i32( float64_t a, uint_fast8_t roundingMode, bool exact ); - - int32_t - f128M_to_i32( const float128_t *aPtr, uint_fast8_t roundingMode, bool exact ); +int32_t f64_to_i32( float64_t a, uint_fast8_t roundingMode, bool exact );++int32_t + f128M_to_i32( const float128_t *aPtr, uint_fast8_t roundingMode, bool exact ); ++roundingMode
argument specifies the rounding mode for the conversion. The variable that usually indicates rounding mode, @@ -768,12 +767,14 @@ and convenience: These functions round only toward zero (to minimum magnitude). The signatures for these functions are the same as above without the redundantroundingMode
argument: +- int32_t f64_to_i32_r_minMag( float64_t a, bool exact ); +int32_t f64_to_i32_r_minMag( float64_t a, bool exact );- int32_t f128M_to_i32_r_minMag( const float128_t *aPtr, bool exact ); +int32_t f128M_to_i32_r_minMag( const float128_t *aPtr, bool exact );+8.3. Conversions Among Floating-Point Types
@@ -789,18 +790,20 @@ result are different formats. There are four different styles of signature for these functions, depending on whether the input and the output floating-point values are passed by value or via pointers: +- float32_t f64_to_f32( float64_t a ); +float32_t f64_to_f32( float64_t a );- float32_t f128M_to_f32( const float128_t *aPtr ); +float32_t f128M_to_f32( const float128_t *aPtr );- void f32_to_f128M( float32_t a, float128_t *destPtr ); +void f32_to_f128M( float32_t a, float128_t *destPtr );- void extF80M_to_f128M( const extFloat80_t *aPtr, float128_t *destPtr ); +void extF80M_to_f128M( const extFloat80_t *aPtr, float128_t *destPtr );+@@ -823,22 +826,22 @@ Each floating-point operation takes two operands, except for
sqrt
(square root) which takes only one. The operands and result are all of the same floating-point format. Signatures for these functions take the following forms: +When floating-point values are passed indirectly through pointers, arguments- float64_t f64_add( float64_t a, float64_t b ); +float64_t f64_add( float64_t a, float64_t b );- void - f128M_add( - const float128_t *aPtr, const float128_t *bPtr, float128_t *destPtr ); -- --
- float64_t f64_sqrt( float64_t a ); +void + f128M_add( + const float128_t *aPtr, const float128_t *bPtr, float128_t *destPtr );- void f128M_sqrt( const float128_t *aPtr, float128_t *destPtr ); +float64_t f64_sqrt( float64_t a );++void f128M_sqrt( const float128_t *aPtr, float128_t *destPtr ); ++aPtr
andbPtr
point to the input operands, and the last argument,destPtr
, points to the @@ -850,7 +853,7 @@ Rounding of the80-bit double-extended-precision (extFloat80_t
) functions is affected by variableextF80_roundingPrecision
, as explained earlier insection 6.3 , -Rounding Precision for80-Bit Extended Format. +Rounding Precision for the80-Bit Extended Format.8.5. Fused Multiply-Add Functions
@@ -873,18 +876,20 @@ No fused multiply-add function is currently provided for the
Depending on whether floating-point values are passed by value or via pointers, the fused multiply-add functions have signatures of these forms: +
The functions compute- float64_t f64_mulAdd( float64_t a, float64_t b, float64_t c ); +float64_t f64_mulAdd( float64_t a, float64_t b, float64_t c );- void - f128M_mulAdd( - const float128_t *aPtr, - const float128_t *bPtr, - const float128_t *cPtr, - float128_t *destPtr - ); +void + f128M_mulAdd( + const float128_t *aPtr, + const float128_t *bPtr, + const float128_t *cPtr, + float128_t *destPtr + );+( @@ -915,14 +920,16 @@ Each remainder operation takes two floating-point operands of the same format and returns a result in the same format. Depending on whether floating-point values are passed by value or via pointers, the remainder functions have signatures of these forms: +a
×b
) +c
When floating-point values are passed indirectly through pointers, arguments- float64_t f64_rem( float64_t a, float64_t b ); +float64_t f64_rem( float64_t a, float64_t b );- void - f128M_rem( - const float128_t *aPtr, const float128_t *bPtr, float128_t *destPtr ); +void + f128M_rem( + const float128_t *aPtr, const float128_t *bPtr, float128_t *destPtr );+aPtr
andbPtr
point to operandsa
andb
respectively, and @@ -938,8 +945,8 @@ where n is the integer closest to Ifis exactly halfway between two integers, n is the even integer closest to a
÷b
. -The IEEE Standard's remainder operation is always exact and so requires no -rounding. +The IEEE Standard’s remainder operation is always exact and so requires +no rounding. a
÷b
@@ -968,18 +975,20 @@ and the resulting integer value is returned in the same floating-point format.
The signatures of the round-to-integer functions are similar to those for conversions to an integer type: +
The- float64_t f64_roundToInt( float64_t a, uint_fast8_t roundingMode, bool exact ); +float64_t f64_roundToInt( float64_t a, uint_fast8_t roundingMode, bool exact );- void - f128M_roundToInt( - const float128_t *aPtr, - uint_fast8_t roundingMode, - bool exact, - float128_t *destPtr - ); +void + f128M_roundToInt( + const float128_t *aPtr, + uint_fast8_t roundingMode, + bool exact, + float128_t *destPtr + );+roundingMode
argument specifies the rounding mode to apply. The variable that usually indicates rounding mode, @@ -1005,17 +1014,19 @@ provided:<float>_lt
Each comparison takes two operands of the same type and returns a Boolean. -The abbreviationeq
stands for ``equal'' (=); -le
stands for ``less than or equal'' (≤); -andlt
stands for ``less than'' (<). +The abbreviationeq
stands for “equal” (=); +le
stands for “less than or equal” (≤); +andlt
stands for “less than” (<). Depending on whether the floating-point operands are passed by value or via pointers, the comparison functions have signatures of these forms: +- bool f64_eq( float64_t a, float64_t b ); +bool f64_eq( float64_t a, float64_t b );- bool f128M_eq( const float128_t *aPtr, const float128_t *bPtr ); +bool f128M_eq( const float128_t *aPtr, const float128_t *bPtr );+@@ -1058,21 +1069,25 @@ provided with these names: The functions take one floating-point operand and return a Boolean indicating whether the operand is a signaling NaN. Accordingly, the functions have the forms +
- bool f64_isSignalingNaN( float64_t a ); +bool f64_isSignalingNaN( float64_t a );- bool f128M_isSignalingNaN( const float128_t *aPtr ); +bool f128M_isSignalingNaN( const float128_t *aPtr );+8.10. Raise-Exception Function
SoftFloat provides a single function for raising floating-point exceptions: +
The- void softfloat_raise( uint_fast8_t exceptions ); +void softfloat_raise( uint_fast8_t exceptions );+exceptions
argument is a mask indicating the set of exceptions to raise. (See earlier section 7, Exceptions and Exception Flags.) @@ -1084,6 +1099,11 @@ function may cause a trap or abort appropriate for the current system.9. Changes from SoftFloat
+Release 2 +Apart from the change in the legal use license, there are numerous technical +differences between
+Release 3 of SoftFloat and earlier releases. +9.1. Name Changes
@@ -1214,17 +1234,17 @@ Lastly, there are a few other changes to function names:
_round_to_zero
- _r_minMag
conversions from floating-point to integer, section 8.2 +conversions from floating-point to integer ( section 8.2 )round_to_int
- roundToInt
round-to-integer functions, section 8.7 +round-to-integer functions ( section 8.7 )is_signaling_nan
- isSignalingNaN
signaling NaN test functions, section 8.9 +signaling NaN test functions ( section 8.9 )exact
.With
Release 3 , a port of SoftFloat can now define any of the floating-point typesfloat32_t
,float64_t
, -extFloat80_t
, andfloat128_t
as aliases for C's +extFloat80_t
, andfloat128_t
as aliases for C’s standard floating-point typesfloat
,double
, andlong
double
, using either#define
ortypedef
. @@ -1304,9 +1324,9 @@ This potential convenience was not supported underRelease 2 .-(Note, however, that there may be a performance cost to defining SoftFloat's -floating-point types this way, depending on the platform and the applications -using SoftFloat. +(Note, however, that there may be a performance cost to defining +SoftFloat’s floating-point types this way, depending on the platform and +the applications using SoftFloat. Ports of SoftFloat may choose to forgo the convenience in favor of better speed.)
@@ -1338,7 +1358,7 @@ Fused multiply-add functions have been added for the non-extended formats,
Release 3 of SoftFloat is written to conform better to the ISO C -Standard's rules for portability. +Standard’s rules for portability. For example, older releases of SoftFloat employed type conversions in ways that, while commonly practiced, are not fully defined by the C Standard. Such problematic type conversions have generally been replaced by the use of @@ -1387,8 +1407,8 @@ Some loss of speed has been observed due to this change. The following improvements are anticipated for future releases of SoftFloat:
- -support for the common
16-bit ``half-precision'' floating-point -format; +support for the common16-bit “half-precision” +floating-point format;- more functions from the 2008 version of the IEEE Floating-Point Standard;