Finalized documentation for SoftFloat Release 3.
parent
437d9b9fb2
commit
7276b0022e
|
@ -2,7 +2,7 @@
|
|||
License for Berkeley SoftFloat Release 3
|
||||
|
||||
John R. Hauser
|
||||
2014 ________
|
||||
2014 Dec 17
|
||||
|
||||
The following applies to the whole of SoftFloat Release 3 as well as to each
|
||||
source file individually.
|
||||
|
|
|
@ -11,7 +11,7 @@
|
|||
|
||||
<P>
|
||||
John R. Hauser<BR>
|
||||
2014 ________<BR>
|
||||
2014 Dec 17<BR>
|
||||
</P>
|
||||
|
||||
<P>
|
||||
|
@ -19,7 +19,8 @@ Berkeley SoftFloat is a software implementation of binary floating-point that
|
|||
conforms to the IEEE Standard for Floating-Point Arithmetic.
|
||||
SoftFloat is distributed in the form of C source code.
|
||||
Building the SoftFloat sources generates a library file (typically
|
||||
<CODE>"softfloat.a"</CODE>) containing the floating-point subroutines.
|
||||
<CODE>softfloat.a</CODE> or <CODE>libsoftfloat.a</CODE>) containing the
|
||||
floating-point subroutines.
|
||||
</P>
|
||||
|
||||
<P>
|
||||
|
|
|
@ -2,13 +2,13 @@
|
|||
Package Overview for Berkeley SoftFloat Release 3
|
||||
|
||||
John R. Hauser
|
||||
2014 ________
|
||||
2014 Dec 17
|
||||
|
||||
Berkeley SoftFloat is a software implementation of binary floating-point
|
||||
that conforms to the IEEE Standard for Floating-Point Arithmetic. SoftFloat
|
||||
is distributed in the form of C source code. Building the SoftFloat sources
|
||||
generates a library file (typically "softfloat.a") containing the floating-
|
||||
point subroutines.
|
||||
generates a library file (typically "softfloat.a" or "libsoftfloat.a")
|
||||
containing the floating-point subroutines.
|
||||
|
||||
The SoftFloat package is documented in the following files in the "doc"
|
||||
subdirectory:
|
||||
|
|
|
@ -11,11 +11,7 @@
|
|||
|
||||
<P>
|
||||
John R. Hauser<BR>
|
||||
2014 _____<BR>
|
||||
</P>
|
||||
|
||||
<P>
|
||||
*** CONTENT DONE.
|
||||
2014 Dec 17<BR>
|
||||
</P>
|
||||
|
||||
|
||||
|
@ -24,7 +20,8 @@ John R. Hauser<BR>
|
|||
<UL>
|
||||
|
||||
<LI>
|
||||
Complete rewrite, funded by the University of California, Berkeley.
|
||||
Complete rewrite, funded by the University of California, Berkeley, and
|
||||
consequently having a different use license than earlier releases.
|
||||
Major changes included renaming most types and functions, upgrading some
|
||||
algorithms, restructuring the source files, and making SoftFloat into a true
|
||||
library.
|
||||
|
@ -54,8 +51,9 @@ TestFloat package).
|
|||
<UL>
|
||||
|
||||
<LI>
|
||||
Further improved wording for the legal restrictions on using SoftFloat releases
|
||||
<NOBR>through 2c</NOBR>.
|
||||
Further improved the wording for the legal restrictions on using SoftFloat
|
||||
releases <NOBR>through 2c</NOBR> (not applicable to <NOBR>Release 3</NOBR> or
|
||||
later).
|
||||
|
||||
</UL>
|
||||
|
||||
|
@ -134,7 +132,8 @@ tininess is detected before or after rounding.
|
|||
<UL>
|
||||
|
||||
<LI>
|
||||
Original release.
|
||||
Original release, based on work done for the International Computer Science
|
||||
Institute (ICSI) in Berkeley, California.
|
||||
|
||||
</UL>
|
||||
|
||||
|
|
|
@ -11,36 +11,39 @@
|
|||
|
||||
<P>
|
||||
John R. Hauser<BR>
|
||||
2014 _____<BR>
|
||||
</P>
|
||||
|
||||
<P>
|
||||
*** REPLACE QUOTATION MARKS.
|
||||
2014 Dec 17<BR>
|
||||
</P>
|
||||
|
||||
|
||||
<H2>Contents</H2>
|
||||
|
||||
<P>
|
||||
*** CHECK.<BR>
|
||||
*** FIX FORMATTING.
|
||||
</P>
|
||||
|
||||
<PRE>
|
||||
Introduction
|
||||
Limitations
|
||||
Acknowledgments and License
|
||||
SoftFloat Package Directory Structure
|
||||
Issues for Porting SoftFloat to a New Target
|
||||
Standard Headers <stdbool.h> and <stdint.h>
|
||||
Specializing Floating-Point Behavior
|
||||
Macros for Build Options
|
||||
Adapting a Template Target Directory
|
||||
Target-Specific Optimization of Primitive Functions
|
||||
Testing SoftFloat
|
||||
Providing SoftFloat as a Common Library for Applications
|
||||
Contact Information
|
||||
</PRE>
|
||||
<BLOCKQUOTE>
|
||||
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
|
||||
<COL WIDTH=25>
|
||||
<COL WIDTH=*>
|
||||
<TR><TD COLSPAN=2>1. Introduction</TD></TR>
|
||||
<TR><TD COLSPAN=2>2. Limitations</TD></TR>
|
||||
<TR><TD COLSPAN=2>3. Acknowledgments and License</TD></TR>
|
||||
<TR><TD COLSPAN=2>4. SoftFloat Package Directory Structure</TD></TR>
|
||||
<TR><TD COLSPAN=2>5. Issues for Porting SoftFloat to a New Target</TD></TR>
|
||||
<TR>
|
||||
<TD></TD>
|
||||
<TD>5.1. Standard Headers <CODE><stdbool.h></CODE> and
|
||||
<CODE><stdint.h></CODE></TD>
|
||||
</TR>
|
||||
<TR><TD></TD><TD>5.2. Specializing Floating-Point Behavior</TD></TR>
|
||||
<TR><TD></TD><TD>5.3. Macros for Build Options</TD></TR>
|
||||
<TR><TD></TD><TD>5.4. Adapting a Template Target Directory</TD></TR>
|
||||
<TR>
|
||||
<TD></TD><TD>5.5. Target-Specific Optimization of Primitive Functions</TD>
|
||||
</TR>
|
||||
<TR><TD COLSPAN=2>6. Testing SoftFloat</TD></TR>
|
||||
<TR>
|
||||
<TD COLSPAN=2>7. Providing SoftFloat as a Common Library for Applications</TD>
|
||||
</TR>
|
||||
<TR><TD COLSPAN=2>8. Contact Information</TD></TR>
|
||||
</TABLE>
|
||||
</BLOCKQUOTE>
|
||||
|
||||
|
||||
<H2>1. Introduction</H2>
|
||||
|
@ -98,7 +101,7 @@ strictly required.
|
|||
integer types.
|
||||
If these headers are not supplied with the C compiler, minimal substitutes must
|
||||
be provided.
|
||||
SoftFloat's dependence on these headers is detailed later in
|
||||
SoftFloat’s dependence on these headers is detailed later in
|
||||
<NOBR>section 5.1</NOBR>, <I>Standard Headers <stdbool.h> and
|
||||
<stdint.h></I>.
|
||||
</P>
|
||||
|
@ -110,15 +113,20 @@ SoftFloat's dependence on these headers is detailed later in
|
|||
The SoftFloat package was written by me, <NOBR>John R.</NOBR> Hauser.
|
||||
<NOBR>Release 3</NOBR> of SoftFloat is a completely new implementation
|
||||
supplanting earlier releases.
|
||||
This project was done in the employ of the University of California, Berkeley,
|
||||
within the Department of Electrical Engineering and Computer Sciences, first
|
||||
for the Parallel Computing Laboratory (Par Lab) and then for the ASPIRE Lab.
|
||||
This project (<NOBR>Release 3</NOBR> only, not earlier releases) was done in
|
||||
the employ of the University of California, Berkeley, within the Department of
|
||||
Electrical Engineering and Computer Sciences, first for the Parallel Computing
|
||||
Laboratory (Par Lab) and then for the ASPIRE Lab.
|
||||
The work was officially overseen by Prof. Krste Asanovic, with funding provided
|
||||
by these sources:
|
||||
<BLOCKQUOTE>
|
||||
<TABLE>
|
||||
<COL WIDTH=*>
|
||||
<COL WIDTH=10>
|
||||
<COL WIDTH=*>
|
||||
<TR>
|
||||
<TD><NOBR>Par Lab:</NOBR></TD>
|
||||
<TD VALIGN=TOP><NOBR>Par Lab:</NOBR></TD>
|
||||
<TD></TD>
|
||||
<TD>
|
||||
Microsoft (Award #024263), Intel (Award #024894), and U.C. Discovery
|
||||
(Award #DIG07-10227), with additional support from Par Lab affiliates Nokia,
|
||||
|
@ -126,7 +134,8 @@ NVIDIA, Oracle, and Samsung.
|
|||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD><NOBR>ASPIRE Lab:</NOBR></TD>
|
||||
<TD VALIGN=TOP><NOBR>ASPIRE Lab:</NOBR></TD>
|
||||
<TD></TD>
|
||||
<TD>
|
||||
DARPA PERFECT program (Award #HR0011-12-2-0016), with additional support from
|
||||
ASPIRE industrial sponsor Intel and ASPIRE affiliates Google, Nokia, NVIDIA,
|
||||
|
@ -185,27 +194,40 @@ Because SoftFloat is targeted to multiple platforms, its source code is
|
|||
slightly scattered between target-specific and target-independent directories
|
||||
and files.
|
||||
The supplied directory structure is as follows:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
doc
|
||||
source
|
||||
include
|
||||
8086
|
||||
8086-SSE
|
||||
build
|
||||
template-FAST_INT64
|
||||
template-not-FAST_INT64
|
||||
Linux-386-GCC
|
||||
Linux-386-SSE2-GCC
|
||||
Linux-x86_64-GCC
|
||||
Win32-MinGW
|
||||
Win32-SSE2-MinGW
|
||||
Win64-MinGW-w64
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
The majority of the SoftFloat sources are provided in the <CODE>source</CODE>
|
||||
directory.
|
||||
The <CODE>include</CODE> subdirectory of <CODE>source</CODE> contains several
|
||||
header files (unsurprisingly), while the <CODE>8086</CODE> subdirectory
|
||||
contains source files that specialize the floating-point behavior to match the
|
||||
Intel x86 line of processors.
|
||||
header files (unsurprisingly), while the <CODE>8086</CODE> and
|
||||
<NOBR><CODE>8086-SSE</CODE></NOBR> subdirectories contain source files that
|
||||
specialize the floating-point behavior to match the Intel x86 line of
|
||||
processors.
|
||||
The files in directory <CODE>8086</CODE> give floating-point behavior
|
||||
consistent solely with Intel’s older, 8087-derived floating-point, while
|
||||
those in <NOBR><CODE>8086-SSE</CODE></NOBR> update the behavior of the
|
||||
non-extended formats (<CODE>float32_t</CODE>, <CODE>float64_t</CODE>, and
|
||||
<CODE>float128_t</CODE>) to mirror Intel’s more recent Streaming SIMD
|
||||
Extensions (SSE) and other compatible extensions.
|
||||
If other specializations are attempted, these would be expected to be other
|
||||
subdirectories of <CODE>source</CODE> alongside <CODE>8086</CODE>.
|
||||
subdirectories of <CODE>source</CODE> alongside <CODE>8086</CODE> and
|
||||
<NOBR><CODE>8086-SSE</CODE></NOBR>.
|
||||
Specialization is covered later, in <NOBR>section 5.2</NOBR>, <I>Specializing
|
||||
Floating-Point Behavior</I>.
|
||||
</P>
|
||||
|
@ -213,9 +235,9 @@ Floating-Point Behavior</I>.
|
|||
<P>
|
||||
The <CODE>build</CODE> directory is intended to contain a subdirectory for each
|
||||
target platform for which a build of the SoftFloat library may be created.
|
||||
For each build target, the target's subdirectory is where all derived object
|
||||
files and the completed SoftFloat library (typically <CODE>softfloat.a</CODE>
|
||||
or <CODE>libsoftfloat.a</CODE>) are created.
|
||||
For each build target, the target’s subdirectory is where all derived
|
||||
object files and the completed SoftFloat library (typically
|
||||
<CODE>softfloat.a</CODE> or <CODE>libsoftfloat.a</CODE>) are created.
|
||||
The two <CODE>template</CODE> subdirectories are not actual build targets but
|
||||
contain sample files for creating new target directories.
|
||||
(The meaning of <CODE>FAST_INT64</CODE> will be explained later.)
|
||||
|
@ -227,18 +249,21 @@ are intended to follow a naming system of
|
|||
<NOBR><CODE><execution-environment>-<compiler></CODE></NOBR>.
|
||||
For the example targets,
|
||||
<NOBR><CODE><execution-environment></CODE></NOBR> is
|
||||
<NOBR><CODE>Linux-386</CODE></NOBR>, <NOBR><CODE>Linux-x86_64</CODE></NOBR>,
|
||||
<CODE>Win32</CODE>, or <CODE>Win64</CODE>, and
|
||||
<NOBR><CODE>Linux-386</CODE></NOBR>, <NOBR><CODE>Linux-386-SSE2</CODE></NOBR>,
|
||||
<NOBR><CODE>Linux-x86_64</CODE></NOBR>, <CODE>Win32</CODE>,
|
||||
<NOBR><CODE>Win32-SSE2</CODE></NOBR>, or <CODE>Win64</CODE>, and
|
||||
<NOBR><CODE><compiler></CODE></NOBR> is <CODE>GCC</CODE>,
|
||||
<CODE>MinGW</CODE>, or <NOBR><CODE>MinGW-w64</CODE></NOBR>.
|
||||
</P>
|
||||
|
||||
<P>
|
||||
As supplied, each target directory contains two files:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
Makefile
|
||||
platform.h
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
The provided <CODE>Makefile</CODE> is written for GNU <CODE>make</CODE>.
|
||||
A build of SoftFloat for the specific target is begun by executing the
|
||||
<CODE>make</CODE> command with the target directory as the current directory.
|
||||
|
@ -258,10 +283,10 @@ desirable to include in header <CODE>platform.h</CODE> (directly or via
|
|||
<CODE>#include</CODE>) declarations for numerous target-specific optimizations.
|
||||
Such possibilities are discussed in the next section, <I>Issues for Porting
|
||||
SoftFloat to a New Target</I>.
|
||||
If the target's compiler or library has bugs or other shortcomings, workarounds
|
||||
for these issues may also be possible with target-specific declarations in
|
||||
<CODE>platform.h</CODE>, avoiding the need to modify the main SoftFloat
|
||||
sources.
|
||||
If the target’s compiler or library has bugs or other shortcomings,
|
||||
workarounds for these issues may also be possible with target-specific
|
||||
declarations in <CODE>platform.h</CODE>, avoiding the need to modify the main
|
||||
SoftFloat sources.
|
||||
</P>
|
||||
|
||||
|
||||
|
@ -280,12 +305,15 @@ For older or nonstandard compilers, substitutes for
|
|||
<CODE><stdbool.h></CODE> and <CODE><stdint.h></CODE> may need to be
|
||||
created.
|
||||
SoftFloat depends on these names from <CODE><stdbool.h></CODE>:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
bool
|
||||
true
|
||||
false
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
and on these names from <CODE><stdint.h></CODE>:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
uint16_t
|
||||
uint32_t
|
||||
|
@ -304,6 +332,7 @@ and on these names from <CODE><stdint.h></CODE>:
|
|||
int_fast32_t
|
||||
int_fast64_t
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
</P>
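<P>
As a minimal sketch of such a substitute, assuming a compiler that predates the
1999 C Standard and lacks <CODE><stdbool.h></CODE>, the three names could be
supplied as shown here.
The choice of <CODE>int</CODE> as the underlying type and the helper
<CODE>isNonzero</CODE> are assumptions for illustration only, not anything
SoftFloat prescribes.
On a compiler that reports C99 or later, the sketch simply defers to the real
header.
</P>

```c
/* Hypothetical stand-in for <stdbool.h>; not part of the SoftFloat sources. */
#if defined( __STDC_VERSION__ ) && ( __STDC_VERSION__ >= 199901L )
#include <stdbool.h>        /* a real header exists, so use it */
#else
typedef int bool;           /* assumption: int is an acceptable Boolean type */
#define true  1
#define false 0
#endif

/* Illustrative helper (hypothetical) using the three names SoftFloat needs. */
static bool isNonzero( int x )
{
    return ( x != 0 ) ? true : false;
}
```

<P>
A substitute for <CODE><stdint.h></CODE> is more target-dependent, because each
fixed-width name must be mapped to a native integer type of the right size, so
none is sketched here.
</P>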
|
||||
|
||||
|
||||
|
@ -312,12 +341,12 @@ and on these names from <CODE><stdint.h></CODE>:
|
|||
<P>
|
||||
The IEEE Floating-Point Standard allows for some flexibility in a conforming
|
||||
implementation, particularly concerning NaNs.
|
||||
The SoftFloat <CODE>source</CODE> directory is supplied with one or more
|
||||
The SoftFloat <CODE>source</CODE> directory is supplied with some
|
||||
<I>specialization</I> subdirectories containing possible definitions for this
|
||||
implementation-specific behavior.
|
||||
For example, the <CODE>8086</CODE> subdirectory has source files that
|
||||
specialize SoftFloat's behavior to match that of Intel's x86 line of
|
||||
processors.
|
||||
For example, the <CODE>8086</CODE> and <NOBR><CODE>8086-SSE</CODE></NOBR>
|
||||
subdirectories have source files that specialize SoftFloat’s behavior to
|
||||
match that of Intel’s x86 line of processors.
|
||||
The files in a specialization subdirectory must determine:
|
||||
<UL>
|
||||
<LI>
|
||||
|
@ -343,8 +372,9 @@ source files are needed to complete the specialization.
|
|||
</P>
|
||||
|
||||
<P>
|
||||
A new build target may use an existing specialization, such as the one provided
|
||||
by the <CODE>8086</CODE> subdirectory.
|
||||
A new build target may use an existing specialization, such as the ones
|
||||
provided by the <CODE>8086</CODE> and <NOBR><CODE>8086-SSE</CODE></NOBR>
|
||||
subdirectories.
|
||||
If a build target needs a new specialization, different from any existing ones,
|
||||
it is recommended that a new specialization subdirectory be created in the
|
||||
<CODE>source</CODE> directory for this purpose.
|
||||
|
@ -367,18 +397,18 @@ Must be defined for little-endian machines; must not be defined for big-endian
|
|||
machines.
|
||||
<DT><CODE>SOFTFLOAT_FAST_INT64</CODE>
|
||||
<DD>
|
||||
Can be defined to indicate that the build target's implementation of
|
||||
<CODE>64-bit</CODE> arithmetic is efficient.
|
||||
For newer <CODE>64-bit</CODE> processors, this macro should usually be defined.
|
||||
For very small microprocessors whose buses and registers are <CODE>8-bit</CODE>
|
||||
or <CODE>16-bit</CODE> in size, this macro should usually not be defined.
|
||||
Whether this macro should be defined for a <CODE>32-bit</CODE> processor may
|
||||
Can be defined to indicate that the build target’s implementation of
|
||||
<NOBR>64-bit</NOBR> arithmetic is efficient.
|
||||
For newer <NOBR>64-bit</NOBR> processors, this macro should usually be defined.
|
||||
For very small microprocessors whose buses and registers are <NOBR>8-bit</NOBR>
|
||||
or <NOBR>16-bit</NOBR> in size, this macro should usually not be defined.
|
||||
Whether this macro should be defined for a <NOBR>32-bit</NOBR> processor may
|
||||
depend on the target machine and the applications that will use SoftFloat.
|
||||
<DT><CODE>SOFTFLOAT_FAST_DIV64TO32</CODE>
|
||||
<DD>
|
||||
Can be defined to indicate that the target's division operator
|
||||
Can be defined to indicate that the target’s division operator
|
||||
<NOBR>in C</NOBR> (written as <CODE>/</CODE>) is reasonably efficient for
|
||||
dividing a <CODE>64-bit</CODE> unsigned integer by a <CODE>32-bit</CODE>
|
||||
dividing a <NOBR>64-bit</NOBR> unsigned integer by a <NOBR>32-bit</NOBR>
|
||||
unsigned integer.
|
||||
Setting this macro may affect the performance of division, remainder, and
|
||||
square root operations.
|
||||
|
@ -411,16 +441,16 @@ defined to <CODE>extern</CODE> <CODE>inline</CODE>.
|
|||
Following the usual custom <NOBR>for C</NOBR>, for the first three macros (all
|
||||
except <CODE>INLINE_LEVEL</CODE> and <CODE>INLINE</CODE>), the content of any
|
||||
definition is irrelevant;
|
||||
what matters is a macro's effect on <CODE>#ifdef</CODE> directives.
|
||||
what matters is a macro’s effect on <CODE>#ifdef</CODE> directives.
|
||||
</P>
|
||||
|
||||
<P>
|
||||
It is recommended that any definitions of macros <CODE>LITTLEENDIAN</CODE> and
|
||||
<CODE>INLINE</CODE> be made in a build target's <CODE>platform.h</CODE> header
|
||||
file, because these macros are expected to be determined inflexibly by the
|
||||
target machine and compiler.
|
||||
<CODE>INLINE</CODE> be made in a build target’s <CODE>platform.h</CODE>
|
||||
header file, because these macros are expected to be determined inflexibly by
|
||||
the target machine and compiler.
|
||||
The other three macros control optimization and might be better located in the
|
||||
target's Makefile (or its equivalent).
|
||||
target’s Makefile (or its equivalent).
|
||||
</P>
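<P>
The recommendation above might look roughly like the following fragment of a
target’s <CODE>platform.h</CODE>.
This is only a sketch for an assumed little-endian target;
the definition chosen for <CODE>INLINE</CODE> and the value shown for
<CODE>INLINE_LEVEL</CODE> are illustrative assumptions and must be matched to
the actual compiler and target.
</P>

```c
/* Hypothetical platform.h fragment for an assumed little-endian target. */

/* The content of this definition is irrelevant; only #ifdef matters. */
#define LITTLEENDIAN 1

/* One possible definition; the right tokens depend on the compiler. */
#define INLINE extern inline

/*
 * The three optimization macros are probably better supplied by the
 * Makefile, for example as compiler options (values are assumptions):
 *     -DSOFTFLOAT_FAST_INT64 -DSOFTFLOAT_FAST_DIV64TO32 -DINLINE_LEVEL=5
 */
```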
|
||||
|
||||
|
||||
|
@ -433,8 +463,9 @@ Two different templates exist because different functions are needed in the
|
|||
SoftFloat library depending on whether macro <CODE>SOFTFLOAT_FAST_INT64</CODE>
|
||||
is defined.
|
||||
If macro <CODE>SOFTFLOAT_FAST_INT64</CODE> will be defined,
|
||||
<CODE>template-FAST_INT64</CODE> is the template to use;
|
||||
otherwise, <CODE>template-not-FAST_INT64</CODE> is the appropriate template.
|
||||
<NOBR><CODE>template-FAST_INT64</CODE></NOBR> is the template to use;
|
||||
otherwise, <NOBR><CODE>template-not-FAST_INT64</CODE></NOBR> is the appropriate
|
||||
template.
|
||||
A new target directory can be created by copying the correct template directory
|
||||
and editing the files inside.
|
||||
To avoid confusion, it would be wise to refrain from editing the files within a
|
||||
|
@ -447,12 +478,12 @@ template directory directly.
|
|||
<P>
|
||||
Header file <CODE>primitives.h</CODE> (in directory
|
||||
<CODE>source/include</CODE>) declares macros and functions for numerous
|
||||
underlying arithmetic operations upon which many of SoftFloat's floating-point
|
||||
functions are ultimately built.
|
||||
underlying arithmetic operations upon which many of SoftFloat’s
|
||||
floating-point functions are ultimately built.
|
||||
The SoftFloat sources include implementations of all of these functions/macros,
|
||||
written as standard C code, so a complete and correct SoftFloat library can be
|
||||
built using only the supplied code for all functions.
|
||||
However, for many targets, SoftFloat's performance can be improved by
|
||||
However, for many targets, SoftFloat’s performance can be improved by
|
||||
substituting target-specific implementations of some of the functions/macros
|
||||
declared in <CODE>primitives.h</CODE>.
|
||||
</P>
|
||||
|
@ -461,7 +492,7 @@ declared in <CODE>primitives.h</CODE>.
|
|||
For example, <CODE>primitives.h</CODE> declares a function called
|
||||
<CODE>softfloat_countLeadingZeros32</CODE> that takes an unsigned
|
||||
<NOBR>32-bit</NOBR> integer as an argument and returns the maximal number of
|
||||
the integer's most-significant bits that are all zeros.
|
||||
the integer’s most-significant bits that are all zeros.
|
||||
While the SoftFloat sources include an implementation of this function written
|
||||
in <NOBR>standard C</NOBR>, many processors can perform this same function
|
||||
directly in only one or two machine instructions.
|
||||
|
@ -473,19 +504,22 @@ package.
|
|||
<P>
|
||||
A build target can replace the supplied version of any function or macro of
|
||||
<CODE>primitives.h</CODE> by defining a macro with the same name in the
|
||||
target's <CODE>platform.h</CODE> header file.
|
||||
target’s <CODE>platform.h</CODE> header file.
|
||||
For this purpose, it may be helpful for <CODE>platform.h</CODE> to
|
||||
<CODE>#include</CODE> header file <CODE>primitiveTypes.h</CODE>, which defines
|
||||
types used for arguments and results of functions declared in
|
||||
<CODE>primitives.h</CODE>.
|
||||
When a desired replacement implementation is a function, not a macro, it is
|
||||
sufficient for <CODE>platform.h</CODE> to include the line
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
#define <function-name> <function-name>
|
||||
</PRE>
|
||||
where <CODE><function-name></CODE> is the name of the function.
|
||||
This technically defines <CODE><function-name></CODE> as a macro, but one
|
||||
that resolves to the same name, which may then be a function.
|
||||
</BLOCKQUOTE>
|
||||
where <NOBR><CODE><function-name></CODE></NOBR> is the name of the
|
||||
function.
|
||||
This technically defines <NOBR><CODE><function-name></CODE></NOBR> as a
|
||||
macro, but one that resolves to the same name, which may then be a function.
|
||||
(A preprocessor conforming to the C Standard must not expand a macro name again
|
||||
when it reappears within that macro’s own expansion.)
|
||||
</P>
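<P>
As a hedged illustration of this technique, a target’s
<CODE>platform.h</CODE> might replace
<CODE>softfloat_countLeadingZeros32</CODE> along the following lines.
The sketch assumes a GCC-compatible compiler providing
<CODE>__builtin_clz</CODE> and a <NOBR>32-bit</NOBR> <CODE>unsigned int</CODE>.
The return type shown is also an assumption and should be matched to the
declaration in <CODE>primitives.h</CODE>.
</P>

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical platform.h fragment; assumes GCC or Clang (__builtin_clz). */
#if UINT_MAX != 0xFFFFFFFF
#error "This sketch assumes that unsigned int is exactly 32 bits wide."
#endif

/* Self-referential macro: marks the name as supplied by the target. */
#define softfloat_countLeadingZeros32 softfloat_countLeadingZeros32

/* Return type is an assumption; match the declaration in primitives.h. */
static inline int softfloat_countLeadingZeros32( uint32_t a )
{
    /* __builtin_clz( 0 ) is undefined, so zero is handled separately. */
    return a ? __builtin_clz( a ) : 32;
}
```

<P>
Because the macro expands to its own name, later uses of
<CODE>softfloat_countLeadingZeros32</CODE> resolve to the function defined
here.
</P>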
|
||||
|
@ -500,46 +534,34 @@ This program is part of the Berkeley TestFloat package available at the Web
|
|||
page
|
||||
<A HREF="http://www.jhauser.us/arithmetic/TestFloat.html"><CODE>http://www.jhauser.us/arithmetic/TestFloat.html</CODE></A>.
|
||||
The TestFloat package also has a program called <CODE>timesoftfloat</CODE> that
|
||||
measures the speed of SoftFloat's floating-point functions.
|
||||
measures the speed of SoftFloat’s floating-point functions.
|
||||
</P>
|
||||
|
||||
|
||||
<H2>7. Providing SoftFloat as a Common Library for Applications</H2>
|
||||
|
||||
<P>
|
||||
Supplied <CODE>softfloat.h</CODE> depends on <CODE>softfloat_types.h</CODE>.
|
||||
Header file <CODE>softfloat.h</CODE> defines the SoftFloat interface as seen by
|
||||
clients.
|
||||
If the SoftFloat library will be made a common library for programs on a
|
||||
particular system, the supplied <CODE>softfloat.h</CODE> has a couple of
|
||||
deficiencies for this purpose:
|
||||
<UL>
|
||||
<LI>
|
||||
As supplied, <CODE>softfloat.h</CODE> depends on another header,
|
||||
<CODE>softfloat_types.h</CODE>, that is not intended for public use but which
|
||||
must also be visible to the programmer’s compiler.
|
||||
<LI>
|
||||
More troubling, at the time <CODE>softfloat.h</CODE> is included in a C
|
||||
source file, macro <CODE>SOFTFLOAT_FAST_INT64</CODE> must be defined, or not
|
||||
defined, consistent with whether this macro was defined when the SoftFloat
|
||||
library was built.
|
||||
</UL>
|
||||
In the situation that new programs may regularly <CODE>#include</CODE> header
|
||||
file <CODE>softfloat.h</CODE>, it is recommended that a custom, self-contained
|
||||
version of this header file be created that eliminates these issues.
|
||||
</P>
|
||||
|
||||
<PRE>
|
||||
The target-specific `softfloat.h' header file defines the SoftFloat
|
||||
interface as seen by clients.
|
||||
|
||||
Unlike the actual function definitions in `softfloat.c', the declarations
|
||||
in `softfloat.h' do not use any of the types defined by the `processors'
|
||||
header file. This is done so that clients will not have to include the
|
||||
`processors' header file in order to use SoftFloat. Nevertheless, the
|
||||
target-specific declarations in `softfloat.h' must match what `softfloat.c'
|
||||
expects. For example, if `int32' is defined as `int' in the `processors'
|
||||
header file, then in `softfloat.h' the output of `float32_to_int32' should
|
||||
be stated as `int', although in `softfloat.c' it is given in target-
|
||||
independent form as `int32'.
|
||||
</PRE>
|
||||
|
||||
<PRE>
|
||||
*** HERE
|
||||
|
||||
Porting and/or compiling SoftFloat involves the following steps:
|
||||
|
||||
4. In the target-specific subdirectory, edit the files `softfloat-specialize'
|
||||
and `softfloat.h' to define the desired exception handling functions
|
||||
and mode control values. In the `softfloat.h' header file, ensure also
|
||||
that all declarations give the proper target-specific type (such as
|
||||
`int' or `long') corresponding to the target-independent type used in
|
||||
`softfloat.c' (such as `int32'). None of the type names declared in the
|
||||
`processors' header file should appear in `softfloat.h'.
|
||||
|
||||
</PRE>
|
||||
|
||||
|
||||
<H2>8. Contact Information</H2>
|
||||
|
||||
|
|
|
@ -11,66 +11,59 @@
|
|||
|
||||
<P>
|
||||
John R. Hauser<BR>
|
||||
2014 ______<BR>
|
||||
</P>
|
||||
|
||||
<P>
|
||||
*** CONTENT DONE.
|
||||
</P>
|
||||
|
||||
<P>
|
||||
*** REPLACE QUOTATION MARKS.
|
||||
<BR>
|
||||
*** REPLACE APOSTROPHES.
|
||||
<BR>
|
||||
*** REPLACE EM DASH.
|
||||
2014 Dec 17<BR>
|
||||
</P>
|
||||
|
||||
|
||||
<H2>Contents</H2>
|
||||
|
||||
<P>
|
||||
*** CHECK.<BR>
|
||||
*** FIX FORMATTING.
|
||||
</P>
|
||||
|
||||
<PRE>
|
||||
Introduction
|
||||
Limitations
|
||||
Acknowledgments and License
|
||||
Types and Functions
|
||||
Boolean and Integer Types
|
||||
Floating-Point Types
|
||||
Supported Floating-Point Functions
|
||||
Non-canonical Representations in extFloat80_t
|
||||
Conventions for Passing Arguments and Results
|
||||
Reserved Names
|
||||
Mode Variables
|
||||
Rounding Mode
|
||||
Underflow Detection
|
||||
Rounding Precision for 80-Bit Extended Format
|
||||
Exceptions and Exception Flags
|
||||
Function Details
|
||||
Conversions from Integer to Floating-Point
|
||||
Conversions from Floating-Point to Integer
|
||||
Conversions Among Floating-Point Types
|
||||
Basic Arithmetic Functions
|
||||
Fused Multiply-Add Functions
|
||||
Remainder Functions
|
||||
Round-to-Integer Functions
|
||||
Comparison Functions
|
||||
Signaling NaN Test Functions
|
||||
Raise-Exception Function
|
||||
Changes from SoftFloat Release 2
|
||||
Name Changes
|
||||
Changes to Function Arguments
|
||||
Added Capabilities
|
||||
Better Compatibility with the C Language
|
||||
New Organization as a Library
|
||||
Optimization Gains (and Losses)
|
||||
Future Directions
|
||||
Contact Information
|
||||
</PRE>
|
||||
<BLOCKQUOTE>
|
||||
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
|
||||
<COL WIDTH=25>
|
||||
<COL WIDTH=*>
|
||||
<TR><TD COLSPAN=2>1. Introduction</TD></TR>
|
||||
<TR><TD COLSPAN=2>2. Limitations</TD></TR>
|
||||
<TR><TD COLSPAN=2>3. Acknowledgments and License</TD></TR>
|
||||
<TR><TD COLSPAN=2>4. Types and Functions</TD></TR>
|
||||
<TR><TD></TD><TD>4.1. Boolean and Integer Types</TD></TR>
|
||||
<TR><TD></TD><TD>4.2. Floating-Point Types</TD></TR>
|
||||
<TR><TD></TD><TD>4.3. Supported Floating-Point Functions</TD></TR>
|
||||
<TR>
|
||||
<TD></TD>
|
||||
<TD>4.4. Non-canonical Representations in <CODE>extFloat80_t</CODE></TD>
|
||||
</TR>
|
||||
<TR><TD></TD><TD>4.5. Conventions for Passing Arguments and Results</TD></TR>
|
||||
<TR><TD COLSPAN=2>5. Reserved Names</TD></TR>
|
||||
<TR><TD COLSPAN=2>6. Mode Variables</TD></TR>
|
||||
<TR><TD></TD><TD>6.1. Rounding Mode</TD></TR>
|
||||
<TR><TD></TD><TD>6.2. Underflow Detection</TD></TR>
|
||||
<TR>
|
||||
<TD></TD>
|
||||
<TD>6.3. Rounding Precision for the <NOBR>80-Bit</NOBR> Extended Format</TD>
|
||||
</TR>
|
||||
<TR><TD COLSPAN=2>7. Exceptions and Exception Flags</TD></TR>
|
||||
<TR><TD COLSPAN=2>8. Function Details</TD></TR>
|
||||
<TR><TD></TD><TD>8.1. Conversions from Integer to Floating-Point</TD></TR>
|
||||
<TR><TD></TD><TD>8.2. Conversions from Floating-Point to Integer</TD></TR>
|
||||
<TR><TD></TD><TD>8.3. Conversions Among Floating-Point Types</TD></TR>
|
||||
<TR><TD></TD><TD>8.4. Basic Arithmetic Functions</TD></TR>
|
||||
<TR><TD></TD><TD>8.5. Fused Multiply-Add Functions</TD></TR>
|
||||
<TR><TD></TD><TD>8.6. Remainder Functions</TD></TR>
|
||||
<TR><TD></TD><TD>8.7. Round-to-Integer Functions</TD></TR>
|
||||
<TR><TD></TD><TD>8.8. Comparison Functions</TD></TR>
|
||||
<TR><TD></TD><TD>8.9. Signaling NaN Test Functions</TD></TR>
|
||||
<TR><TD></TD><TD>8.10. Raise-Exception Function</TD></TR>
|
||||
<TR><TD COLSPAN=2>9. Changes from SoftFloat <NOBR>Release 2</NOBR></TD></TR>
|
||||
<TR><TD></TD><TD>9.1. Name Changes</TD></TR>
|
||||
<TR><TD></TD><TD>9.2. Changes to Function Arguments</TD></TR>
|
||||
<TR><TD></TD><TD>9.3. Added Capabilities</TD></TR>
|
||||
<TR><TD></TD><TD>9.4. Better Compatibility with the C Language</TD></TR>
|
||||
<TR><TD></TD><TD>9.5. New Organization as a Library</TD></TR>
|
||||
<TR><TD></TD><TD>9.6. Optimization Gains (and Losses)</TD></TR>
|
||||
<TR><TD COLSPAN=2>10. Future Directions</TD></TR>
|
||||
<TR><TD COLSPAN=2>11. Contact Information</TD></TR>
|
||||
</TABLE>
|
||||
</BLOCKQUOTE>
|
||||
|
||||
|
||||
<H2>1. Introduction</H2>
|
||||
|
@ -156,15 +149,20 @@ SoftFloat <NOBR>Release 3</NOBR>.
|
|||
The SoftFloat package was written by me, <NOBR>John R.</NOBR> Hauser.
|
||||
<NOBR>Release 3</NOBR> of SoftFloat is a completely new implementation
|
||||
supplanting earlier releases.
|
||||
This project was done in the employ of the University of California, Berkeley,
|
||||
within the Department of Electrical Engineering and Computer Sciences, first
|
||||
for the Parallel Computing Laboratory (Par Lab) and then for the ASPIRE Lab.
|
||||
This project (<NOBR>Release 3</NOBR> only, not earlier releases) was done in
|
||||
the employ of the University of California, Berkeley, within the Department of
|
||||
Electrical Engineering and Computer Sciences, first for the Parallel Computing
|
||||
Laboratory (Par Lab) and then for the ASPIRE Lab.
|
||||
The work was officially overseen by Prof. Krste Asanovic, with funding provided
|
||||
by these sources:
|
||||
<BLOCKQUOTE>
|
||||
<TABLE>
|
||||
<COL WIDTH=*>
|
||||
<COL WIDTH=10>
|
||||
<COL WIDTH=*>
|
||||
<TR>
|
||||
<TD><NOBR>Par Lab:</NOBR></TD>
|
||||
<TD VALIGN=TOP><NOBR>Par Lab:</NOBR></TD>
|
||||
<TD></TD>
|
||||
<TD>
|
||||
Microsoft (Award #024263), Intel (Award #024894), and U.C. Discovery
|
||||
(Award #DIG07-10227), with additional support from Par Lab affiliates Nokia,
|
||||
|
@ -172,7 +170,8 @@ NVIDIA, Oracle, and Samsung.
|
|||
</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD><NOBR>ASPIRE Lab:</NOBR></TD>
|
||||
<TD VALIGN=TOP><NOBR>ASPIRE Lab:</NOBR></TD>
|
||||
<TD></TD>
|
||||
<TD>
|
||||
DARPA PERFECT program (Award #HR0011-12-2-0016), with additional support from
|
||||
ASPIRE industrial sponsor Intel and ASPIRE affiliates Google, Nokia, NVIDIA,
|
||||
|
@ -245,6 +244,7 @@ for these headers.
|
|||
Header <CODE>softfloat.h</CODE> depends only on the name <CODE>bool</CODE> from
|
||||
<CODE><stdbool.h></CODE> and on these type names from
|
||||
<CODE><stdint.h></CODE>:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
uint16_t
|
||||
uint32_t
|
||||
|
@ -255,6 +255,7 @@ Header <CODE>softfloat.h</CODE> depends only on the name <CODE>bool</CODE> from
|
|||
uint_fast32_t
|
||||
uint_fast64_t
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
</P>
|
||||
|
||||
|
||||
|
@ -263,26 +264,22 @@ Header <CODE>softfloat.h</CODE> depends only on the name <CODE>bool</CODE> from
|
|||
<P>
|
||||
The <CODE>softfloat.h</CODE> header defines four floating-point types:
|
||||
<BLOCKQUOTE>
|
||||
<TABLE>
|
||||
<TABLE CELLSPACING=0 CELLPADDING=0>
|
||||
<TR>
|
||||
<TD><CODE>float32_t</CODE></TD>
|
||||
<TD> </TD>
|
||||
<TD><NOBR>32-bit</NOBR> single-precision binary format</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD><CODE>float64_t</CODE></TD>
|
||||
<TD> </TD>
|
||||
<TD><NOBR>64-bit</NOBR> double-precision binary format</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD><CODE>extFloat80_t</CODE></TD>
|
||||
<TD> </TD>
|
||||
<TD><CODE>extFloat80_t </CODE></TD>
|
||||
<TD><NOBR>80-bit</NOBR> double-extended-precision binary format (old Intel or
|
||||
Motorola format)</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD><CODE>float128_t</CODE></TD>
|
||||
<TD> </TD>
|
||||
<TD><NOBR>128-bit</NOBR> quadruple-precision binary format</TD>
|
||||
</TR>
|
||||
</TABLE>
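As an illustration of the <NOBR>32-bit</NOBR> single-precision layout named in the table, the following sketch splits a raw bit pattern into its sign, exponent, and fraction fields.
The names <CODE>sketch_f32</CODE>, <CODE>sketch_f32_fields</CODE>, and <CODE>sketch_f32_unpack</CODE> are hypothetical and are not part of SoftFloat; how a port actually defines <CODE>float32_t</CODE> is an assumption not made here.

```c
#include <stdint.h>

/* Hypothetical container for this sketch only; a real port decides how
   float32_t is actually defined. */
typedef struct { uint32_t bits; } sketch_f32;

/* The three fields of the 32-bit single-precision binary format. */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exponent;  /* 8 bits, biased by 127 */
    uint32_t fraction;  /* 23 bits */
} sketch_f32_fields;

/* Split a 32-bit pattern into sign, biased exponent, and fraction. */
sketch_f32_fields sketch_f32_unpack( sketch_f32 a )
{
    sketch_f32_fields f;
    f.sign     = (unsigned) (a.bits >> 31);
    f.exponent = (unsigned) ((a.bits >> 23) & 0xFF);
    f.fraction = a.bits & UINT32_C( 0x007FFFFF );
    return f;
}
```

For example, the pattern <CODE>0x3F800000</CODE> (the value 1.0) unpacks to sign 0, biased exponent 127, and fraction 0.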
|
||||
|
@ -304,10 +301,10 @@ Header file <CODE>softfloat.h</CODE> also defines a structure,
|
|||
This structure is the same size as type <CODE>extFloat80_t</CODE> and contains
|
||||
at least these two fields (not necessarily in this order):
|
||||
<BLOCKQUOTE>
|
||||
<TABLE>
|
||||
<TR><TD><CODE>uint16_t signExp;</CODE></TD></TR>
|
||||
<TR><TD><CODE>uint64_t signif;</CODE></TD></TR>
|
||||
</TABLE>
|
||||
<PRE>
|
||||
uint16_t signExp;
|
||||
uint64_t signif;
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
Field <CODE>signExp</CODE> contains the sign and exponent of the floating-point
|
||||
value, with the sign in the most significant bit (<NOBR>bit 15</NOBR>) and the
|
||||
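The two fields can be examined with ordinary integer operations.
The sketch below uses a hypothetical stand-in structure, <CODE>sketch_extF80M</CODE>, with the two documented fields, and assumes (as in the Intel format) that the 15 bits of <CODE>signExp</CODE> below the sign bit hold the encoded exponent.
None of these helper names exist in SoftFloat.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in with the two documented fields; the real structure
   may order them differently or contain padding. */
struct sketch_extF80M { uint16_t signExp; uint64_t signif; };

/* The sign is bit 15 of signExp. */
bool sketch_extF80M_sign( const struct sketch_extF80M *aPtr )
{
    return (aPtr->signExp >> 15) != 0;
}

/* Assumption: the low 15 bits of signExp are the encoded exponent. */
uint16_t sketch_extF80M_exp( const struct sketch_extF80M *aPtr )
{
    return (uint16_t) (aPtr->signExp & 0x7FFF);
}

/* Combine a sign and a 15-bit exponent into a signExp value. */
uint16_t sketch_pack_signExp( bool sign, uint16_t exp )
{
    return (uint16_t) (((unsigned) sign << 15) | (exp & 0x7FFFu));
}
```

Accessing the fields by name, rather than by byte offset, keeps such code independent of the field order chosen by a particular port.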
|
@ -339,8 +336,8 @@ operation defined by the IEEE Standard;
|
|||
for each format, the floating-point remainder operation defined by the IEEE
|
||||
Standard;
|
||||
<LI>
|
||||
for each format, a ``round to integer'' operation that rounds to the nearest
|
||||
integer value in the same format; and
|
||||
for each format, a “round to integer” operation that rounds to the
|
||||
nearest integer value in the same format; and
|
||||
<LI>
|
||||
comparisons between two values in the same floating-point format.
|
||||
</UL>
|
||||
|
@ -357,12 +354,12 @@ not supported in SoftFloat <NOBR>Release 3</NOBR>:
|
|||
conversions between floating-point formats and decimal or hexadecimal character
|
||||
sequences;
|
||||
<LI>
|
||||
all ``quiet-computation'' operations (<B>copy</B>, <B>negate</B>, <B>abs</B>,
|
||||
and <B>copySign</B>, which all involve only simple copying and/or manipulation
|
||||
of the floating-point sign bit); and
|
||||
all “quiet-computation” operations (<B>copy</B>, <B>negate</B>,
|
||||
<B>abs</B>, and <B>copySign</B>, which all involve only simple copying and/or
|
||||
manipulation of the floating-point sign bit); and
|
||||
<LI>
|
||||
all ``non-computational'' operations other than <B>isSignaling</B> (which is
|
||||
supported).
|
||||
all “non-computational” operations other than <B>isSignaling</B>
|
||||
(which is supported).
|
||||
</UL>
|
||||
</P>
|
||||
|
||||
|
@ -393,9 +390,9 @@ leading significand bit must <NOBR>be 1</NOBR> unless it is required to
|
|||
For <NOBR>Release 3</NOBR> of SoftFloat, functions are not guaranteed to
|
||||
operate as expected when inputs of type <CODE>extFloat80_t</CODE> are
|
||||
non-canonical.
|
||||
Assuming all of a function's <CODE>extFloat80_t</CODE> inputs (if any) are
|
||||
canonical, function outputs of type <CODE>extFloat80_t</CODE> will always be
|
||||
canonical.
|
||||
Assuming all of a function’s <CODE>extFloat80_t</CODE> inputs (if any)
|
||||
are canonical, function outputs of type <CODE>extFloat80_t</CODE> will always
|
||||
be canonical.
|
||||
</P>
|
||||
|
||||
<H3>4.5. Conventions for Passing Arguments and Results</H3>
|
||||
|
@ -426,8 +423,8 @@ SoftFloat supplies this function:
|
|||
The first two arguments point to the values to be added, and the last argument
|
||||
points to the location where the sum will be stored.
|
||||
The <CODE>M</CODE> in the name <CODE>f128M_add</CODE> is mnemonic for the fact
|
||||
that the <NOBR>128-bit</NOBR> inputs and outputs are ``in memory'', pointed to
|
||||
by pointer arguments.
|
||||
that the <NOBR>128-bit</NOBR> inputs and outputs are “in memory”,
|
||||
pointed to by pointer arguments.
|
||||
</P>
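The pointer-passing convention can be sketched independently of floating-point.
The following hypothetical function, <CODE>sketch_u128M_add</CODE>, performs plain <NOBR>128-bit</NOBR> integer addition (not floating-point addition) and is shaped like <CODE>f128M_add</CODE> only in its argument list: two pointers to the inputs followed by a pointer to the destination.
The type <CODE>sketch_u128</CODE> is likewise an assumption made for the example.

```c
#include <stdint.h>

/* Hypothetical 128-bit container used only to demonstrate the convention. */
typedef struct { uint64_t lo, hi; } sketch_u128;

/* Integer addition (not floating-point), with inputs and output passed
   "in memory" through pointers. */
void sketch_u128M_add(
    const sketch_u128 *aPtr, const sketch_u128 *bPtr, sketch_u128 *destPtr )
{
    uint64_t lo = aPtr->lo + bPtr->lo;
    uint64_t carry = (lo < aPtr->lo);   /* 1 if the low half wrapped around */
    uint64_t hi = aPtr->hi + bPtr->hi + carry;
    /* Store only after both inputs are read, so destPtr may alias an input. */
    destPtr->lo = lo;
    destPtr->hi = hi;
}
```

Reading both inputs before writing the result is a design choice of this sketch; it lets a caller reuse an input location as the destination.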
|
||||
|
||||
<P>
|
||||
|
@ -464,10 +461,11 @@ platforms of interest, programmers can use whichever version they prefer.
|
|||
<P>
|
||||
In addition to the variables and functions documented here, SoftFloat defines
|
||||
some symbol names for its own private use.
|
||||
These private names always begin with the prefix `<CODE>softfloat_</CODE>'.
|
||||
These private names always begin with the prefix
|
||||
‘<CODE>softfloat_</CODE>’.
|
||||
When a program includes header <CODE>softfloat.h</CODE> or links with the
|
||||
SoftFloat library, all names with prefix `<CODE>softfloat_</CODE>' are reserved
|
||||
for possible use by SoftFloat.
|
||||
SoftFloat library, all names with prefix ‘<CODE>softfloat_</CODE>’
|
||||
are reserved for possible use by SoftFloat.
|
||||
Applications that use SoftFloat should not define their own names with this
|
||||
prefix, and should reference only such names as are documented.
|
||||
</P>
|
||||
|
@ -477,7 +475,7 @@ prefix, and should reference only such names as are documented.
|
|||
|
||||
<P>
|
||||
The following variables control rounding mode, underflow detection, and the
|
||||
<NOBR>80-bit</NOBR> extended format's rounding precision:
|
||||
<NOBR>80-bit</NOBR> extended format’s rounding precision:
|
||||
<BLOCKQUOTE>
|
||||
<CODE>softfloat_roundingMode</CODE><BR>
|
||||
<CODE>softfloat_detectTininess</CODE><BR>
|
||||
|
@ -497,30 +495,25 @@ The rounding mode is selected by the global variable
|
|||
</BLOCKQUOTE>
|
||||
This variable may be set to one of the values
|
||||
<BLOCKQUOTE>
|
||||
<TABLE>
|
||||
<TABLE CELLSPACING=0 CELLPADDING=0>
|
||||
<TR>
|
||||
<TD><CODE>softfloat_round_near_even</CODE></TD>
|
||||
<TD> </TD>
|
||||
<TD>round to nearest, with ties to even</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD><CODE>softfloat_round_near_maxMag</CODE></TD>
|
||||
<TD> </TD>
|
||||
<TD><CODE>softfloat_round_near_maxMag </CODE></TD>
|
||||
<TD>round to nearest, with ties to maximum magnitude (away from zero)</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD><CODE>softfloat_round_minMag</CODE></TD>
|
||||
<TD> </TD>
|
||||
<TD>round to minimum magnitude (toward zero)</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD><CODE>softfloat_round_min</CODE></TD>
|
||||
<TD> </TD>
|
||||
<TD>round to minimum (down)</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD><CODE>softfloat_round_max</CODE></TD>
|
||||
<TD> </TD>
|
||||
<TD>round to maximum (up)</TD>
|
||||
</TR>
|
||||
</TABLE>
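The difference between the five modes is easiest to see when rounding an exact ratio to an integer.
The sketch below does this in integer arithmetic; the enumeration <CODE>sketch_round</CODE> and function <CODE>sketch_round_ratio</CODE> are hypothetical names for illustration, and the enumerator values are not claimed to match the <CODE>softfloat_round_*</CODE> constants.

```c
/* Hypothetical mode names mirroring the five modes in the table. */
enum sketch_round {
    SKETCH_ROUND_NEAR_EVEN,    /* nearest, ties to even */
    SKETCH_ROUND_NEAR_MAXMAG,  /* nearest, ties away from zero */
    SKETCH_ROUND_MINMAG,       /* toward zero */
    SKETCH_ROUND_MIN,          /* down */
    SKETCH_ROUND_MAX           /* up */
};

/* Round the exact ratio num/den (den > 0) to an integer. */
long sketch_round_ratio( long num, long den, enum sketch_round mode )
{
    long q = num / den;   /* C division truncates toward zero */
    long r = num % den;   /* remainder carries the sign of num */
    long absR2;
    int neg = (num < 0);
    if ( r == 0 ) return q;                /* already an integer */
    absR2 = 2 * (r < 0 ? -r : r);          /* twice |r|, compared with den */
    switch ( mode ) {
    case SKETCH_ROUND_MINMAG:
        return q;
    case SKETCH_ROUND_MIN:
        return neg ? q - 1 : q;
    case SKETCH_ROUND_MAX:
        return neg ? q : q + 1;
    case SKETCH_ROUND_NEAR_MAXMAG:
        if ( absR2 >= den ) return neg ? q - 1 : q + 1;
        return q;
    case SKETCH_ROUND_NEAR_EVEN:
        if ( absR2 > den || (absR2 == den && q % 2 != 0) ) {
            return neg ? q - 1 : q + 1;
        }
        return q;
    }
    return q;
}
```

With this sketch, the ratio 5/2 rounds to 2 under near-even, minMag, and min, and to 3 under near-maxMag and max; the ratio &minus;5/2 rounds to &minus;2 under near-even, minMag, and max, and to &minus;3 under near-maxMag and min.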
|
||||
|
@ -550,7 +543,7 @@ Like most systems (and as required by the newer 2008 IEEE Standard), SoftFloat
|
|||
always detects loss of accuracy for underflow as an inexact result.
|
||||
</P>
|
||||
|
||||
<H3>6.3. Rounding Precision for 80-Bit Extended Format</H3>
|
||||
<H3>6.3. Rounding Precision for the <NOBR>80-Bit</NOBR> Extended Format</H3>
|
||||
|
||||
<P>
|
||||
For <CODE>extFloat80_t</CODE> only, the rounding precision of the basic
|
||||
|
@ -639,7 +632,7 @@ It does always raise the <I>inexact</I> exception flag as required.
|
|||
In this section, <CODE><<I>float</I>></CODE> appears in function names as
|
||||
a substitute for one of these abbreviations:
|
||||
<BLOCKQUOTE>
|
||||
<TABLE>
|
||||
<TABLE CELLSPACING=0 CELLPADDING=0>
|
||||
<TR>
|
||||
<TD><CODE>f32</CODE></TD>
|
||||
<TD>indicates <CODE>float32_t</CODE>, passed by value</TD>
|
||||
|
@ -696,11 +689,14 @@ Each conversion function takes one input of the appropriate type and generates
|
|||
one output.
|
||||
The following illustrates the signatures of these functions in cases when the
|
||||
floating-point result is passed either by value or via pointers:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
float64_t i32_to_f64( int32_t <I>a</I> );
|
||||
|
||||
</PRE>
|
||||
<PRE>
|
||||
void i32_to_f128M( int32_t <I>a</I>, float128_t *<I>destPtr</I> );
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
</P>
|
||||
|
||||
<H3>8.2. Conversions from Floating-Point to Integer</H3>
|
||||
|
@ -717,12 +713,15 @@ functions:
|
|||
</BLOCKQUOTE>
|
||||
The functions have signatures as follows, depending on whether the
|
||||
floating-point input is passed by value or via pointers:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
int32_t f64_to_i32( float64_t <I>a</I>, uint_fast8_t <I>roundingMode</I>, bool <I>exact</I> );
|
||||
|
||||
</PRE>
|
||||
<PRE>
|
||||
int32_t
|
||||
f128M_to_i32( const float128_t *<I>aPtr</I>, uint_fast8_t <I>roundingMode</I>, bool <I>exact</I> );
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
The <CODE><I>roundingMode</I></CODE> argument specifies the rounding mode for
|
||||
the conversion.
|
||||
The variable that usually indicates rounding mode,
|
||||
|
@ -768,12 +767,14 @@ and convenience:
|
|||
These functions round only toward zero (to minimum magnitude).
|
||||
The signatures for these functions are the same as above without the redundant
|
||||
<CODE><I>roundingMode</I></CODE> argument:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
int32_t f64_to_i32_r_minMag( float64_t <I>a</I>, bool <I>exact</I> );
|
||||
</PRE>
|
||||
<PRE>
|
||||
int32_t f128M_to_i32_r_minMag( const float128_t *<I>aPtr</I>, bool <I>exact</I> );
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
</P>
|
||||
|
||||
<H3>8.3. Conversions Among Floating-Point Types</H3>
|
||||
|
@ -789,6 +790,7 @@ result are different formats.
|
|||
There are four different styles of signature for these functions, depending on
|
||||
whether the input and the output floating-point values are passed by value or
|
||||
via pointers:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
float32_t f64_to_f32( float64_t <I>a</I> );
|
||||
</PRE>
|
||||
|
@ -801,6 +803,7 @@ via pointers:
|
|||
<PRE>
|
||||
void extF80M_to_f128M( const extFloat80_t *<I>aPtr</I>, float128_t *<I>destPtr</I> );
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
</P>
|
||||
|
||||
<P>
|
||||
|
@ -823,6 +826,7 @@ Each floating-point operation takes two operands, except for <CODE>sqrt</CODE>
|
|||
(square root) which takes only one.
|
||||
The operands and result are all of the same floating-point format.
|
||||
Signatures for these functions take the following forms:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
float64_t f64_add( float64_t <I>a</I>, float64_t <I>b</I> );
|
||||
</PRE>
|
||||
|
@ -831,14 +835,13 @@ Signatures for these functions take the following forms:
|
|||
f128M_add(
|
||||
const float128_t *<I>aPtr</I>, const float128_t *<I>bPtr</I>, float128_t *<I>destPtr</I> );
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
<PRE>
|
||||
float64_t f64_sqrt( float64_t <I>a</I> );
|
||||
</PRE>
|
||||
<PRE>
|
||||
void f128M_sqrt( const float128_t *<I>aPtr</I>, float128_t *<I>destPtr</I> );
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
When floating-point values are passed indirectly through pointers, arguments
|
||||
<CODE><I>aPtr</I></CODE> and <CODE><I>bPtr</I></CODE> point to the input
|
||||
operands, and the last argument, <CODE><I>destPtr</I></CODE>, points to the
|
||||
|
@ -850,7 +853,7 @@ Rounding of the <NOBR>80-bit</NOBR> double-extended-precision
|
|||
(<CODE>extFloat80_t</CODE>) functions is affected by variable
|
||||
<CODE>extF80_roundingPrecision</CODE>, as explained earlier in
|
||||
<NOBR>section 6.3</NOBR>,
|
||||
<I>Rounding Precision for <NOBR>80-Bit</NOBR> Extended Format</I>.
|
||||
<I>Rounding Precision for the <NOBR>80-Bit</NOBR> Extended Format</I>.
|
||||
</P>
|
||||
|
||||
<H3>8.5. Fused Multiply-Add Functions</H3>
|
||||
|
@ -873,6 +876,7 @@ No fused multiply-add function is currently provided for the
|
|||
<P>
|
||||
Depending on whether floating-point values are passed by value or via pointers,
|
||||
the fused multiply-add functions have signatures of these forms:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
float64_t f64_mulAdd( float64_t <I>a</I>, float64_t <I>b</I>, float64_t <I>c</I> );
|
||||
</PRE>
|
||||
|
@ -885,6 +889,7 @@ the fused multiply-add functions have signatures of these forms:
|
|||
float128_t *<I>destPtr</I>
|
||||
);
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
The functions compute
|
||||
<NOBR>(<CODE><I>a</I></CODE> × <CODE><I>b</I></CODE>)
|
||||
+ <CODE><I>c</I></CODE></NOBR>
|
||||
|
@ -915,6 +920,7 @@ Each remainder operation takes two floating-point operands of the same format
|
|||
and returns a result in the same format.
|
||||
Depending on whether floating-point values are passed by value or via pointers,
|
||||
the remainder functions have signatures of these forms:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
float64_t f64_rem( float64_t <I>a</I>, float64_t <I>b</I> );
|
||||
</PRE>
|
||||
|
@ -923,6 +929,7 @@ the remainder functions have signatures of these forms:
|
|||
f128M_rem(
|
||||
const float128_t *<I>aPtr</I>, const float128_t *<I>bPtr</I>, float128_t *<I>destPtr</I> );
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
When floating-point values are passed indirectly through pointers, arguments
|
||||
<CODE><I>aPtr</I></CODE> and <CODE><I>bPtr</I></CODE> point to operands
|
||||
<CODE><I>a</I></CODE> and <CODE><I>b</I></CODE> respectively, and
|
||||
|
@ -938,8 +945,8 @@ where <I>n</I> is the integer closest to
|
|||
If <NOBR><CODE><I>a</I></CODE> ÷ <CODE><I>b</I></CODE></NOBR> is exactly
|
||||
halfway between two integers, <I>n</I> is the <EM>even</EM> integer closest to
|
||||
<NOBR><CODE><I>a</I></CODE> ÷ <CODE><I>b</I></CODE></NOBR>.
|
||||
The IEEE Standard's remainder operation is always exact and so requires no
|
||||
rounding.
|
||||
The IEEE Standard’s remainder operation is always exact and so requires
|
||||
no rounding.
|
||||
</P>
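The choice of <I>n</I> can be checked with a small integer model.
The sketch below applies the same rule (nearest integer quotient, ties to the even integer) to integer operands; the function name <CODE>sketch_ieee_rem</CODE> is hypothetical, and the code is an analogy in integer arithmetic, not SoftFloat's implementation.

```c
#include <stdlib.h>

/* Integer model of the remainder rule: returns a - b*n, where n is the
   integer nearest a/b and ties go to the even n.  Requires b != 0. */
long sketch_ieee_rem( long a, long b )
{
    long q = a / b;              /* truncated quotient */
    long r = a % b;              /* a == q*b + r, with |r| < |b| */
    long absR2 = 2 * labs( r );
    long absB = labs( b );
    if ( absR2 > absB || (absR2 == absB && q % 2 != 0) ) {
        /* n lies one step further from zero than q, so adjust r by b. */
        r = ((a < 0) == (b < 0)) ? r - b : r + b;
    }
    return r;
}
```

For operands 5 and 2 the quotient 2.5 is a tie, so <I>n</I> is 2 and the result is 1; for 7 and 2 the quotient 3.5 is a tie, so <I>n</I> is 4 and the result is &minus;1.
Unlike the C operator <CODE>%</CODE>, the result can be negative even when both operands are positive.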
|
||||
|
||||
<P>
|
||||
|
@ -968,6 +975,7 @@ and the resulting integer value is returned in the same floating-point format.
|
|||
<P>
|
||||
The signatures of the round-to-integer functions are similar to those for
|
||||
conversions to an integer type:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
float64_t f64_roundToInt( float64_t <I>a</I>, uint_fast8_t <I>roundingMode</I>, bool <I>exact</I> );
|
||||
</PRE>
|
||||
|
@ -980,6 +988,7 @@ conversions to an integer type:
|
|||
float128_t *<I>destPtr</I>
|
||||
);
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
The <CODE><I>roundingMode</I></CODE> argument specifies the rounding mode to
|
||||
apply.
|
||||
The variable that usually indicates rounding mode,
|
||||
|
@ -1005,17 +1014,19 @@ provided:
|
|||
<CODE><<I>float</I>>_lt</CODE>
|
||||
</BLOCKQUOTE>
|
||||
Each comparison takes two operands of the same type and returns a Boolean.
|
||||
The abbreviation <CODE>eq</CODE> stands for ``equal'' (=);
|
||||
<CODE>le</CODE> stands for ``less than or equal'' (≤);
|
||||
and <CODE>lt</CODE> stands for ``less than'' (<).
|
||||
The abbreviation <CODE>eq</CODE> stands for “equal” (=);
|
||||
<CODE>le</CODE> stands for “less than or equal” (≤);
|
||||
and <CODE>lt</CODE> stands for “less than” (<).
|
||||
Depending on whether the floating-point operands are passed by value or via
|
||||
pointers, the comparison functions have signatures of these forms:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
bool f64_eq( float64_t <I>a</I>, float64_t <I>b</I> );
|
||||
</PRE>
|
||||
<PRE>
|
||||
bool f128M_eq( const float128_t *<I>aPtr</I>, const float128_t *<I>bPtr</I> );
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
</P>
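A comparison on floating-point values is not a plain comparison of bit patterns, because NaNs compare unequal to everything and the two zeros compare equal.
The following sketch shows what the three comparisons have to account for when operating directly on <NOBR>32-bit</NOBR> single-precision bit patterns.
All function names here are hypothetical, and the sketch ignores exception flags entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* A NaN has all exponent bits set and a nonzero fraction. */
bool sketch_f32_bits_isNaN( uint32_t a )
{
    return (a & UINT32_C( 0x7FFFFFFF )) > UINT32_C( 0x7F800000 );
}

bool sketch_f32_bits_eq( uint32_t a, uint32_t b )
{
    if ( sketch_f32_bits_isNaN( a ) || sketch_f32_bits_isNaN( b ) ) return false;
    if ( ((a | b) & UINT32_C( 0x7FFFFFFF )) == 0 ) return true;  /* +0 == -0 */
    return a == b;
}

bool sketch_f32_bits_lt( uint32_t a, uint32_t b )
{
    bool signA = (a >> 31) != 0;
    bool signB = (b >> 31) != 0;
    if ( sketch_f32_bits_isNaN( a ) || sketch_f32_bits_isNaN( b ) ) return false;
    if ( signA != signB ) {
        /* Negative < positive, unless both operands are zeros. */
        return signA && (((a | b) & UINT32_C( 0x7FFFFFFF )) != 0);
    }
    /* Same sign: magnitudes order like unsigned integers, reversed when
       both values are negative. */
    return (a != b) && (signA ? (b < a) : (a < b));
}

bool sketch_f32_bits_le( uint32_t a, uint32_t b )
{
    return sketch_f32_bits_lt( a, b ) || sketch_f32_bits_eq( a, b );
}
```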
|
||||
|
||||
<P>
|
||||
|
@ -1058,21 +1069,25 @@ provided with these names:
|
|||
The functions take one floating-point operand and return a Boolean indicating
|
||||
whether the operand is a signaling NaN.
|
||||
Accordingly, the functions have the forms
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
bool f64_isSignalingNaN( float64_t <I>a</I> );
|
||||
</PRE>
|
||||
<PRE>
|
||||
bool f128M_isSignalingNaN( const float128_t *<I>aPtr</I> );
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
</P>
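What such a test examines can be sketched on a <NOBR>32-bit</NOBR> single-precision bit pattern.
The sketch assumes the common convention in which a NaN is quiet when the most significant fraction bit is 1 and signaling when that bit is 0; this convention is an assumption about the target platform, and the function name <CODE>sketch_f32_bits_isSignalingNaN</CODE> is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumes the convention "top fraction bit set means quiet NaN". */
bool sketch_f32_bits_isSignalingNaN( uint32_t a )
{
    /* Exponent all ones and top fraction bit clear... */
    bool expAllOnesQuietClear =
        (a & UINT32_C( 0x7FC00000 )) == UINT32_C( 0x7F800000 );
    /* ...and some lower fraction bit set, so the value is not an infinity. */
    bool lowFractionNonzero = (a & UINT32_C( 0x003FFFFF )) != 0;
    return expAllOnesQuietClear && lowFractionNonzero;
}
```

Under that assumption, <CODE>0x7FA00000</CODE> is a signaling NaN, whereas <CODE>0x7FC00000</CODE> (a quiet NaN) and <CODE>0x7F800000</CODE> (infinity) are not.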
|
||||
|
||||
<H3>8.10. Raise-Exception Function</H3>
|
||||
|
||||
<P>
|
||||
SoftFloat provides a single function for raising floating-point exceptions:
|
||||
<BLOCKQUOTE>
|
||||
<PRE>
|
||||
void softfloat_raise( uint_fast8_t <I>exceptions</I> );
|
||||
</PRE>
|
||||
</BLOCKQUOTE>
|
||||
The <CODE><I>exceptions</I></CODE> argument is a mask indicating the set of
|
||||
exceptions to raise.
|
||||
(See earlier section 7, <I>Exceptions and Exception Flags</I>.)
|
||||
|
@ -1084,6 +1099,11 @@ function may cause a trap or abort appropriate for the current system.
|
|||
|
||||
<H2>9. Changes from SoftFloat <NOBR>Release 2</NOBR></H2>
|
||||
|
||||
<P>
|
||||
Apart from the change in the legal use license, there are numerous technical
|
||||
differences between <NOBR>Release 3</NOBR> of SoftFloat and earlier releases.
|
||||
</P>
|
||||
|
||||
<H3>9.1. Name Changes</H3>
|
||||
|
||||
<P>
|
||||
|
@ -1214,17 +1234,17 @@ Lastly, there are a few other changes to function names:
|
|||
<TR>
|
||||
<TD><CODE>_round_to_zero</CODE></TD>
|
||||
<TD><CODE>_r_minMag</CODE></TD>
|
||||
<TD>conversions from floating-point to integer, section 8.2</TD>
|
||||
<TD>conversions from floating-point to integer (<NOBR>section 8.2</NOBR>)</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD><CODE>round_to_int</CODE></TD>
|
||||
<TD><CODE>roundToInt</CODE></TD>
|
||||
<TD>round-to-integer functions, section 8.7</TD>
|
||||
<TD>round-to-integer functions (<NOBR>section 8.7</NOBR>)</TD>
|
||||
</TR>
|
||||
<TR>
|
||||
<TD><CODE>is_signaling_nan </CODE></TD>
|
||||
<TD><CODE>isSignalingNaN</CODE></TD>
|
||||
<TD>signaling NaN test functions, section 8.9</TD>
|
||||
<TD>signaling NaN test functions (<NOBR>section 8.9</NOBR>)</TD>
|
||||
</TR>
|
||||
</TABLE>
|
||||
</BLOCKQUOTE>
|
||||
|
@ -1296,7 +1316,7 @@ argument <CODE><I>exact</I></CODE>.
|
|||
<P>
|
||||
With <NOBR>Release 3</NOBR>, a port of SoftFloat can now define any of the
|
||||
floating-point types <CODE>float32_t</CODE>, <CODE>float64_t</CODE>,
|
||||
<CODE>extFloat80_t</CODE>, and <CODE>float128_t</CODE> as aliases for C's
|
||||
<CODE>extFloat80_t</CODE>, and <CODE>float128_t</CODE> as aliases for C’s
|
||||
standard floating-point types <CODE>float</CODE>, <CODE>double</CODE>, and
|
||||
<CODE>long</CODE> <CODE>double</CODE>, using either <CODE>#define</CODE> or
|
||||
<CODE>typedef</CODE>.
|
||||
|
@ -1304,9 +1324,9 @@ This potential convenience was not supported under <NOBR>Release 2</NOBR>.
|
|||
</P>
|
||||
|
||||
<P>
|
||||
(Note, however, that there may be a performance cost to defining SoftFloat's
|
||||
floating-point types this way, depending on the platform and the applications
|
||||
using SoftFloat.
|
||||
(Note, however, that there may be a performance cost to defining
|
||||
SoftFloat’s floating-point types this way, depending on the platform and
|
||||
the applications using SoftFloat.
|
||||
Ports of SoftFloat may choose to forgo the convenience in favor of better
|
||||
speed.)
|
||||
</P>
|
||||
|
@ -1338,7 +1358,7 @@ Fused multiply-add functions have been added for the non-extended formats,
|
|||
|
||||
<P>
|
||||
<NOBR>Release 3</NOBR> of SoftFloat is written to conform better to the ISO C
|
||||
Standard's rules for portability.
|
||||
Standard’s rules for portability.
|
||||
For example, older releases of SoftFloat employed type conversions in ways
|
||||
that, while commonly practiced, are not fully defined by the C Standard.
|
||||
Such problematic type conversions have generally been replaced by the use of
|
||||
|
@ -1387,8 +1407,8 @@ Some loss of speed has been observed due to this change.
|
|||
The following improvements are anticipated for future releases of SoftFloat:
|
||||
<UL>
|
||||
<LI>
|
||||
support for the common <NOBR>16-bit</NOBR> ``half-precision'' floating-point
|
||||
format;
|
||||
support for the common <NOBR>16-bit</NOBR> “half-precision”
|
||||
floating-point format;
|
||||
<LI>
|
||||
more functions from the 2008 version of the IEEE Floating-Point Standard;
|
||||
<LI>
|
||||
|
|