Formatting output with Sather

A (not so) new approach

Although nobody with experience thinks about it, using numbers as width delimiters (as in C's sprintf function) is not as intuitive as it could be. The PRINT USING notation from BASIC was much more readable (at least as long as the width is no larger than 20 or so). Furthermore, preparsing the format expression restricts the syntax of expressions for data types added later.


General syntax

format expression -> "<" [selector] [options] padding expression [options] ">"
Options can be used by user defined classes to feature special print formats. User defined options should always start with a lower case letter. An exponent is an option.
selector -> positive integer ":"
padding expression -> [filling] padding | [sign] padding [precision padding]
filling -> F followed by any single character.
Fillings are not (yet) allowed with numbers.
sign -> "+" or "-".
padding -> arbitrary number of "#" ["^" arbitrary number of "#"].
precision padding -> "." or "," followed by an arbitrary number of "#".
Only numbers can have precisions.
Anchors and precision cannot be used together.
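
The grammar of the padding expression can be modelled in a few lines. The following is a Python sketch, not the Sather implementation: the names Padding and parse_padding are made up for illustration, selectors and options are left out, and the rules are taken literally from the grammar lines above.

```python
import re
from typing import NamedTuple, Optional


class Padding(NamedTuple):
    filler: Optional[str]     # character after "F", if any
    sign: Optional[str]       # "+" or "-", if any
    left: int                 # hash signs before the anchor (all of them if there is none)
    anchor: bool              # True if a "^" is present
    right: int                # hash signs after the anchor
    precision: Optional[int]  # hash signs after "." or ",", None if absent


# [filling] [sign] padding [precision padding]; invalid combinations are rejected below.
_PADDING = re.compile(
    r"(?:F(?P<filler>.))?"
    r"(?P<sign>[+-])?"
    r"(?P<left>#*)"
    r"(?:(?P<anchor>\^)(?P<right>#*))?"
    r"(?:[.,](?P<prec>#*))?"
)


def parse_padding(expr: str) -> Padding:
    """Parse the text of a padding expression (hypothetical helper)."""
    m = _PADDING.fullmatch(expr)
    if m is None:
        raise ValueError(f"malformed padding expression: {expr!r}")
    has_prec = m["prec"] is not None
    if m["filler"] is not None and (m["sign"] or has_prec):
        raise ValueError("a filling cannot be combined with a sign or a precision")
    if m["anchor"] and has_prec:
        raise ValueError("anchors and precision cannot be used together")
    return Padding(
        filler=m["filler"],
        sign=m["sign"],
        left=len(m["left"]),
        anchor=m["anchor"] is not None,
        right=len(m["right"] or ""),
        precision=len(m["prec"]) if has_prec else None,
    )
```

For instance, parse_padding("+###.##") reports a plus sign, three hash signs and a precision of two, while parse_padding("##^##.#") is rejected because it combines an anchor with a precision.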


General options for basic classes

Padding

The formatting class is intended to give a concise format description for small objects that are printed often and in varying styles. Between the angle brackets, every hash character (#) stands for one character of the object to be printed. If an object needs more than the specified number of characters, it is not truncated but printed extending beyond the boundary of the format.

Justifying using the anchor ^

One can justify to the left, to the right, or to somewhere in the middle of the padding area by using the caret (^). When the object cannot be centered because it hits the border of the frame, it is printed left- or right-justified, depending on which side is hit first.

Fillers

The character used to fill the padding area can be chosen separately. Filling is only supported for STR, BOOL and CHAR. Reason: the main application for fillers in numbers would be leading zeros, but those have more complicated semantics (signs go before leading zeros, ...). Suggestions are welcome.

Selecting arguments

Without selectors, the formatter uses the first format expression to format the first argument (following the format string), the second format expression for the next argument, and so forth.

To override the order of arguments, one can use a selector within the format expression. The selector 3: asks the formatter to format the third argument at that position. Subsequent format expressions use the arguments following the selected one. Selecting a nonexistent argument (either by using an invalid selector number or by using too many format expressions) raises an exception.
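
The argument selection rule can be illustrated with a small Python model. This is a sketch under stated assumptions, not the Sather formatter: the function name render_selectors is hypothetical, padding and options inside the angle brackets are ignored, and arguments are converted with Python's str().

```python
import re

# Either the escape "%<" or a format expression "<[n:]...>".
_EXPR = re.compile(r"%<|<(?:(\d+):)?([^>]*)>")


def render_selectors(fmt: str, *args) -> str:
    """Substitute arguments into fmt, honouring selectors only (hypothetical helper)."""
    next_index = 0  # zero-based index of the argument the next expression uses

    def substitute(match):
        nonlocal next_index
        if match.group(0) == "%<":
            return "<"
        if match.group(1) is not None:
            # A selector is one-based and resets the running counter.
            next_index = int(match.group(1)) - 1
        if not 0 <= next_index < len(args):
            raise IndexError("format expression selects a nonexistent argument")
        value = args[next_index]
        next_index += 1
        return str(value)

    return _EXPR.sub(substitute, fmt)
```

With this model, render_selectors("<2:> <>", 3, 4, 5) yields "4 5": the selector picks the second argument, and the following expression continues with the third.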


Formatting basic classes

INT

When an integer does not fill the padding area, positions with zeros as padding will be printed as zeros (not yet implemented).

A + sign causes the sign to be printed whether the integer is positive or negative.

Although integers have no fraction, a period in the padding area will force the integer to be justified to the right of the period unless there is not enough space on the left.

FLT,FLTD

A formatting expression for floating point numbers has the following syntax:
<sign width precision exponent>
Each of these format fields can be left out. Anchors or zeros are not allowed.

By default the number is printed with its full precision, using exponential notation if the absolute value of the number is not between 10^-10 and 10^10.

sign
If the sign is a plus sign, the sign of the number is printed even if the number is positive.

The sign is part of the width specification in that it is a placeholder for one character in front of the dot.

width
The width consists of a number of hash signs (#) and specifies the minimal size of the integer part of the mantissa. A sign expands the width by one. If the width is specified the number will be printed in exponential notation if and only if the exponent format is specified.

Using width always implies using precision, which is zero if not specified.

precision
The precision consists of a dot (.) or a comma (,) optionally followed by a number of hash signs. It specifies the number of digits to use for the fractional part of the number. If a number has more fractional digits than specified the number will be rounded.

Omitting the dot and writing a dot followed by no hash signs both round the number to an integer; they differ only in whether the result is printed without or with the dot.

exponent
The exponent forces the number to be printed in exponential notation. (This means the integer part of the number always has exactly one digit!)
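
The interplay of sign, width and precision can be sketched in Python. This is only a model of the rules described above, not the Sather code: format_float is a made-up name, the exponent field is not handled, an empty expression falls back to Python's str(), and a "-" sign is assumed to do nothing more than reserve one column.

```python
def format_float(value: float, spec: str) -> str:
    """Format value according to a '[sign] width [precision]' spec (hypothetical helper)."""
    if spec == "":
        return str(value)  # stand-in for "print with full precision"

    sign = ""
    if spec[0] in "+-":
        sign, spec = spec[0], spec[1:]

    # Split at the first "." or ","; both are accepted as decimal separators.
    int_part, sep, frac = spec, "", ""
    for candidate in ".,":
        if candidate in spec:
            int_part, sep, frac = spec.partition(candidate)
            break
    if set(int_part + frac) - {"#"}:
        raise ValueError(f"malformed float format: {spec!r}")

    digits = len(frac)  # precision is zero if not specified
    text = f"{value:+.{digits}f}" if sign == "+" else f"{value:.{digits}f}"
    if sep and digits:
        text = text.replace(".", sep)
    elif sep:
        text += sep  # a bare dot is printed even without fraction digits

    # The sign widens the integer part by one column; longer numbers are not truncated.
    width = len(sign) + len(int_part) + len(sep) + digits
    return text.rjust(width)
```

Under these assumptions format_float(3.14159, "#####.###") gives "    3.142" and format_float(3.14159, "##,###") gives " 3,142". Note that Python rounds exact halves to the even neighbour, which may differ from the Sather runtime.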

STR,BOOL,CHAR

Precision, padding with zeros, and signs are not supported.

CPX,CPXD

A complex formatting expression can contain formatting information for the real and the imaginary part. The two parts are separated by a semicolon to indicate a separate padding for the imaginary part. If there is no semicolon, both parts are formatted using the same format expression.

The formatting expression can also be preceded by the option polar to indicate that the number is to be printed in polar coordinates. The padding for the imaginary part is then used as the padding for the phase.

The output will look as follows:

Representation        Result
Cartesian (default)   real part "+" imaginary part "i"
Polar                 absolute value "*e^" phase "i"


Examples

Each entry below gives the expression, the string it produces, and an explanation.
#FMT("<> is not <>",
    1 , 1.3);
"1 is not 1.3"
The type of the arguments can be any subtype of $FMT.
#FMT("<> %< <>",
    3, 4);
"3 < 4"
To print an '<' one has to type '%<' in the format string.
#FMT("<2:> <1:>",
    3, 4);
"4 3"
Using a selector, one can override the order of the arguments (useful for multi-lingual error messages, for example).
#FMT("<2:> <>",
    3, 4, 5);
"4 5"
The selector resets the counter selecting the arguments.
#FMT("%9.3f",7.4);
"    7.400"
Standard C notation is supported at this time. A runtime error occurs if an object is used with a conversion letter that does not match its type.
Formatting INTs
#FMT("<#####>",7);
"    7"
Format with width 5.
#FMT("<^+###>",7);
"+7   "
Format width 5, justify to the left. Print the sign no matter whether the number is positive or not.
#FMT("<##^##>",7);
"  7  "
Center in a field of width 5.
#FMT("<###.#>",7);
"  7  "
The integer is aligned so that it fits into a column of FLTs printed with the same format.
#FMT("<hex>",245);
"F5"
Alternative bases for integers are hex, oct and bin.
#FMT("<bin-#####>",-12);
" -1100"
Negative numbers are not printed in two's complement.
Formatting STR, BOOL and CHAR
#FMT("<#####>","hi");
"   hi"
Format with width 5.
#FMT("<F*#####>","hi");
"***hi"
Fill with * instead of spaces.
#FMT("<####^####>",
	false);
"  false  "
Format with width 9 and center text around the middle.
#FMT("<#^#######>",
	"Hello");
"Hello    "
Centered around the second position, the text would extend beyond the padding field, so it is printed left justified.
Formatting floating point numbers
#FMT("<#####.###>",
    3.14159);
"    3.142"
Use 5 character space for the integer part and print 3 fraction digits.
#FMT("<##,###>",
    3.14159);
" 3,142"
It is possible to use the comma instead of a dot.
#FMT("<##.###e+##>",
    3.14159);
" 3.142e+00"
2 digits before and 3 digits after the decimal dot, force exponential representation and force the plus sign on the exponent.
Formatting complex numbers
#FMT("<#####>",
    #CPX(0.0,1.0));
"    0+    1i"
Format real and imaginary part both with width 5.
#FMT("<#####;###>",
    #CPX(0.0,1.0));
"    0+  1i"
Format real part with <#####> and imaginary part with <###>.
#FMT("<polar###;.##>",
    #CPX(0.0,1.0));
"  1*e^1.57i"
Print in polar coordinates, format abs value using <###> and angle using <.##>.


Questions and answers

How does the anchor (^) work?

First of all, the anchor is not allowed with floating point numbers.

The anchor specifies where the middle of an object should be printed. If the object consists of an even number of characters, the left side of the anchor is preferred. If the object would exceed the width of the padding area, it is printed left or right justified, depending on which side of the padding area would be exceeded. Therefore an anchor at the very left of a padding field means the object will always be printed left justified.

#FMT("<##^####>","*")
"  *    "
#FMT("<##^####>","**")
" **    "
#FMT("<##^####>","***")
" ***   "
#FMT("<##^####>","****")
"****   "
#FMT("<##^####>","*****")
"*****  "
#FMT("<##^####>","******")
"****** "
#FMT("<##^####>","*********")
"*********"

If the object consists of more characters than the padding area, it will always be left justified. No anchor means right justified, except for floating point numbers, which will be justified according to their decimal dot.
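
The placement rule behind these outputs can be written down compactly. The following Python sketch is a model derived from the examples above, not the Sather source: the function name place is hypothetical, the padding is assumed to consist only of hash signs and at most one caret, and the caret itself is assumed to count as one column of the field.

```python
def place(text: str, padding: str, fill: str = " ") -> str:
    """Place text in a field described by hash signs and an optional '^' (hypothetical helper)."""
    width = len(padding)  # the caret occupies one column of the field
    if len(text) >= width:
        return text  # never truncate; the object simply exceeds the field

    if "^" in padding:
        anchor = padding.index("^")
        # Put the middle of the text on the anchor, preferring the left side
        # for texts of even length.
        start = anchor - len(text) // 2
        # If the text would leave the field, push it back inside.
        start = max(0, min(start, width - len(text)))
    else:
        start = width - len(text)  # no anchor means right justified

    return fill * start + text + fill * (width - start - len(text))
```

For example, place("**", "##^####") returns " **    " and place("Hello", "#^#######") returns "Hello    ", in line with the outputs listed above.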

How can the format class be made to print other objects?

The format mechanism is designed to be as easy to extend as possible. For an object of your own to be formatted, its class just has to be a subtype of $FMT and provide the routine fmt(str:STR):STR. When the object is printed by the formatter class, the formatter calls this routine and passes the format expression as str. The result is inserted into the result string.

To maintain readability of the format expressions, user defined classes should be written in the spirit of the formatting routines of the base classes.


Errors in format strings

If an error occurs while formatting, the error message is returned instead of the formatted object. Depending on the error, some of the objects in the format string might still be printed properly.

If the global variable FMT_ERROR::raise_exceptions:BOOL is set to true, an exception of type FMT_ERROR is raised. This error consists of a more or less self-documenting literal error message (stored in the attribute str) and a more concise error code (stored under error). The error codes are enumerated in FMT_ERROR_FLAGS and are intended for further analysis. The error message is the same one would get with raise_exceptions set to false (which is the default setting).

This is a short description of the error codes:

unexpected_end_format
The format string contains a format expression which is not properly terminated.
illegal_arg_number
The formatter found a format expression requesting an argument which does not exist. This can happen when using an invalid number in a selector or when there are too many format expressions in the format string. (Remember that the value of a selector influences subsequent format expressions.)
malformed_format
The format expression is not understood. This can mean either that the format expression is malformed (two anchors, two dots, ...) or that the type of the argument does not expect this kind of format expression.
not_supported
The format expression requests a feature not yet supported.
Error codes of the C style formatter:
wrong_type
The type of the argument does not fit to the letter in the format expression.
sprintf_failed
The Sather runtime does not analyze the C style format expression in full; it merely passes the expression to the C function sprintf. If this function fails for whatever reason, this error is raised.
bad_type
The type of the argument is not supported by the C style formatter although it is a subtype of $FMT.

How can I signal errors when doing my own formatting?

Your own error messages can easily be included in the error signalling scheme. Whenever one wants to signal an error, one just creates an error object with the corresponding error message and then returns the string representation of this error object.

Suppose you want to signal the error "Illegal format". The correct code for it would be:

   fmt(f:STR): STR
   is
     ...
     return #FMT_ERROR("Illegal format").str;
     ...
   end; -- fmt --

   #FMT( "<> <something_bad> <>", ob1, ob2, ob3 );
Depending on the status of raise_exceptions the formatter will insert the error message instead of the string representation of the object or raise a corresponding error. If formatting ob2 results in an error one still gets the correct values for ob1 and ob3.
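
The two behaviours can be mimicked in a short Python model. Everything in it is illustrative and assumed rather than taken from the Sather library: FmtError stands in for FMT_ERROR, its class attribute raise_exceptions for the global flag, and Celsius is a hypothetical user defined class with one made-up option, long.

```python
class FmtError(Exception):
    """Stand-in for FMT_ERROR; raise_exceptions mirrors the global flag."""

    raise_exceptions = False  # default setting, as described in the text

    @classmethod
    def signal(cls, message: str) -> str:
        """Return the message, or raise it as an exception if the flag is set."""
        if cls.raise_exceptions:
            raise cls(message)
        return message


class Celsius:
    """Hypothetical user defined class with its own fmt routine."""

    def __init__(self, degrees: int):
        self.degrees = degrees

    def fmt(self, expr: str) -> str:
        if expr == "":
            return f"{self.degrees}C"
        if expr == "long":  # user defined options start with a lower case letter
            return f"{self.degrees} degrees Celsius"
        return FmtError.signal("Illegal format")


def format_each(exprs, objects):
    """Format every object with its own expression, as the formatter would."""
    return [ob.fmt(expr) for expr, ob in zip(exprs, objects)]
```

With the flag left at false, format_each(["", "something_bad", "long"], [Celsius(1), Celsius(2), Celsius(3)]) returns ["1C", "Illegal format", "3 degrees Celsius"], so the first and third objects are still formatted correctly. With the flag set to true, the same call raises FmtError instead.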


Internal structure of the formatter

Whenever a format is produced with the creation routine of FMT, the routine parse is called. This routine analyzes the format string and finds the format expressions it contains. Whenever a format expression is discovered, the routine sformat takes a closer look at it to find out which argument is to be printed there. (Right now C style format expressions are detected as well and passed to cformat.) sformat then calls the routine fmt of that object, passing the format expression as an argument.

As the formatting routines of the base classes are considered to be a uniform group of routines, they are collected in the class BASE_FORMAT. The fmt routines just call the corresponding formatting routines in this class.

User defined classes may make use of this class as well. However, to keep one's program portable, the formatting routine of the user defined class should not be put into BASE_FORMAT.

Overview of the calls in the formatter class
FMT::create or FMT::format
Recommended routines to use the formatter.
private FMT::parse
Analyzes the format string and splits it into format expressions. Depending on the style of the format expression, one of the following routines is called.
Sather style formatting:
private FMT::sformat
Determines the correct object to be formatted.
private FMT::do_fmt( ob: $FMT, fmt: STR ): STR
Hook for extensions. Just calls ob.fmt(fmt).
$FMT::fmt( f: STR ): STR
Recommended routine to format an object.
BASE_FORMAT::fmt_xxx( x: XXX, s: STR ): STR
Basic routine to format objects of class XXX.
C style formatting:
private FMT::cformat
Determines the correct object to be formatted.
private FMT::sprintf( fmt: STR, ob: $FMT ): STR
Wrapper for C sprintf.

C style formatting might be removed in later versions.

As the structure of the format expression may be somewhat complicated due to the number of options, BASE_FORMAT provides the parsing routines fmt_parse (for numbers) and fmt_parse_easy (for strings), which can be used from outside.

The parsing routine has a lot of parameters; it returns the number of hash signs found in each part of the padding area, together with flags indicating which extensions were mentioned in the format string. User defined classes that have basic objects as components may use the formatting routines of these objects, use fmt_parse directly, or use a completely different routine to analyze the formatting expression.

Results of the fmt_parse routines

The result of fmt_parse* reflects whether the parse was successful or not. The following enumeration (defined in BASE_FORMAT) collects all possible results:
parse_success
The format expression has been successfully parsed.
parse_syntax_error
The format expression contains unexpected characters, or a character occurred where it was not expected.
parse_illegal_anchor
Either the type belonging to the format expression does not allow anchors or a second anchor has been discovered.
parse_not_yet
This error message is not yet supported.
parse_dot_and_anchor
The format expression contains a dot and an anchor.
parse_filler_expected
The format expression ends with an `F' but has no character after it.


To be continued...


Last modified: 07/10/1996
Dipl. math. Holger Klawitter (holger@icsi.Berkeley.edu).