The input function handles input and lexical analysis

Desk calculator example – input – command line arguments – expression summary – logical and relational operators – increment and decrement – free store – explicit type conversion – statement summary – declarations – selection statements – declarations in conditions – iteration statements – the infamous goto – comments and indentation – advice – exercises.

6.1 A Desk Calculator [expr.calculator]

where 2.5 is the result of the first line of input and 19.635 is the result of the second.

The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T.

Here is a grammar for the language accepted by the calculator:

    expression PRINT

    expression PRINT expr_list

term:

    term / primary

    NAME

    NAME = expression

The style of syntax analysis used is usually called recursive descent; it is a popular and straightforward top-down technique. In a language such as C++, in which function calls are relatively cheap, it is also efficient. For each production in the grammar, there is a function that calls other functions. Terminal symbols (for example, END, NUMBER, +, and -) are recognized by the lexical analyzer, get_token(); and nonterminal symbols are recognized by the syntax analyzer functions, expr(), term(), and prim(). As soon as both operands of a (sub)expression are known, the expression is evaluated; in a real compiler, code could be generated at this point.

The parser uses a function get_token() to get input. The value of the most recent call of get_token() can be found in the global variable curr_tok. The type of curr_tok is the enumeration Token_value:

    NAME, NUMBER, END,

    PLUS='+', MINUS='-', MUL='*', DIV='/',

a help to people using debuggers. This works as long as no character used as input has a value used as an enumerator – and no character set I know of has a printing character with a single-digit inte-

get_token() to get the next token. Each parser function evaluates ‘‘its’’ expression and returns the value. The function expr() handles addition and subtraction. It consists of a single loop that looks

    for (;;) // ‘‘forever’’

        switch (curr_tok) {

            left -= term(true);

            break;

This function really does not do much itself. In a manner typical of higher-level functions in a large program, it calls other functions to do the work.

Note that an expression such as 2-3+4 is evaluated as (2-3)+4, as specified in the grammar.
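To make the left-to-right grouping concrete, here is a minimal sketch of a recursive-descent evaluator. It is not the calculator's own code: the class Mini_parser, its member peek(), and the function evaluate() are hypothetical names, the input comes from a string rather than from get_token(), errors are reported by exceptions rather than through error(), and names and assignment are left out. Only the shape of expr(), term(), and prim() follows the description above.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical mini-parser (not the book's code): numbers, + - * /,
// unary minus, and parentheses, read from a string.
class Mini_parser {
public:
    explicit Mini_parser(const std::string& s) : in(s) { }

    double expr()                   // add and subtract
    {
        double left = term();
        for (;;)                    // "forever"
            switch (peek()) {
            case '+': in.get(); left += term(); break;
            case '-': in.get(); left -= term(); break;
            default:  return left;
            }
    }

private:
    std::istringstream in;

    char peek()                     // next non-whitespace character; 0 at end of input
    {
        in >> std::ws;
        int c = in.peek();
        return c == std::char_traits<char>::eof() ? 0 : static_cast<char>(c);
    }

    double term()                   // multiply and divide
    {
        double left = prim();
        for (;;)
            switch (peek()) {
            case '*': in.get(); left *= prim(); break;
            case '/':
                in.get();
                if (double d = prim()) { left /= d; break; }
                throw std::runtime_error("divide by 0");
            default:  return left;
            }
    }

    double prim()                   // handle primaries
    {
        switch (peek()) {
        case '-': in.get(); return -prim();     // unary minus
        case '(': {
            in.get();
            double v = expr();
            if (peek() != ')') throw std::runtime_error(") expected");
            in.get();                           // eat ')'
            return v;
        }
        default: {
            double v = 0;
            if (!(in >> v)) throw std::runtime_error("primary expected");
            return v;
        }
        }
    }
};

// Hypothetical convenience function: evaluate one expression held in a string.
double evaluate(const std::string& s) { return Mini_parser(s).expr(); }
```

Because each loop folds the next operand into left before looking at the following operator, evaluate("2-3+4") computes (2-3)+4.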

The curious notation for(;;) is the standard way to specify an infinite loop; you could pro-

of the space between the + and the =.

Assignment operators are provided for the binary operators

OR, and exclusive OR; << and >> are the left shift and right shift operators; §6.2 summarizes the operators and their meanings. For a binary operator @ applied to operands of built-in types, an

prim(), which in turn calls expr(). This loop must be broken somehow. A declaration

    double expr(bool);

{

    double left = prim(get);

            break;

        case DIV:

            return error("divide by 0");

        default:

dividing and call error() if we detect a zero divisor. The function error() is described in §6.1.4.

The variable d is introduced into the program exactly where it is needed and initialized immedi-

and the resulting value is the value of the condition (§6.3.2.1). Consequently, the division and assignment left/=d is done if and only if d is nonzero.

double prim(bool get) // handle primaries

{

        get_token();

        return v;

        return v;

    }

        if (curr_tok != RP) return error(") expected");

        get_token(); // eat ')'

}

}

ifying the kind of token (a Token_value in this program) and (when needed) the value of the token.

Here, there is only a single, simple variable, curr_tok, so the global variable number_value is

correctly after an error helps the user.

In the same way that the value of the last NUMBER is kept in number_value, the character

name, the calculator must first look ahead to see if it is being assigned to or simply read. In both cases, the symbol table is consulted. The symbol table is a map (§3.7.4, §17.4.1):

    map<string,double> table;

    v = 6378.388;

The reference v is used to hold on to the double associated with radius while expr() calculates the value 6378.388 from the input characters.
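The role of the reference can be sketched in isolation. In the fragment below, table is the symbol table from the text; the function assign() is a hypothetical helper introduced only for illustration, and the value is passed in directly instead of being computed by expr().

```cpp
#include <map>
#include <string>

std::map<std::string, double> table;    // symbol table, as in the text

// Hypothetical helper: store value under name and return the stored value.
double assign(const std::string& name, double value)
{
    double& v = table[name];    // inserts name with the value 0.0 if it is not yet present
    v = value;                  // writes through the reference into the table
    return v;
}
```

After assign("radius", 6378.388), a later lookup of table["radius"] yields the stored value; a name that has never been assigned reads as 0.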

The initial statements read the first non-whitespace character into ch and check that the read operation succeeded:

Token_value get_token()
{
    char ch = 0;
    cin>>ch;

    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
    case '.':
        cin.putback(ch);
        cin >> number_value;
        return curr_tok=NUMBER;

Stacking case labels horizontally rather than vertically is generally not a good idea because this arrangement is harder to read. However, having one line for each digit is tedious. Because operator >> is already defined for reading floating-point constants into a double, the code is trivial. First the initial character (a digit or a dot) is put back into cin. Then the constant can be read into number_value.
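The put-back technique can be tried on its own. The following is a sketch, not part of the calculator: read_number() is a hypothetical helper, and it takes the stream as a parameter so that it can be exercised with a std::istringstream instead of cin.

```cpp
#include <sstream>

// Hypothetical helper: read one character; if it starts a number,
// put it back and let >> parse the whole floating-point constant.
bool read_number(std::istream& in, double& value)
{
    char ch = 0;
    if (!(in >> ch)) return false;          // nothing left to read
    if (('0' <= ch && ch <= '9') || ch == '.') {
        in.putback(ch);                     // return the first character to the stream
        return static_cast<bool>(in >> value);
    }
    in.putback(ch);                         // not a number: leave the character unread
    return false;
}
```

Given the input "  19.635;", the helper stores 19.635 and leaves the semicolon to be read by the next operation.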

Token_value get_token()
{
    char ch = 0;
    cin>>ch;

The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T. Published by Addison Wesley Longman, Inc. ISBN 0-201-88954-4. All rights reserved.

    case ';':
    case '*':
    case '/':
    case '+':
    case '-':
    case '(':
    case ')':
    case '=':
        return curr_tok=Token_value(ch);

Using the calculator as defined so far reveals a few inconveniences. It is tedious to remember to add a semicolon after an expression in order to get its value printed, and having a name terminated by whitespace only is a real nuisance. For example, x=7 is an identifier – rather than the identifier x followed by the operator = and the number 7. Both problems are solved by replacing the type-oriented default input operations in get_token() with code that reads individual characters.

First, we’ll make a newline equivalent to the semicolon used to mark the end of expression:

    switch (ch) {
    case ';':
    case '\n':
        return curr_tok=PRINT;

A do-statement is used; it is equivalent to a while-statement except that the controlled statement is always executed at least once. The call cin.get(ch) reads a single character from the standard input stream into ch. By default, get() does not skip whitespace the way operator >> does. The test if (!cin.get(ch)) fails if no character can be read from cin; in this case, END is returned to terminate the calculator session. The operator ! (NOT) is used because get() returns true in case of success.
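A loop of the kind described might look like the following sketch. The function next_significant() is a hypothetical stand-in for the start of get_token(): it takes the stream as a parameter and returns false where the calculator would return END.

```cpp
#include <cctype>
#include <sstream>

// Hypothetical helper: skip whitespace other than newline.
// Returns false if the input ends before a significant character is found.
bool next_significant(std::istream& in, char& ch)
{
    do {
        if (!in.get(ch)) return false;      // get() does not skip whitespace
    } while (ch != '\n' && std::isspace(static_cast<unsigned char>(ch)));
    return true;
}
```

The cast to unsigned char is there because passing a negative char value to isspace() is undefined.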

Fortunately, these two improvements could both be implemented by modifying a single local section of code. Constructing programs so that improvements can be implemented through local modifications only is an important design aim.

6.1.4 Error Handling [expr.error]

6.1.5 The Driver [expr.driver]

With all the pieces of the program in place, we need only a driver to start things. In this simple example, main() can do that:

Conventionally, main() should return zero if the program terminates normally and nonzero otherwise (§3.2). Returning the number of errors accomplishes this nicely. As it happens, the only initialization needed is to insert the predefined names into the symbol table.

The primary task of the main loop is to read expressions and write out the answer. This is achieved by the line:

The calculator uses standard library facilities. Therefore, appropriate headers must be #included to complete the program:

#include<iostream> // I/O
#include<string>   // strings
#include<map>      // map
#include<cctype>   // isalpha(), etc.

After the program was written and tested, I found it a bother to first start the program, then type the expressions, and finally quit. My most common use was to evaluate a single expression. If that expression could be presented as a command-line argument, a few keystrokes could be avoided.

A program starts by calling main() (§3.2, §9.4). When this is done, main() is given two arguments specifying the number of arguments, usually called argc, and an array of arguments, usually called argv. The arguments are character strings, so the type of argv is char*[argc+1]. The name of the program (as it occurs on the command line) is passed as argv[0], so argc is always at least 1. The list of arguments is zero-terminated; that is, argv[argc]==0. For example, for the command

istringstream. Unfortunately, there is no elegant way of making cin refer to an istringstream. Therefore, we must find a way of getting the calculator input functions to refer to an istringstream. Furthermore, we must find a way of getting the calculator input functions to refer to an istringstream or to cin depending on what kind of command-line argument we supply.
A simple solution is to introduce a global pointer input that points to the input stream to be used and have every input routine use that:

    table["pi"] = 3.1415926535897932385;
    table["e"] = 2.7182818284590452354;

    while (*input) {
        get_token();
        if (curr_tok == END) break;
        if (curr_tok == PRINT) continue;
        cout << expr(false) << '\n';
    }
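The choice between cin and a string given on the command line can be sketched as follows. The pointer input is the global described in the text; the function select_input() and the static stream arg_stream are assumptions made for this illustration and are not taken from the calculator.

```cpp
#include <iostream>
#include <sstream>
#include <string>

std::istream* input = &std::cin;    // the stream every input routine reads from

// Hypothetical helper: choose the input source from the command-line arguments.
// Returns false if there are too many arguments.
bool select_input(int argc, char* argv[])
{
    static std::istringstream arg_stream;   // must outlive the pointer stored in input
    switch (argc) {
    case 1:                                 // no argument: read from standard input
        input = &std::cin;
        return true;
    case 2:                                 // one argument: read from the argument string
        arg_stream.clear();
        arg_stream.str(argv[1]);
        input = &arg_stream;
        return true;
    default:
        return false;
    }
}
```

With this in place, a routine that used to say cin >> number_value would say *input >> number_value and would work unchanged for either source.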

It was inelegant to modify all of the input routines to use *input rather than cin to gain the flexibility to use alternative sources of input. The change could have been avoided had I shown foresight by introducing something like input from the start. A more general and useful view is to note that the source of input really should be the parameter of a calculator module. That is, the fundamental problem with this calculator example is that what I refer to as ‘‘the calculator’’ is only a collection of functions and data. There is no module (§2.4) or object (§2.5.2) that explicitly represents the calculator. Had I set out to design a calculator module or a calculator type, I would naturally have considered what its parameters should be (§8.5[3], §10.6[16]).

6.1.8 A Note on Style [expr.style]

This section presents a summary of expressions and some examples. Each operator is followed by one or more names commonly used for it and an example of its use. In these tables, a class_name is the name of a class, a member is a member name, an object is an expression yielding a class object, a pointer is an expression yielding a pointer, an expr is an expression, and an lvalue is an expression denoting a nonconstant object. A type can be a fully general type name (with *, (), etc.) only when it appears in parentheses; elsewhere, there are restrictions (§A.5).

The syntax of expressions is independent of operand types. The meanings presented here apply when the operands are of built-in types (§4.1.1). In addition, you can define meanings for operators applied to operands of user-defined types (§2.5.2, Chapter 11).

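Operator Summary
________________________________________
logical AND             expr && expr
________________________________________
logical inclusive OR    expr || expr
________________________________________
simple assignment       lvalue = expr
multiply and assign     lvalue *= expr
________________________________________
throw exception         throw expr
________________________________________
comma (sequencing)      expr , expr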

Each box holds operators with the same precedence. Operators in higher boxes have higher precedence than operators in lower boxes. For example: a+b*c means a+(b*c) rather than (a+b)*c because * has higher precedence than +.

6.2.1 Results [expr.res]

    // the value of x=y is the value of x after the assignment
    // p points to x
    // error: x++ is not an lvalue (it is not the value stored in x)
    // address of the int with the larger value

If both the second and third operands of ?: are lvalues and have the same type, the result is of that type and is an lvalue. Preserving lvalues in this way allows greater flexibility in using operators. This is particularly useful when writing code that needs to work uniformly and efficiently with both built-in and user-defined types (e.g., when writing templates or programs that generate C++ code).
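A small sketch may help to show what an lvalue result makes possible. The functions boost_larger() and bump() are hypothetical examples, not code from the text.

```cpp
// Hypothetical helper: add bonus to whichever of a and b is larger.
void boost_larger(int& a, int& b, int bonus)
{
    (a > b ? a : b) += bonus;   // both operands are int lvalues, so the result is an lvalue
}

// Hypothetical helper: increment x and return its address.
int* bump(int& x)
{
    return &++x;                // ++x is an lvalue denoting x
    // return &(x++);           // would not compile: x++ is not an lvalue
}
```

Starting from a==3 and b==5, boost_larger(a,b,10) leaves a unchanged and makes b equal to 15.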

6.2.2 Evaluation Order [expr.evaluation]

The order of evaluation of subexpressions within an expression is undefined. In particular, you cannot assume that the expression is evaluated left to right. For example:

Better code can be generated in the absence of restrictions on expression evaluation order. However, the absence of restrictions on evaluation order can lead to undefined results. For example,

    f1(v[i],i++);       // two arguments
    f2( (v[i],i++) );   // one argument

The call of f1 has two arguments, v[i] and i++, and the order of evaluation of the argument expressions is undefined. Order dependence of argument expressions is very poor style and has undefined behavior. The call of f2 has one argument, the comma expression (v[i],i++), which is equivalent to i++.

means ‘‘if i is less than or equal to 0 or if max is less than i.’’ That is, it is equivalent to

    if ( (i<=0) || (max<i) ) // ...

For example:

    if (i&mask == 0) // oops! == expression as operand for &

    if ((i&mask) == 0) // ...
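The difference between the two conditions can be checked with a pair of functions. Both names are hypothetical; misparsed() spells out with explicit parentheses how the compiler groups the unparenthesized condition.

```cpp
// Hypothetical helper: true if none of the bits in mask is set in i.
bool no_bits_set(unsigned i, unsigned mask)
{
    return (i & mask) == 0;         // parentheses needed: == binds tighter than &
}

// Hypothetical helper: the grouping the compiler uses for "i&mask == 0".
bool misparsed(unsigned i, unsigned mask)
{
    return (i & (mask == 0)) != 0;  // compares mask with 0 first, then ANDs the result with i
}
```

For i==8 and mask==7, no bit of the mask is set in i, so no_bits_set() yields true, whereas misparsed() yields false.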

It is worth noting that the following does not work the way a mathematician might expect:

    if (a = 7) // oops! constant assignment in condition

This is natural because = means ‘‘equals’’ in many languages. Again, it is easy for a compiler to warn about most such mistakes – and many do.

The implementation of a stream can set and test its state like this:

    state = goodbit;
    // ...

These stream state flags are observable from outside the stream implementation. For example, we could see how the states of two streams differ like this:

Computing differences of stream states is not very common. For other similar types, computing differences is essential. For example, consider comparing a bit vector that represents the set of interrupts being handled with another that represents the set of interrupts waiting to be handled.

Please note that this bit fiddling is taken from the implementation of iostreams rather than from the user interface. Convenient bit manipulation can be very important, but for reliability, maintainability, portability, etc., it should be kept at low levels of a system. For more general notions of a set, see the standard library set (§17.4.3), bitset (§17.5.3), and vector<bool> (§16.3.11).
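Seen from outside the implementation, the same operations can be applied to the value returned by rdstate(). The sketch below uses only the standard members rdstate(), goodbit, failbit, and badbit; the functions no_good() and state_difference() are hypothetical names chosen for this illustration. In ordinary code, the member function fail() already performs the first test.

```cpp
#include <ios>
#include <sstream>

// Hypothetical helper: true if failbit or badbit is set in the stream's state.
bool no_good(const std::ios& s)
{
    return (s.rdstate() & (std::ios_base::badbit | std::ios_base::failbit))
           != std::ios_base::goodbit;
}

// Hypothetical helper: the state flags on which two streams differ.
std::ios_base::iostate state_difference(const std::ios& a, const std::ios& b)
{
    return a.rdstate() ^ b.rdstate();   // exclusive OR keeps only the differing bits
}
```

For example, after a failed attempt to read an int from a std::istringstream holding "x", no_good() is true for that stream, and its state differs from that of an untouched stream in failbit.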

The ++ operator is used to express incrementing directly, rather than expressing it indirectly using a combination of an addition and an assignment. By definition, ++lvalue means lvalue+=1, which again means lvalue=lvalue+1 provided lvalue has no side effects. The expression denoting the object to be incremented is evaluated once (only). Decrementing is similarly expressed by the -- operator. The operators ++ and -- can be used as both prefix and postfix operators. The value of ++x is the new (that is, incremented) value of x. For example, y=++x is equivalent to y=(x+=1). The value of x++, however, is the old value of x. For example, y=x++ is equivalent to y=(t=x,x+=1,t), where t is a variable of the same type as x.
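The two values can be compared directly. The functions pre() and post() are hypothetical wrappers introduced only to make the results easy to observe.

```cpp
// Hypothetical helpers contrasting the values of ++x and x++.
int pre(int& x)  { return ++x; }    // yields the new value, like (x += 1)
int post(int& x) { return x++; }    // yields the old value; x is still incremented
```

Starting from x==5, pre(x) yields 6 and leaves x at 6; a following post(x) also yields 6 but leaves x at 7.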

Like addition and subtraction of pointers, ++ and -- on pointers operate in terms of elements of the array into which the pointer points; p++ makes p point to the next element (§5.3.1).

is more than a little obscure to non-C programmers and because the style of coding is not uncommon in C and C++, it is worth examining more closely.

Consider first a more traditional way of copying an array of characters:

This is wasteful. The length of a zero-terminated string is found by reading the string looking for the terminating zero. Thus, we read the string twice: once to find its length and once to copy it. So we try this instead:

    int i;
    for (i = 0; q[i]!=0 ; i++) p[i] = q[i];
    p[i] = 0; // terminating zero

The value of *p++ = *q++ is *q. We can therefore rewrite the example like this:

    while ((*p++ = *q++) != 0) { }

The most efficient way of copying a zero-terminated character string for your particular machine ought to be the standard string copy function:

    char* strcpy(char*, const char*); // from <string.h>
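For comparison, the hand-written loop can be wrapped in a function and tried next to the library version. The name copy_string() is hypothetical; as with strcpy(), the caller is assumed to supply a destination large enough for the string and its terminating zero.

```cpp
#include <cstring>  // std::strcpy and std::strcmp, the standard alternatives

// Hypothetical helper: copy the zero-terminated string q into p.
// The caller must ensure that p has room for strlen(q)+1 characters.
void copy_string(char* p, const char* q)
{
    while ((*p++ = *q++) != 0) { }  // the assignment copies each character, including
                                    // the terminating zero, whose value ends the loop
}
```

Copying "abc" into an eight-character buffer leaves the same contents that std::strcpy() would produce.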