Showing posts with label C++.

Wednesday, December 8, 2010

Definition of EOF and how to use it effectively

The use and meaning of EOF seems to cause a lot of confusion for some new coders; hopefully this explanation will help you understand it better. Before I go into too much detail about what EOF is, I'll tell you what it isn't.
EOF is NOT:

  • A char
  • A value that exists at the end of a file
  • A value that could exist in the middle of a file

    And now to what it actually is:
    EOF is a macro defined as an int with a negative value. It is normally returned by functions that perform read operations to denote either an error or the end of input. Due to variable promotion rules (discussed in detail later), it is important to ensure you use an int to store the return code from these functions, even if the function appears to be returning a char, such as getchar() or fgetc().
    Here are some code examples that you might use:

    int c;
      
    while ((c = fgetc(fp)) != EOF)
    {
      putchar (c);
    }
    

    int ch;

    while ((ch = cin.get()) != EOF)
    {
        cout << (char)ch;
    }

    char to int Promotion

    By definition an int is larger than a char, therefore a negative-valued int can never hold the same value as a char. However, when you compare an int with a char, the char will get promoted to an int to account for the difference in size of the variables. The value of a promoted char is affected by its sign, and unfortunately, a char can be either signed or unsigned by default; this is compiler dependent.
    To understand this better, let's look at the representation of a few numbers in both ints and chars.
    The following assumes 2 byte ints (your compiler might use a larger amount). A char uses only 1 byte (this will be the same amount on your compiler). With the exception of the first column, the values are shown in hexadecimal.
    -----------------------------        ------------------------------
    |  char and int comparison  |        |     char to int promotion  |
    -----------------------------        ------------------------------
    | Decimal |  int    |  char |        |  char | unsigned | signed  |
    |---------|---------|-------|        |-------|----------|---------|
    |  2      |  00 02  |  02   |        |  02   |  00 02   |  00 02  |
    |  1      |  00 01  |  01   |        |  01   |  00 01   |  00 01  |
    |  0      |  00 00  |  00   |        |  00   |  00 00   |  00 00  |
    | -1      |  FF FF  |  FF   |        |  FF   |  00 FF   |  FF FF  |
    | -2      |  FF FE  |  FE   |        |  FE   |  00 FE   |  FF FE  |
    -----------------------------        ------------------------------

    The "char to int promotion" table makes it clear that the sign of a char produces a very different number in the int.

    So what does all this mean to me as a programmer?
    Well, let's have a look at a revised version of the code shown above, this time incorrectly using a char variable to store the return code from fgetc().

    char c;

    while ((c = fgetc(fp)) != EOF)
    {
        putchar(c);
    }

    Now let's assume that the file we are reading from contains a byte with value 0xff. fgetc() returns this value within an int, so it looks like this: 0x00 0xff (again, I'm assuming 2 byte ints). To store this value in a char, it must be demoted, and the char value becomes 0xff. Next, the char c is compared with the int EOF. Promotion rules apply, and c must be promoted to an int. However, in the sample code, the sign of c isn't explicitly declared, so we don't know if it's signed or unsigned, so the int value could become either 0xff 0xff or 0x00 0xff. Therefore, the code is not guaranteed to work in the way we require.
    The following is a short program to help show the promotion:

    #include <stdio.h>

    int main(void)
    {
        int i = -1;
        signed char sc = 0xff;
        unsigned char usc = 0xff;

        printf("Comparing %x with %x\n", i, sc);
        if (i == sc) puts("i == sc");
        else puts("i != sc");
        putchar('\n');
        printf("Comparing %x with %x\n", i, usc);
        if (i == usc) puts("i == usc");
        else puts("i != usc");

        return 0;
    }

    /*
    * Output

    Comparing ffff with ffff <--- Notice this has been promoted
    i == sc

    Comparing ffff with ff
    i != usc

    *
    */
    Another scenario to consider is where the char is unsigned. In this case, the process of demoting and promoting the returned value from fgetc() will have the effect of corrupting the EOF value, and the program will get stuck in an infinite loop. Let's follow that process through:
    - EOF (0xff 0xff) is returned by fgetc() due to end of input
    - Value demoted to 0xff to be stored in unsigned char c
    - unsigned char c promoted to an int, value goes from 0xff to 0x00 0xff
    - EOF is compared with c, meaning comparison is between 0xff 0xff and 0x00 0xff.
    - The result is FALSE (the values are different), which is undesirable.
    - fgetc() is called again, and still returns EOF.  The endless loop begins.
    

    The following code demonstrates this problem.

    #include <stdio.h>

    int main(void)
    {
        FILE *fp;
        unsigned char c;

        if ((fp = fopen("myfile.txt", "rb")) == NULL)
        {
            perror("myfile.txt");
            return 0;
        }

        while ((c = fgetc(fp)) != EOF)
        {
            putchar(c);
        }

        fclose(fp);
        return 0;
    }

    Source
     

    Tuesday, November 30, 2010

    Templated Functions

    C++ templates can be used both for classes and for functions in C++. Templated functions are actually a bit easier to use than templated classes, as the compiler can often deduce the desired type from the function's argument list.

    The syntax for declaring a templated function is similar to that for a templated class:
    template <class type> type func_name(type arg1, ...);
    For instance, to declare a templated function to add two values together, you could use the following syntax:
    template <class type> type add(type a, type b)
    {
        return a + b;
    }
    Now, when you actually use the add function, you can simply treat it like any other function because the desired type is also the type given for the arguments. This means that upon compiling the code, the compiler will know what type is desired:
    int x = add(1, 2);
    will correctly deduce that "type" should be int. This would be the equivalent of saying:
    int x = add<int>(1, 2);
    where the template is explicitly instantiated by giving the type as a template parameter.

    On the other hand, type inference of this sort isn't always possible because it's not always feasible to guess the desired types from the arguments to the function. For instance, if you wanted a function that performed some kind of cast on the arguments, you might have a template with multiple parameters:
    template <class type1, class type2> type2 cast(type1 x)
    {
        return (type2)x;
    }
    Using this function without explicitly specifying the correct type for type2 would be impossible. On the other hand, it is possible to take advantage of some type inference if the template parameters are correctly ordered: if the first parameter must be specified and the second can be deduced from the function arguments, only the first needs to be given explicitly.

    For instance, given the following declaration
    template <class rettype, class argtype> rettype cast(argtype x)
    {
        return (rettype)x;
    }
    this function call specifies everything that is necessary to allow the compiler to deduce the correct type:
    cast<double>(10);
    which will cast an int to a double. Note that arguments to be deduced must always follow arguments to be specified. (This is similar to the way that default arguments to functions work.)

    You might wonder why you cannot use type inference for classes in C++. The problem is that it would be a much more complex process with classes, especially as constructors may have multiple versions that take different numbers of parameters, and not all of the necessary template parameters may be used in any given constructor.


    Templated Classes with Templated Functions

    It is also possible to have a templated class that has a member function that is itself a template, separate from the class template. For instance,
    template <class type> class TClass
    {
        // constructors, etc
        
        template <class type2> type2 myFunc(type2 arg);
    };
    The function myFunc is a templated function inside of a templated class, and when you actually define the function, you must respect this by using the template keyword twice:
    template <class type>  // For the class
        template <class type2>  // For the function
        type2 TClass<type>::myFunc(type2 arg)
        {
            // code
        }
    The following attempt to combine the two is wrong and will not work:
    // bad code!
    template <class type, class type2> type2 TClass<type>::myFunc(type2 arg)
    {
        // ...
    }
    because it suggests that the template is entirely the class template and not a function template at all.
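    Putting the two template headers together, here is a minimal compilable sketch of the class above; the body of myFunc (simply returning its argument) is an illustrative assumption, since the text leaves it unspecified:

    ```cpp
    #include <cassert>

    template <class type> class TClass
    {
    public:
        // A member function template, independent of the class's own parameter
        template <class type2> type2 myFunc(type2 arg);
    };

    template <class type>   // for the class
    template <class type2>  // for the function
    type2 TClass<type>::myFunc(type2 arg)
    {
        // Illustrative body: hand the argument straight back
        return arg;
    }

    int main()
    {
        TClass<int> t;
        assert(t.myFunc(42) == 42);    // type2 deduced as int
        assert(t.myFunc(2.5) == 2.5);  // type2 deduced as double
        return 0;
    }
    ```

    Note that type2 is deduced independently for each call, regardless of the type chosen for the enclosing class.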


    Source

       

    Templates and Templated Classes in C++

    What's better than having several classes that do the same thing to different datatypes? One class that lets you choose which datatype it acts on.
    Templates are a way of making your classes more abstract by letting you define the behavior of the class without actually knowing what datatype will be handled by the operations of the class. In essence, this is what is known as generic programming; this term is a useful way to think about templates because it helps remind the programmer that a templated class does not depend on the datatype (or types) it deals with. To a large degree, a templated class is more focused on the algorithmic thought rather than the specific nuances of a single datatype.
    Templates can be used in conjunction with abstract datatypes in order to allow them to handle any type of data. For example, you could make a templated stack class that can handle a stack of any datatype, rather than having to create a stack class for every different datatype for which you want the stack to function. The ability to have a single class that can handle several different datatypes means the code is easier to maintain, and it makes classes more reusable.
    The basic syntax for declaring a templated class is as follows:
    template <class a_type> class a_class {...};
    The keyword 'class' above simply means that the identifier a_type will stand for a datatype. NB: a_type is not a keyword; it is an identifier that, at compile time, will represent a single datatype. For example, you could, when defining variables in the class, use the following line:
    a_type a_var;
    and when the programmer defines which datatype 'a_type' is to be when the program instantiates a particular instance of a_class, a_var will be of that type.
    When defining a function as a member of a templated class, it is necessary to define it as a templated function:
    template<class a_type> void a_class<a_type>::a_function(){...}
                   
    When declaring an instance of a templated class, the syntax is as follows:
    a_class<int> an_example_class;
                  
    An instantiated object of a templated class is called a specialization; the term specialization is useful to remember because it reminds us that the original class is a generic class, whereas a specific instantiation of a class is specialized for a single datatype (although it is possible to template multiple types).
    Usually when writing code it is easiest to proceed from concrete to abstract; therefore, it is easier to write a class for a specific datatype and then proceed to a templated (generic) class. Since brevity is the soul of wit, this example will be brief and therefore of little practical application.
    We will define the first class to act only on integers.
    class calc
    {
      public:
        int multiply(int x, int y);
        int add(int x, int y);
     };
    int calc::multiply(int x, int y)
    {
      return x*y;
    }
    int calc::add(int x, int y)
    {
      return x+y;
    }
    We now have a perfectly harmless little class that functions perfectly well for integers; but what if we decided we wanted a generic class that would work equally well for floating point numbers? We would use a template.
    template <class A_Type> class calc
    {
      public:
        A_Type multiply(A_Type x, A_Type y);
        A_Type add(A_Type x, A_Type y);
    };
    template <class A_Type> A_Type calc<A_Type>::multiply(A_Type x,A_Type y)
    {
      return x*y;
    }
    template <class A_Type> A_Type calc<A_Type>::add(A_Type x, A_Type y)
    {
      return x+y;
    }
    To understand the templated class, just think about replacing the identifier A_Type everywhere it appears, except as part of the template or class definition, with the keyword int. It would be the same as the above class; now when you instantiate an object of class calc you can choose which datatype the class will handle.
    calc <double> a_calc_class;
    Templates are handy for making your programs more generic and allowing your code to be reused later.
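    As a self-contained check, the calc template and its member definitions from above can be exercised like this:

    ```cpp
    #include <cassert>

    template <class A_Type> class calc
    {
      public:
        A_Type multiply(A_Type x, A_Type y);
        A_Type add(A_Type x, A_Type y);
    };

    template <class A_Type> A_Type calc<A_Type>::multiply(A_Type x, A_Type y)
    {
      return x * y;
    }

    template <class A_Type> A_Type calc<A_Type>::add(A_Type x, A_Type y)
    {
      return x + y;
    }

    int main()
    {
        calc<int> int_calc;      // specialization for int
        calc<double> dbl_calc;   // specialization for double

        assert(int_calc.add(2, 3) == 5);
        assert(int_calc.multiply(4, 5) == 20);
        assert(dbl_calc.multiply(1.5, 2.0) == 3.0);
        return 0;
    }
    ```

    The same class definition serves both specializations; the compiler generates a separate concrete class for each datatype used.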

    Source

    Monday, November 29, 2010

    The C Preprocessor

    The C preprocessor modifies a source code file before handing it over to the compiler. You're most likely used to using the preprocessor to include files directly into other files, or #define constants, but the preprocessor can also be used to create "inlined" code using macros expanded at compile time and to prevent code from being compiled twice.

    There are essentially three uses of the preprocessor--directives, constants, and macros. Directives are commands that tell the preprocessor to skip part of a file, include another file, or define a constant or macro. Directives always begin with a sharp sign (#) and for readability should be placed flush to the left of the page. All other uses of the preprocessor involve processing #define'd constants or macros. Typically, constants and macros are written in ALL CAPS to indicate they are special (as we will see).

    Header Files

    The #include directive tells the preprocessor to grab the text of a file and place it directly into the current file. Typically, such statements are placed at the top of a program--hence the name "header file" for files thus included.

    Constants

    If we write
    #define [identifier name] [value]
    whenever [identifier name] shows up in the file, it will be replaced by [value].

    If you are defining a constant in terms of a mathematical expression, it is wise to surround the entire value in parentheses:
    #define PI_PLUS_ONE (3.14 + 1)
    By doing so, you avoid the possibility that an order of operations issue will destroy the meaning of your constant:
    x = PI_PLUS_ONE * 5;
    Without parentheses, the above would be converted to
    x = 3.14 + 1 * 5;
    which would result in 1 * 5 being evaluated before the addition, not after. Oops!

    It is also possible to write simply
    #define [identifier name]
    which defines [identifier name] without giving it a value. This can be useful in conjunction with another set of directives that allow conditional compilation.

    Conditional Compilation

    There are a whole set of options that can be used to determine whether the preprocessor will remove lines of code before handing the file to the compiler. They include #if, #elif, #else, #ifdef, and #ifndef. An #if or #if/#elif/#else block or a #ifdef or #ifndef block must be terminated with a closing #endif.

    The #if directive takes a numerical argument that evaluates to true if it's non-zero. If its argument is false, then code until the closing #else, #elif, or #endif will be excluded.
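    As a sketch, here is an #if/#elif/#else chain choosing between constants; the VERBOSITY and BUFFER_SIZE names are hypothetical:

    ```cpp
    #include <cassert>

    #define VERBOSITY 2   // hypothetical constant, for illustration

    #if VERBOSITY >= 3
    #define BUFFER_SIZE 1024
    #elif VERBOSITY >= 1
    #define BUFFER_SIZE 256
    #else
    #define BUFFER_SIZE 64
    #endif

    int main()
    {
        // VERBOSITY is 2, so only the #elif branch survives preprocessing
        assert(BUFFER_SIZE == 256);
        return 0;
    }
    ```

    The compiler never even sees the excluded branches; they are removed before compilation begins.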

    Commenting out Code

    Conditional compilation is a particularly useful way to comment out a block of code that contains multi-line comments (which cannot be nested).
    #if 0
    /* comment ...
    */
    
    // code
    
    /* comment */
    #endif

    Avoiding Including Files Multiple Times (idempotency)

    Another common problem is that a header file is required in multiple other header files that are later included into a source code file, with the result often being that variables, structs, classes or functions appear to be defined multiple times (once for each time the header file is included). This can result in a lot of compile-time headaches. Fortunately, the preprocessor provides an easy technique for ensuring that any given file is included once and only once.

    By using the #ifndef directive, you can include a block of text only if a particular expression is undefined; then, within the header file, you can define the expression. This ensures that the code in the #ifndef is included only the first time the file is loaded.
    #ifndef _FILE_NAME_H_
    #define _FILE_NAME_H_
    
    /* code */
    
    #endif // #ifndef _FILE_NAME_H_
    Notice that it's not necessary to actually give a value to the expression _FILE_NAME_H_. It's sufficient to include the line "#define _FILE_NAME_H_" to make it "defined". (Note that there is an n in #ifndef--it stands for "if not defined").

    A similar tactic can be used for defining specific constants, such as NULL:
    #ifndef NULL
    #define NULL ((void *)0)
    #endif // #ifndef NULL
    Notice that it's useful to comment which conditional statement a particular #endif terminates. This is particularly true because preprocessor directives are rarely indented, so it can be hard to follow the flow of execution.

    Macros

    The other major use of the preprocessor is to define macros. The advantage of a macro is that it can be type-neutral (this can also be a disadvantage, of course), and it's inlined directly into the code, so there isn't any function call overhead. (Note that in C++, it's possible to get around both of these issues with templated functions and the inline keyword.)

    A macro definition is usually of the following form:
    #define MACRO_NAME(arg1, arg2, ...) [code to expand to]
    For instance, a simple increment macro might look like this:
    #define INCREMENT(x) x++
    They look a lot like function calls, but they're not so simple. There are actually a couple of tricky points when it comes to working with macros. First, remember that the exact text of the macro argument is "pasted in" to the macro. For instance, if you wrote something like this:
    #define MULT(x, y) x * y
    and then wrote
    int z = MULT(3 + 2, 4 + 2);
    what value do you expect z to end up with? The obvious answer, 30, is wrong! That's because what happens when the macro MULT expands is that it looks like this:
    int z = 3 + 2 * 4 + 2;    // 2 * 4 will be evaluated first!
    So z would end up with the value 13! This is almost certainly not what you want to happen. The way to avoid it is to force the arguments themselves to be evaluated before the rest of the macro body. You can do this by surrounding them by parentheses in the macro definition:
    #define MULT(x, y) (x) * (y)
    // now MULT(3 + 2, 4 + 2) will expand to (3 + 2) * (4 + 2)
    But this isn't the only gotcha! It is also generally a good idea to surround the macro's code in parentheses if you expect it to return a value. Otherwise, you can get similar problems as when you define a constant. For instance, the following macro, which adds 5 to a given argument, has problems when embedded within a larger statement:
    #define ADD_FIVE(a) (a) + 5
    
    int x = ADD_FIVE(3) * 3;
    // this expands to (3) + 5 * 3, so 5 * 3 is evaluated first
    // Now x is 18, not 24!
    To fix this, you generally want to surround the whole macro body with parentheses to prevent the surrounding context from affecting the macro body.
    #define ADD_FIVE(a) ((a) + 5)
    
    int x = ADD_FIVE(3) * 3;
    On the other hand, if you have a multiline macro that you are using for its side effects, rather than to compute a value, you probably want to wrap it within curly braces so you don't have problems when using it following an if statement.
    // We use a trick involving exclusive-or to swap two variables
    #define SWAP(a, b)  a ^= b; b ^= a; a ^= b; 
    
    int x = 10;
    int y = 5;
    
    // works OK
    SWAP(x, y);
    
    // What happens now?
    if(x < 0)
        SWAP(x, y);
    When SWAP is expanded in the second example, only the first statement, a ^= b, is governed by the conditional; the other two statements will always execute. What we really meant was that all of the statements should be grouped together, which we can enforce using curly braces:
    #define SWAP(a, b)  {a ^= b; b ^= a; a ^= b;} 
    Now, there is still a bit more to our story! What if you write code like so:
    #define SWAP(a, b)  { a ^= b; b ^= a; a ^= b; }
    
    int x = 10;
    int y = 5;
    int z = 4;
    
    // What happens now?
    if(x < 0)
        SWAP(x, y);
    else
        SWAP(x, z); 
    Then it will not compile, because the semicolon after the closing curly brace breaks the flow between the if and the else. The solution? Use a do-while loop:
    #define SWAP(a, b)  do { a ^= b; b ^= a; a ^= b; } while ( 0 )
    
    int x = 10;
    int y = 5;
    int z = 4;
    
    // What happens now?
    if(x < 0)
        SWAP(x, y);
    else
        SWAP(x, z); 
    Now the semicolon doesn't break anything because it completes the do-while statement. (By the way, note that we didn't surround the arguments in parentheses because we don't expect anyone to pass an expression into SWAP!)
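    A quick self-contained check that the do-while form behaves as a single statement inside an if/else, using the XOR-swap macro from above:

    ```cpp
    #include <cassert>

    #define SWAP(a, b)  do { a ^= b; b ^= a; a ^= b; } while ( 0 )

    int main()
    {
        int x = 10;
        int y = 5;
        int z = 4;

        if (x < 0)
            SWAP(x, y);   // not taken
        else
            SWAP(x, z);   // compiles and runs as a single statement

        assert(x == 4 && z == 10);   // x and z were exchanged
        assert(y == 5);              // y untouched
        return 0;
    }
    ```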

    More Gotchas

    By now, you've probably realized why people don't really like using macros. They're dangerous, they're picky, and they're just not that safe. Perhaps the most irritating problem with macros is that you don't want to pass arguments with "side effects" to macros. By side effects, I mean any expression that does something besides evaluate to a value. For instance, ++x evaluates to x+1, but it also increments x. This increment operation is a side effect.

    The problem with side effects is that macros don't evaluate their arguments; they just paste them into the macro text when performing the substitution. So something like
    #define MAX(a, b) ((a) < (b) ? (b) : (a))
    int x = 5, y = 10;
    int z = MAX(x++, y++);
    will end up looking like this:
    int z = ((x++) < (y++) ? (y++) : (x++));
    The problem here is that y++ ends up being evaluated twice! The nasty consequence is that after this expression, y will have a value of 12 rather than the expected 11. This can be a real pain to debug!
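    A small program makes the double evaluation visible:

    ```cpp
    #include <cassert>

    #define MAX(a, b) ((a) < (b) ? (b) : (a))

    int main()
    {
        int x = 5, y = 10;
        int z = MAX(x++, y++);   // y++ is pasted in twice and evaluated twice

        assert(z == 11);   // the second y++ yielded 11
        assert(y == 12);   // y was incremented twice, not once
        assert(x == 6);    // x was incremented once (only in the comparison)
        return 0;
    }
    ```

    An inline function taking its arguments by value would evaluate x++ and y++ exactly once each, avoiding the surprise entirely.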

    Multiline macros

    Until now, we've seen only short, one-line macros (possibly taking advantage of the semicolon to put multiple statements on one line). It turns out that by using the "\" to indicate a line continuation, we can write our macros across multiple lines to make them a bit more readable.

    For instance, we could rewrite swap as
    #define SWAP(a, b)  {                   \
                            a ^= b;         \
                            b ^= a;         \
                            a ^= b;         \
                        }
    Notice that you do not need a slash at the end of the last line! The slash tells the preprocessor that the macro continues to the next line, not that the line is a continuation from a previous line.

    Aside from readability, writing multi-line macros may make it more obvious that you need to use curly braces to surround the body because it's more clear that multiple effects are happening at once.

    Advanced Macro Tricks

    In addition to simple substitution, the preprocessor can also perform a bit of extra work on macro arguments, such as turning them into strings or pasting them together.

    Pasting Tokens

    Each argument passed to a macro is a token, and sometimes it might be expedient to paste arguments together to form a new token. This could come in handy if you have a complicated structure and you'd like to debug your program by printing out different fields. Instead of writing out the whole structure each time, you might use a macro to pass in the field of the structure to print.

    To paste tokens in a macro, use ## between the two things to paste together.

    For instance
    #define BUILD_FIELD(num) my_struct.inner_struct.union_a.field##num
    Now, when used with a particular field number, it will expand to something like
    my_struct.inner_struct.union_a.field1
    The tokens field and 1 are literally pasted together into the single token field1. (Note that ## must produce a valid token, so you cannot paste onto punctuation such as the ".".)
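    Token pasting can be exercised with a small self-contained program; the structure here is a simplified, hypothetical stand-in for the nested struct described above:

    ```cpp
    #include <cassert>

    // Simplified stand-in for a nested structure (names are hypothetical)
    struct Inner { int field1; int field2; };
    struct Outer { Inner inner_struct; };

    Outer my_struct = { { 10, 20 } };

    // ## pastes the identifier "field" and the argument into one token
    #define BUILD_FIELD(num) my_struct.inner_struct.field##num

    int main()
    {
        assert(BUILD_FIELD(1) == 10);  // expands to my_struct.inner_struct.field1
        assert(BUILD_FIELD(2) == 20);  // expands to my_struct.inner_struct.field2
        return 0;
    }
    ```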

    String-izing Tokens

    Another potentially useful macro option is to turn a token into a string containing the literal text of the token. This might be useful for printing out the token. The syntax is simple--simply prefix the token with a pound sign (#).
    #define PRINT_TOKEN(token) printf(#token " is %d", token)
    For instance, PRINT_TOKEN(foo) would expand to
    printf("foo" " is %d", foo)
    (Note that in C, string literals next to each other are concatenated, so something like "token" " is " " this " will effectively become "token is this". This can be useful for formatting printf statements.)

    For instance, you might use it to print the value of an expression as well as the expression itself (for debugging purposes).
    PRINT_TOKEN(x + y);
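    A runnable sketch of both forms; the STRINGIZE helper is an addition for inspecting the generated string:

    ```cpp
    #include <cstdio>
    #include <cstring>
    #include <cassert>

    #define PRINT_TOKEN(token) printf(#token " is %d\n", token)
    #define STRINGIZE(token) #token   // bare stringizing, for inspection

    int main()
    {
        int x = 3, y = 4;
        PRINT_TOKEN(x + y);   // prints: x + y is 7

        // # turns the argument's literal text into a string literal
        assert(std::strcmp(STRINGIZE(x + y), "x + y") == 0);
        return 0;
    }
    ```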

    Avoiding Macros in C++

    In C++, you should generally avoid macros when possible. You won't be able to avoid them entirely if you need the ability to paste tokens together, but with templated classes and type inference for templated functions, you shouldn't need to use macros to create type-neutral code. Inline functions should also get rid of the need for macros for efficiency reasons. (Though you aren't guaranteed that the compiler will inline your code.)

    Moreover, you should use const to declare typed constants rather than #define to create untyped (and therefore less safe) constants. const should work in pretty much all contexts where you would want to use a #define, including declaring statically sized arrays or as template parameters.
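    As a sketch, here is a const constant doing both of those jobs (the BufferSize name is illustrative):

    ```cpp
    #include <array>
    #include <cassert>

    const int BufferSize = 128;   // typed constant; a #define would be untyped text

    int main()
    {
        int raw[BufferSize];                 // a const int can size a static array
        std::array<int, BufferSize> arr{};   // and serve as a template parameter

        raw[0] = 1;
        assert(sizeof(raw) / sizeof(raw[0]) == 128);
        assert(arr.size() == 128);
        return 0;
    }
    ```

    Unlike a #define, BufferSize has a type the compiler can check and a name the debugger can see.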


    Source


       

    Saturday, November 27, 2010

    Writing a simple IPv6 program

    Configuring an IPv6 address and porting an IPv4 application to IPv6


    Summary:  This article discusses the concepts behind a simple IPv6 program -- specifically, how IPv6 solves the problems of address space and large routing tables. A programmer familiar with IPv4 will be able to recognise an IPv6 address and configure one for his machine. The article also covers tunneling, mapped addresses, and porting IPv4 to IPv6 applications, as well as the logic of enabling an IPv4 client to handle IPv6 addresses.


    In today's networking world, IPv4 is the foundation of networking, but in the last 10 years, questions have come up due to:
    • Fear of running out of IPv4 address space as soon as 2002
    • Fear of running out of capacity in global routing tables
    Network Address Translation (NAT) and Classless Inter-Domain Routing (CIDR) (see Relevant concepts) have been used as stopgap measures for these simple but serious problems. IPv6 -- also called IPng (IP new generation) -- has been viewed as the long-term solution.
    The following enhancements to IPv4 have also been planned:
    • Simplified header processing
    • Support for extended options
    • Enhancements like quality-of-service capabilities, authentication and privacy capabilities, flow control capabilities, and autoconfiguration
    The key rule behind all this change is that IPv6 applications must be able to coexist with IPv4 applications. The bottom line is that IPv6 should support a mixed IPv6 and IPv4 environment.
    This article will help you quickly understand the concepts behind IPv6 and write a simple program in it. Let's start with IPv6 addressing.
    IPv6 addressing
    Types of addresses in IPv6: Anycast is the new baby
    In IPv6, there are three types of addresses -- unicast, multicast, and anycast. We had unicast addresses in IPv4, and many systems support multicast, as well. Anycast is a new type of address defined by IPv6.
    1. Unicast: This is like any normal IP address on a single interface (for example, the IPv4 address 9.185.101.1 on en0).
    2. Multicast: A packet sent to a multicast address is delivered to all interfaces identified by that address. There is no broadcast address type since the multicast type can take care of it.
    3. Anycast: A packet sent to an anycast address is delivered to one of the interfaces identified by that address (the "nearest" one, according to the routing protocols' measure of distance). Let's consider a situation where anycast addresses can be used -- connecting to a service provider's router. Assume that the service provider can give you a set of addresses to connect to, and you choose one of these addresses. With IPv6, the service provider can give you an anycast address, which you will use to automatically connect to the "nearest" address. This is a new feature in IPv6 and there is still a lot of debate going on about its implementation.
    How IPv6 addresses are written: What a change
    There are three conventional forms for representing IPv6 addresses as text strings:
    1. The primary form: The preferred form is x:x:x:x:x:x:x:x, where the "x"s are the hexadecimal value of the eight 16-bit pieces of the address. Two examples:
    fe80:0:0:0:207:30ee:edcb:d05d
     1080:0:0:0:1:700:200B:417C

    There are eight hex fields in the first address:
    1. fe80
    2. 0
    3. 0
    4. 0
    5. 207
    6. 30ee
    7. edcb
    8. d05d
    In IPv6, we do not write the leading zeros in a field. That is, the second field above is just written as "0" rather than "0000." Note that there are 4 hex digits in each field. Each hex digit is 4 bits (and can represent a hex value of 0-F). This means that there are 16 bits in each field (4 hex digits x 4 bits per digit). The total size of an IPv6 address is 128 bits (8 hex fields x 16 bits per field).
    2. A different representation of the above address: Due to some methods of allocating certain styles of IPv6 addresses, it is common for addresses to contain long strings of zero bits. In order to make it easier to write addresses containing zero bits, a special syntax is available to compress the zeros. The use of :: indicates multiple groups of 16-bits of zeros. The :: can only appear once in an address, and can also be used to compress the leading zeros in an address. For example:
    • FF01:0:0:0:0:0:0:101 is a multicast address that can be written as FF01::101.
    • 0:0:0:0:0:0:0:1 is a loopback address that can be written as ::1.
    3. For dual environments: An alternative form that is sometimes more convenient when dealing with a mixed environment of IPv4 and IPv6 nodes is x:x:x:x:x:x:d.d.d.d, where the "x"s are the hexadecimal values of the six high-order 16-bit pieces of the address, and the "d"s are the decimal values of the four low-order 8-bit pieces of the address (standard IPv4 representation) -- that is, the first 96 bits are represented as six 16-bit hex fields and the last 32 bits as four 8-bit decimal fields. For example:
    ::9.184.201.1 
          ::ffff:9.184.209.2

    IPv6 address prefix
    The IPv6 address prefix denotes the network part of an address and is represented by the notation ipv6-address/prefix-length.
    Take this example:
    fe80::206:29ff:fedc:e06e/64

    In this instance, fe80::206:29ff:fedc:e06e is the address and 64 is the prefix length. These two together give us the address prefix. In the example, specifying 64 means that we take the first 64 bits of the above 128-bit address to identify the network part of the address.
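    On POSIX systems, the textual forms above can be parsed and regenerated with inet_pton() and inet_ntop(); this sketch round-trips one of the example addresses (assuming an RFC 5952-style inet_ntop, as on Linux):

    ```cpp
    #include <arpa/inet.h>   // POSIX inet_pton / inet_ntop
    #include <cassert>
    #include <cstring>

    int main()
    {
        struct in6_addr addr;

        // Parse the compressed text form into the 128-bit binary address
        int rc = inet_pton(AF_INET6, "fe80::206:29ff:fedc:e06e", &addr);
        assert(rc == 1);

        // The first 16-bit field is fe80, so the leading bytes are 0xfe 0x80
        assert(addr.s6_addr[0] == 0xfe && addr.s6_addr[1] == 0x80);

        // The :: expanded to three all-zero 16-bit fields (bytes 2 through 7)
        for (int i = 2; i < 8; ++i)
            assert(addr.s6_addr[i] == 0);

        // Convert back to text; inet_ntop emits the compressed, lowercase form
        char buf[INET6_ADDRSTRLEN];
        inet_ntop(AF_INET6, &addr, buf, sizeof(buf));
        assert(std::strcmp(buf, "fe80::206:29ff:fedc:e06e") == 0);
        return 0;
    }
    ```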

    Relevant concepts

    Network Address Translation (NAT): An Internet standard that enables a local-area network (LAN) to use one set of IP addresses for internal traffic and a second set of addresses for external traffic. A NAT box located where the LAN meets the Internet makes all necessary IP address translations.
    NAT serves three main purposes:
    • Provides a type of firewall by hiding internal IP addresses
    • Enables a company to use more internal IP addresses; since they're only used internally, there's no possibility of conflict with IP addresses used by other companies and organizations
    • Allows a company to combine multiple ISDN connections into a single Internet connection
    (See RFC 1631, "Hide & Seek with Gateways & Translators" in Resources.)
    Classless Inter-Domain Routing (CIDR): A new IP addressing scheme that replaces the older system based on classes A, B, and C. With CIDR, a single IP address can be used to designate many unique IP addresses. A CIDR IP address looks like a normal IP address except that it ends with a slash followed by a number, called the IP prefix. For example: 172.200.0.0/16
    The IP prefix specifies how many addresses are covered by the CIDR address, with lower numbers covering more addresses. An IP prefix of /12, for example, can be used to address 4,096 former Class C networks. CIDR addresses reduce the size of routing tables and make more IP addresses available within organizations.
    (See Resources for RFC 1517,1518,1519,1520.)
    This raises several questions:
    1. How does the above representation solve the two primary problems we are trying to address:
      • The finite amount of available address space?
      • Large global routing tables?
    2. How is the network identified in an IPv4 address?
    3. Why should the prefix length be allowed to be specified in an IPv6 address?
    4. How is the prefix specified in an IPv4 address?
    5. What are the problems caused by this?
    And here are the answers:
    Address space: Regarding the address space question, Robert M. Hinden, one of the key figures in IPv6 efforts, explains:
    IPv6 addresses are four times as long as IPv4 addresses (128 bits vs. 32). The address space is therefore 4 billion times 4 billion times 4 billion (2^96) times the size of the IPv4 address space (2^32). This works out to be:

    340,282,366,920,938,463,463,374,607,431,768,211,456

    This is an extremely large address space. In a theoretical sense this is approximately 665,570,793,348,866,943,898,599 addresses per square meter of the surface of the planet Earth (assuming the earth surface is 511,263,971,197,990 square meters).
    The class enemy: Now let's take up the questions regarding the address prefix in IPv4 and IPv6. The division of the IPv4 address space into Class A, B, C, and D networks caused some problems, because in IPv4 the network part was fixed by the class of the address. Let's illustrate with an example. Class A addresses can support 16 million hosts on each of their 128 networks (in a Class A address, the highest-order bit is set to 0, the next 7 bits are used for the network part, and the remaining 24 bits are used for the local address). Now, if an organisation were given a Class A address but didn't have 16 million hosts, the remaining address space would go to waste. Also note that not everyone can be given a Class A address, as there are only 128 such networks. CIDR had to be introduced to solve this problem and prolong the life of IP: there is a clear need for an organisation-specific network size, which means that the network part of an address should not be fixed. This variable prefix length is implemented in IPv6 by allowing the user to specify the network bits in the address prefix. For example, in the address fe80::206:29ff:fedc:e06e/64, the numeral 64 denotes the network part, and this could be changed. Here we have the option of choosing the network part; this is flexible, unlike IPv4, where it has always been fixed.
    Routing tables: The number of routes in the Internet grew over time; backbone routers were approaching their limit in 1984. If CIDR had not been introduced to relieve the pressure on global backbone routers, they would simply have ground to a halt.
    CIDR technique: So how does IPv6 solve this problem? The technique is to allow address prefixes that fit specific organisational needs, an idea basically introduced with CIDR. In IPv6 the network part is likewise given by a user-specified prefix. This helps to aggregate a large number of IP addresses and specify a single route for the organisation. If an organisation has many networks, then in the case of IPv4, many network prefixes have to be specified in the global routing table. In the case of IPv6, we can simply give one higher-level route to represent the whole organisation, since we can shrink and expand the network prefix by varying it. This helps the global tables remain small. This kind of setup did not exist in IPv4. (For more on CIDR, refer to Relevant concepts.)
    Autoconfiguration in IPv6: Plug and play
    What is autoconfiguration? The first thing to do is set up a machine with an IPv6 address. There is an interesting feature in IPv6 called stateless autoconfiguration that's defined by RFC 2462 (see Resources). This RFC states that your host should be able to give you an automatic, globally unique IPv6 address.
    For example, in AIX, you simply boot up your machine and type autoconf6 -v from the # prompt, and you will see your machine automatically detecting the subnet and assigning you a valid IPng address.
    I ran ifconfig to see the IPv6 address. Here is a partial output of ifconfig -a on my AIX machine:
    inet 9.184.209.3 netmask 0xffffff00 broadcast 9.184.209.255
     inet6 fe80::207:30ee:edcb:d05d/64 

    I got the inet6 address when I ran autoconf6 -v (inet6 is defined on en0). This machine now has both an IPv6 and an IPv4 address on the same physical Ethernet interface.
    How is this done? In very simple terms, the link-layer address is used as a base for the IPv6 address, and the host and router communicate so that the host can learn about the subnet. (Refer to the RFC for a more detailed discussion.)
    How about other operating systems? The other UNIX implementations have IPv6 autoconfiguration commands similar to AIX's. There is also a variety of free software implementations of IPv6 (see Resources).
    Can I manually configure? Yes. You can also configure an IPv6 address using ifconfig. It's important to plan your network when assigning the network prefix.
    Tunneling and mapped IPng addresses: The transition should be smooth
    Example of a transition problem
    Consider this situation. We have an existing IPv4 environment with IPv4-only hosts and routers. Now let's say we add a few IPv6 routers and hosts to our network. Some of these hosts have the capability to handle both IPv6 and IPv4 addresses, and some of them are pure IPv6 or pure IPv4. If we have to write an application that runs in this environment, then the application's client and server should be able to handle all possible client-server pairs. That is, a client or server can be purely IPv4, purely IPv6, or both IPv6- and IPv4-enabled. (For a detailed explanation, read RFC 2893: "Transition mechanisms for hosts and routers" -- see Resources.)
    What is the tunneling technique? Again, let's take an example situation. We need to carry an IPv6 packet over an IPv4 network. How do we proceed? Simple -- we just encapsulate the IPv6 packet in an IPv4 packet and send it across the IPv4 network. This is called tunneling.
    Configured tunneling: We need to configure the host that is at the entry point of the IPv4 network so that it can convert the IPv6 packet into an IPv4 packet. Also, the node that is the exit point of the IPv4 network needs to be configured so that it can convert the packet back to an IPv6 packet. This is called configured tunneling.
    Automatic tunneling: If a host has the capability to do this conversion dynamically then it's called automatic tunneling.
    Support for automatic tunneling in the protocol: The nodes that utilize this technique are assigned special IPv6 unicast addresses. These addresses carry an IPv4 address in the low-order 32 bits. This type of address is termed an "IPv4-compatible IPv6 address" and has the following format:
    |                80 bits               | 16 |      32 bits        |
    +--------------------------------------+----+---------------------+
    |0000..............................0000|0000|    IPV4 ADDRESS     |
    +--------------------------------------+----+---------------------+

    A second type of IPv6 address that holds an embedded IPv4 address is also defined. This address is used to represent the addresses of IPv4-only nodes (those that do not support IPv6) as IPv6 addresses. This type of address is termed an "IPv4-mapped IPv6 address" and has the format:
    |                80 bits               | 16 |      32 bits        |
    +--------------------------------------+----+---------------------+
    |0000..............................0000|FFFF|    IPV4 ADDRESS     |
    +--------------------------------------+----+---------------------+

    Usage of mapped addresses
    If you are writing an IPv6-enabled client, you're faced with this question: Do you send out an IPv6 packet or do you send out an IPv4 packet? You are given no guarantee about the underlying network. The next machine you contact to get this connection can be an IPv6 machine, an IPv4 machine, or a dual host.
    Let's assume that the applications responsible for routing the connections are capable of knowing whether the next machine is an IPv6 machine or an IPv4 machine. In this case, it would be really helpful if we could have IPv6 addresses that contain IPv4 addresses inside them. It would be good to have a mechanism (the FFFF marker in mapped v4 addresses) to tell us if the address refers to a pure IPv4 node; this would help us make appropriate decisions about which type of packet to send. Our discussion in the final section should make this clearer.

    Porting IPv4 applications to IPv6
    Here are some things to consider when porting an IPv4 application to IPv6:
    • The sockaddr_in6 structure and the in6_addr structure, which can hold 128-bit addresses, have been defined. Check if you are using the relevant IPv6 structure.
    • INADDR_ANY and INADDR_LOOPBACK must be modified to in6addr_any or in6addr_loopback for assignments. The IN6ADDR_ANY_INIT or IN6ADDR_LOOPBACK_INIT macros can be helpful.
    • Use AF_INET6 instead of AF_INET.
    • Note that there are structures and programs that will work for both IPv6 and IPv4. For porting examples, see "Moving to IPv6" in Resources.
    • Note that no change in syntax is necessary when using certain functions for IPv6. The only difference when using these functions is that you must cast struct sockaddr_in6 * to struct sockaddr *.
    The following macros and functions are used to write IPv6-enabled applications:
    • The IN6_IS_ADDR_V4MAPPED can be used to determine whether an IPv6 address is an IPv4-mapped address.
    • gethostbyname retrieves a network host entry by name (some implementations also provide gethostbyname2, which additionally takes an address family).
    • getaddrinfo returns address information related to a specified service location.
    • getnameinfo returns the text strings associated with the supplied IP address and port number.
    • inet_pton converts the specified address in text form to its binary equivalent.
    • inet_ntop converts the specified binary address into a text equivalent that's suitable for presentation.
    • getaddrinfo and getnameinfo can both be used to retrieve information related to IPv4 and IPv6 addresses. inet_pton and inet_ntop can both convert IPv4 and IPv6 addresses. This means that in "IPv6-ready" applications, you do not need to use either inet_addr or inet_ntoa.
    • The following functions do not require a change in syntax when used for IPv6: bind, connect, sendmsg, sendto, accept, recvfrom, recvmsg, getpeername, and getsockname, although the code for these functions has been modified.
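    As a small illustration of the protocol-independent style these functions allow, the sketch below asks getaddrinfo (with AI_NUMERICHOST, so no DNS lookup is needed) to classify the loopback address, rather than hard-coding an address family:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;  /* numeric address, no DNS */

    /* "::1" is the IPv6 loopback; getaddrinfo reports which family
     * it resolved to, so the caller never hard-codes AF_INET. */
    assert(getaddrinfo("::1", "80", &hints, &res) == 0);
    assert(res->ai_family == AF_INET6);

    freeaddrinfo(res);
    puts("ok");
    return 0;
}
```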
    Writing a simple IPv6 client
    Let's now take a look at the logic behind writing an IPv6-enabled client. I believe we are equipped with the basics. We know about IPv6 addresses. We will be able to recognise them if we see them in different representations. We will be able to autoconfigure an IPv6 address on our machine using autoconf6. We also know about the mapped address transition mechanism and have an idea of the functions to use. Consider the following IPv4 client:
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <netdb.h>
        ...
    main(argc, argv) /* client side */
    int argc;
    char *argv[];
    {
        struct sockaddr_in server;
        struct servent *sp;
        struct hostent *hp;
        int s;
        ...
        sp = getservbyname("login", "tcp");
        if (sp == NULL) {
         fprintf(stderr, "rlogin: tcp/login: unknown service\n");
         exit(1);
        }
        hp = gethostbyname(argv[1]);
        if (hp == NULL) {
         fprintf(stderr, "rlogin: %s: unknown host\n", argv[1]);
         exit(2);
        }
        memset((char *)&server, 0, sizeof(server));
        memcpy((char *)&server.sin_addr, hp->h_addr, hp->h_length);
        server.sin_len = sizeof(server);
        server.sin_family = hp->h_addrtype;
        server.sin_port = sp->s_port;
        s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0) {
         perror("rlogin: socket");
         exit(3);
        }
        ...
        /* Connect does the bind for us */
        if (connect(s, (struct sockaddr *)&server, sizeof(server)) < 0) {
         perror("rlogin: connect");
         exit(5);
        }
    }

    We will examine the logic behind converting this to an IPv6-enabled client.
    How to make the above client IPv6-enabled
    The sockaddr_in6 structure: The client above uses the sockaddr_in structure. We cannot keep using it, because its sin_addr member can only hold 32 bits. When porting the client to IPv6 we need to use sockaddr_in6, which can hold a 128-bit address.
    struct sockaddr_in6  {
             u_char          sin6_len;
             u_char          sin6_family;
             u_int16_t       sin6_port;
             u_int32_t       sin6_flowinfo;
             struct          in6_addr        sin6_addr;
      };

    The family: sin6_family will hold AF_INET6 instead of AF_INET in our program.
    sin6_flowinfo field: An application may specify the flow label and priority by setting the sin6_flowinfo field of the destination address sockaddr_in6 structure. We can set it to 0 for now.
    What type of addresses will the client handle? We have three situations. A user may pass:
    1. A colon-separated IPv6 address
    2. A dot-separated IPv4 address
    3. Just a host name
    IPv6 address: If it's a colon-separated IPv6 address, then we can just copy it into the structure.
    IPv4 address: If it's an IPv4 address, we need to copy it into the last 32 bits and mark the 16 bits before those 32 bits with 0xffff.
    Host name: If it's a host name, then we use gethostbyname to pick up the address. gethostbyname picks up an IPv4 address by default.
    If we call gethostbyname after setting _res.options (resolv.h) in AIX, we can force it to do an IPv6 lookup:
    _res.options |= RES_USE_INET6;

    Note that if there is no IPv6 address present for the host name (in /etc/hosts on UNIX, or in DNS), then an IPv6 lookup by gethostbyname will return an IPv4 address, but we still need to do the mapping (filling bits 81-96 with 0xffff). Also, some implementations have a separate gethostbyname2 call for IPv6 lookups.
    Why is mapping done? We do this mapping in order to use the sockaddr_in6 structure in the connect call regardless of whether we are trying to send to an IPv6 address or an IPv4 address. If not, we would require two connect calls: one that takes an IPv6 address and one that takes an IPv4 address. Another technique is to use a union of the sockaddr_in and sockaddr_in6 structures. Programmers can also devise their own techniques; it's not compulsory to map.
    // use inet_addr to find out whether it's a valid
    // dotted IPv4 address

    if (inet_addr(hostname) != INADDR_NONE) {
        ......

        // s6_addr16[5] is a union member normally defined in in.h
        // which points to bits 81-96 of the address
        ip6.sin6_addr.s6_addr16[5] = 0xffff;

        // now we copy the IPv4 address into the last 32 bits
        bcopy(address, &ip6.sin6_addr.s6_addr16[6], sizeof(struct in_addr));

        ip6.sin6_len = sizeof(struct sockaddr_in6);
        ip6.sin6_family = AF_INET6;
        ......
    }

    // check if it is a colon-separated IPv6 address - inet_pton is used for this

    else if (inet_pton(AF_INET6, hostname, &ip6.sin6_addr) > 0) {

        // note that inet_pton takes care of setting the address

        .....
        ip6.sin6_family = AF_INET6;
        ip6.sin6_len = sizeof(struct sockaddr_in6);
        .....
    }

    else {
        // it's neither a v6 address nor a v4 address, so it should be a host
        // name; do a v6 lookup - note that a v6 lookup looks for a v6 address
        // and, if none is present, can pick up a v4 address

        // res_init is declared in resolv.h
        res_init();
        _res.options |= RES_USE_INET6;
        hptr = gethostbyname(hostname);
        .....
        // check hptr->h_addrtype: if it's AF_INET6 you can copy the address
        // directly; if not, you need to map it as above
        .....
    }

    if (connect(sd, (struct sockaddr *)&ip6, sizeof(ip6)) < 0)
    {
        // connect failure
        ....
    }
    else
    {
        // continue with the program
    }

    Summary of the above logic
    To summarize the logic: we check to see if we got a dotted IPv4 address to handle. If so, we go ahead and map it and fill in an IPv6 structure, to be used by the connect call later. If it's an IPv6 address, we copy it directly into the IPv6 structure. If it's a hostname, we try to do an IPv6 lookup; we can get back an IPv4 or an IPv6 address, and we know which from the family field. Accordingly, we either map it or copy it, then do a single connect call regardless of whether it's an IPv4 or an IPv6 address, and proceed with our program.

    Conclusion
    We have looked only at the concepts we need to write the above program. There are many more interesting concepts that will soon become part of everyday life. There are controversies and constructive debates about things like DNS for IPv6 and stateful autoconfiguration for IPv6 (DHCP). These topics, along with others, such as the implementation of other layers, how routing will be done, and how autoconfiguration will be implemented, will make for interesting discussion. I hope to see you soon in a more exciting IPv6 world!


    Source: http://www.ibm.com/developerworks/web/library/wa-ipv6.html

    Bitwise Operators in C and C++: A Tutorial

    Generally, as a programmer you don't need to concern yourself about operations at the bit level. You're free to think in bytes, or ints and doubles, or even higher level data types composed of a combination of these. But there are times when you'd like to be able to go to the level of an individual bit. Exclusive-or encryption is one example when you need bitwise operations.

    Another example comes up when dealing with data compression: what if you wanted to compress a file? In principle, this means taking one representation and turning it into a representation that takes less space. One way of doing this is to use an encoding that takes less than 8 bits to store a byte. (For instance, if you knew that you would only be using the 26 letters of the Roman alphabet and didn't care about capitalization, you'd only need 5 bits to do it.) In order to encode and decode files compressed in this manner, you need to actually extract data at the bit level.

    Finally, you can use bit operations to speed up your program or perform neat tricks. (This isn't always the best thing to do.)

    Thinking about Bits

    The byte is the lowest level at which we can access data; there's no "bit" type, and we can't ask for an individual bit. In fact, we can't even perform operations on a single bit -- every bitwise operator will be applied to, at a minimum, an entire byte at a time. This means we'll be considering the whole representation of a number whenever we talk about applying a bitwise operator. (Note that this doesn't mean we can't ever change only one bit at a time; it just means we have to be smart about how we do it.) Understanding what it means to apply a bitwise operator to an entire string of bits is probably easiest to see with the shifting operators. By convention, in C and C++ you can think about binary numbers as starting with the most significant bit to the left (i.e., 10000000 is 128, and 00000001 is 1). Regardless of underlying representation, you may treat this as true. As a consequence, the results of the left and right shift operators are not implementation dependent for unsigned numbers (for signed numbers, the right shift operator is implementation defined).

    The leftshift operator is the equivalent of moving all the bits of a number a specified number of places to the left:
    [variable]<<[number of places]
    For instance, consider the number 8 written in binary 00001000. If we wanted to shift it to the left 2 places, we'd end up with 00100000; everything is moved to the left two places, and zeros are added as padding. This is the number 32 -- in fact, left shifting is the equivalent of multiplying by a power of two.
    int mult_by_pow_2(int number, int power)
    {
        return number<<power;
    }
    Note that in this example, we're using integers, which are either 2 or 4 bytes, and that the operation gets applied to the entire sequence of 16 or 32 bits.

    But what happens if we shift a number like 128 and we're only storing it in a single byte: 10000000? Well, 128 * 2 = 256, and we can't even store a number that big in a byte, so it shouldn't be surprising that the result is 00000000.

    It shouldn't surprise you that there's a corresponding right-shift operator: >> (especially considering that I mentioned it earlier). Note that a bitwise right-shift will be the equivalent of integer division by 2.

    Why is it integer division? Consider the number 5, in binary, 00000101. 5/2 is 2.5, but if you are performing integer division, 5/2 is 2. When you perform a right shift by one: (unsigned int)5>>1, you end up with 00000010, as the rightmost 1 gets shifted off the end; this is the representation of the number 2. Note that this only holds true for unsigned integers; otherwise, we are not guaranteed that the padding bits will be all 0s.

    Generally, using the left and right shift operators will result in significantly faster code than calculating a power of two and then multiplying. The shift operators will also be useful later when we look at how to manipulate individual bits.

    For now, let's look at some of the other binary operators to see what they can do for us.

    Bitwise AND

    The bitwise AND operator is a single ampersand: &. A handy mnemonic is that the small version of the boolean AND, &&, works on smaller pieces (bits instead of bytes, chars, integers, etc). In essence, a binary AND simply takes the logical AND of the bits in each position of a number in binary form.

    For instance, working with a byte (the char type):
    01001000 & 
    10111000 = 
    --------
    00001000
    The most significant bit of the first number is 0, so we know the most significant bit of the result must be 0; in the second most significant bit, the bit of the second number is zero, so we have the same result. The only position where both bits are 1, which is the only time the result will be 1, is the fifth bit from the left. Consequently,
    72 & 184 = 8

    Bitwise OR

    Bitwise OR works almost exactly the same way as bitwise AND. The only difference is that only one of the two bits needs to be a 1 for that position's bit in the result to be 1. (If both bits are a 1, the result will also have a 1 in that position.) The symbol is a pipe: |. Again, this is similar to the boolean logical OR operator, ||.
    01001000 | 
    10111000 = 
    --------
    11111000
    and consequently
    72 | 184 = 248
    Let's take a look at an example of when you could use just these four operators to do something potentially useful. Let's say that you wanted to keep track of certain boolean attributes about something -- for instance, you might have eight cars (!) and want to keep track of which are in use. Let's assign each of the cars a number from 0 to 7.

    Since we have eight items, all we really need is a single byte, and we'll use each of its eight bits to indicate whether or not a car is in use. To do this, we'll declare a char called in_use, and set it to zero. (We'll assume that none of the cars are initially "in use".)
    char in_use = 0;
    Now, how can we check to make sure that a particular car is free before we try to use it? Well, we need to isolate the one bit that corresponds to that car. The strategy is simple: use bitwise operators to ensure every bit of the result is zero except, possibly, for the bit we want to extract.

    Consider trying to extract the fifth bit from the right of a number: XX?XXXXX We want to know what the question mark is, and we aren't concerned about the Xs. We'd like to be sure that the X bits don't interfere with our result, so we probably need to use a bitwise AND of some kind to make sure they are all zeros. What about the question mark? If it's a 1, and we take the bitwise AND of XX?XXXXX and 00100000, then the result will be 00100000:
    XX1XXXXX & 
    00100000 = 
    --------
    00100000
    Whereas, if it's a zero, then the result will be 00000000:
    XX0XXXXX & 
    00100000 = 
    --------
    00000000
    So we get a non-zero number if, and only if, the bit we're interested in is a 1.

    This procedure works for finding the bit in the nth position. The only thing left to do is to create a number with only the one bit in the correct position turned on. These are just powers of two, so one approach might be to do something like:
    int is_in_use(int car_num)
    {
        // pow returns a double, so we cast the result back to an int;
        // in_use will be promoted to an int for the &, which has no
        // effect here - we can think of this as an operation between chars
        return in_use & (int)pow(2, car_num);
    }
    While this function works, it can be confusing. It obscures the fact that what we want to do is shift a bit over a certain number of places, so that we have a number like 00100000 -- a couple of zeros, a one, and some more zeros. (The one could also be first or last -- 10000000 or 00000001.)

    We can use a bitwise leftshift to accomplish this, and it'll be much faster to boot. If we start with the number 1, we are guaranteed to have only a single bit, and we know it's to the far-right. We'll keep in mind that car 0 will have its data stored in the rightmost bit, and car 7 will be the leftmost.
    int is_in_use(int car_num)
    {
        return in_use & 1<<car_num;
    }
    Note that shifting by zero places is a legal operation -- we'll just get back the same number we started with.

    All we can do right now is check whether a car is in use; we can't actually set the in-use bit for it. There are two cases to consider: indicating a car is in use, and removing a car from use. In one case, we need to turn a bit on, and in the other, turn a bit off.

    Let's tackle the problem of turning the bit on. What does this suggest we should do? If we have a bit set to zero, the only way we know right now to set it to 1 is to do a bitwise OR. Conveniently, if we perform a bitwise OR with only a single bit set to 1 (the rest are 0), then we won't affect the rest of the number because anything ORed with zero remains the same (1 OR 0 is 1, and 0 OR 0 is 0).

    Again we need to move a single bit into the correct position:
    void set_in_use(int car_num)
    {
        in_use = in_use | 1<<car_num;
    }
    What does this do? Take the case of setting the leftmost bit to 1: we have some number 0XXXXXXX | 10000000; the result, 1XXXXXXX. The shift is the same as before; the only difference is the operator and that we store the result.

    Setting a car to be no longer in use is a bit more complicated. For that, we'll need another operator.

    The Bitwise Complement

    The bitwise complement operator, the tilde, ~, flips every bit. A useful way to remember this is that the tilde is sometimes called a twiddle, and the bitwise complement twiddles every bit: if you have a 1, it's a 0, and if you have a 0, it's a 1.

    This turns out to be a great way of finding the largest possible value for an unsigned number:
    unsigned int max = ~0;
    0, of course, is all 0s: 00000000 00000000. Once we twiddle 0, we get all 1s: 11111111 11111111. Since max is an unsigned int, we don't have to worry about sign bits or twos complement. We know that all 1s is the largest possible number.

    Note that ~ and ! cannot be used interchangeably. When you take the logical NOT of a non-zero number, you get 0 (FALSE). However, when you twiddle a non-zero number, the only time you'll get 0 is when every bit is turned on. (This non-equivalence principle holds true for bitwise AND too, unless you know that you are using strictly the numbers 1 and 0. For bitwise OR, to be certain that it would be equivalent, you'd need to make sure that the underlying representation of 0 is all zeros to use it interchangeably. But don't do that! It'll make your code harder to understand.)

    Now that we have a way of flipping bits, we can start thinking about how to turn off a single bit. We know that we want to leave other bits unaffected, but that if we have a 1 in the given position, we want it to be a 0. Take some time to think about how to do this before reading further.

    We need to come up with a sequence of operations that leaves 1s and 0s in the non-target position unaffected; before, we used a bitwise OR, but we can also use a bitwise AND. 1 AND 1 is 1, and 0 AND 1 is 0. Now, to turn off a bit, we just need to AND it with 0: 1 AND 0 is 0. So if we want to indicate that car 2 is no longer in use, we want to take the bitwise AND of XXXXX1XX with 11111011.

    How can we get that number? This is where the ability to take the complement of a number comes in handy: we already know how to turn a single bit on. If we turn one bit on and take the complement of the number, we get every bit on except that bit:
    ~(1<<position)
    Now that we have this, we can just take the bitwise AND of this with the current field of cars, and the only bit we'll change is the one of the car_num we're interested in.
    void set_unused(int car_num)
    {
        in_use = in_use & ~(1<<car_num);
    }
    You might be thinking to yourself, but this is kind of clunky. We actually need to know whether a car is in use or not (if the bit is on or off) before we can know which function to call. While this isn't necessarily a bad thing, it means that we do need to know a little bit about what's going on. There is an easier way, but first we need the last bitwise operator: exclusive-or.

    Bitwise Exclusive-Or (XOR)

    There is no boolean operator counterpart to bitwise exclusive-or, but there is a simple explanation. The exclusive-or operation takes two inputs and returns a 1 if either one or the other of the inputs is a 1, but not if both are. That is, if both inputs are 1 or both inputs are 0, it returns 0. Bitwise exclusive-or, written with a caret, ^, performs the exclusive-or operation on each pair of bits. Exclusive-or is commonly abbreviated XOR.

    For instance, if you have two numbers represented in binary as 10101010 and 01110010 then taking the bitwise XOR results in 11011000. It's easier to see this if the bits are lined up correctly:
    01110010 ^
    10101010 
    --------
    11011000
    You can think of XOR in the following way: you have some bit, either 1 or 0, that we'll call A. When you take A XOR 0, then you always get A back: if A is 1, you get 1, and if A is 0, you get 0. On the other hand, when you take A XOR 1, you flip A. If A is 0, you get 1; if A is 1, you get 0.

    So you can think of the XOR operation as a sort of selective twiddle: if you apply XOR to two numbers, one of which is all 1s, you get the equivalent of a twiddle.

    Additionally, if you apply the XOR operation twice -- say you have a bit, A, and another bit B, and you set C equal to A XOR B, and then take C XOR B: you get A XOR B XOR B, which essentially either flips every bit of A twice, or never flips the bit, so you just get back A. (You can also think of B XOR B as cancelling out.) As an exercise, can you think of a way to use this to exchange two integer variables without a temporary variable? (Once you've figured it out, check the solution.)

    How does that help us? Well, remember the first principle: XORing a bit with 0 results in the same bit. So what we'd really like to be able to do is just call one function that flips the bit of the car we're interested in -- it doesn't matter if it's being turned on or turned off -- and leaves the rest of the bits unchanged.

    This sounds an awful lot like the function we wrote earlier to turn a bit on; in fact, we only need to make one change. Instead of using a bitwise OR, we use a bitwise XOR. This leaves every other bit unchanged, but flips the target bit instead of always turning it on:
    void flip_use_state(int car_num)
    {
        in_use = in_use ^ (1<<car_num);
    }

    When should you use bitwise operators?

    Bitwise operators are good for saving space -- but many times, space is hardly an issue. And one problem with working at the level of individual bits is that if you decide you need more room or want to save some time -- for instance, if we needed to store information about 9 cars instead of 8 -- then you might have to redesign large portions of your program. On the other hand, sometimes you can use bitwise operators to cleverly remove dependencies, such as by using ~0 on an unsigned type to get the largest possible value of that type without knowing its size. Bit shifting to multiply by two is also a fairly common operation, so it doesn't affect readability the way advanced use of bit manipulation can in some cases (for instance, using XOR to switch the values stored in two variables).

    There are also times when you need to use bitwise operators: if you're working with compression or some forms of encryption, or if you're working on a system that expects bit fields to be used to store boolean attributes.

    Summary

    You should now be familiar with six bitwise operators:

    Work on the bits of the left argument; take an integer shift count as the right argument
    bit_arg << shift_arg
    Shifts the bits of bit_arg shift_arg places to the left -- equivalent to multiplication by 2^shift_arg
    bit_arg >> shift_arg
    Shifts the bits of bit_arg shift_arg places to the right -- equivalent to integer division by 2^shift_arg

    Work on the bits of both arguments
    left_arg & right_arg
    Takes the bitwise AND of left_arg and right_arg
    left_arg ^ right_arg
    Takes the bitwise XOR of left_arg and right_arg
    left_arg | right_arg
    Takes the bitwise OR of left_arg and right_arg

    Works on the bits of its only argument
    ~arg
    Reverses (complements) the bits of arg

    You also know a couple of neat tricks that you can use when performance is critical, when space is scarce, or when you just need to isolate and manipulate individual bits of a number.

    And you now should have a better sense of what goes on at the lowest levels of your computer.

    A Parting Puzzle

    One final neat trick of bitwise operators is that you can use them, in conjunction with a bit of math, to find out whether an integer is a power of two. Take some time to think about it, then check out the solution.


    Source: http://www.cprogramming.com/tutorial/bitwise_operators.html