Wednesday, December 8, 2010

Why it's bad to use feof() to control a loop

When reading in a file, and processing it line by line, it's logical to think of the code loop as "while not at the end of the file, read and process data". This often ends up looking something like this:
i = 0;
  
while (!feof(fp))
{
  fgets(buf, sizeof(buf), fp);
  printf ("Line %4d: %s", i, buf);
  i++;
}

This apparently simple snippet of code has a bug in it, though. The problem stems from the method feof() uses to determine if EOF has actually been reached. Let's have a look at the C standard:
7.19.10.2 The feof function

Synopsis

1 #include <stdio.h>
int feof(FILE *stream);

Description
2 The feof function tests the end-of-file indicator for the stream pointed to by stream.

Returns
3 The feof function returns nonzero if and only if the end-of-file indicator is set for stream.

Do you see the problem yet? The function tests the end-of-file indicator, not the stream itself. This means that another function is actually responsible for setting the indicator to denote EOF has been reached. This would normally be done by the function that performed the read that hit EOF. We can then follow the problem to that function, and we find that most read functions will set EOF once they've read all the data, and then performed a final read resulting in no data, only EOF.With this in mind, how does it manifest itself into a bug in our snippet of code? Simple... as the program goes through the loop to get the last line of data, fgets() works normally, without setting EOF, and we print out the data. The loop returns to the top, and the call to feof() returns FALSE, and we start to go through the loop again. This time, the fgets() sees and sets EOF, but thanks to our poor logic, we go on to process the buffer anyway, without realising that its content is now undefined (most likely untouched from the last loop).
This problem results in the last line being printed twice. Now, with the various code and compilers I've tried, I've seen varying results when using this poor quality code. Some give the wrong answer as described here, but some do seem to get it right, and print the last line only once.
Here is a full example of the broken code. It's pointless providing sample results, as they're not necessarily going to be the same as yours. However, if you compile this code, and run it against an empty file (0 bytes), it should output nothing. If it's doing it wrong, as I expect it will, you'll get a line similar to this:
Line 0: Garbage
Here, Garbage was left in the buffer from the initialisation, but should not have been printed. Anyway, enough talk, here's the code.
#include <stdio.h> 
#include <stdlib.h> 

#define MYFILE "junk1.txt" 

int main(void)
{
  FILE *fp;
  char buf[BUFSIZ] = "Garbage";
  int i;
  
  if ((fp = fopen(MYFILE, "r")) == NULL)
  {
    perror (MYFILE);
    return (EXIT_FAILURE);
  }
  
  i = 0;
  
  while (!feof(fp))
  {
    fgets(buf, sizeof(buf), fp);
    printf ("Line %4d: %s", i, buf);
    i++;
  }
  
  fclose(fp);
    
  return(0);
}

To correct the problem, always follow this rule: use the return code from the read function to determine when you've hit EOF. Here is a revised edition of the same code, this time checking the return code from fgets() to determine when the read fails. The code is exactly the same, except for the loop.
#include <stdio.h> 
#include <stdlib.h> 

#define MYFILE "junk1.txt" 

int main(void)
{
  FILE *fp;
  char buf[BUFSIZ] = "Garbage";
  int i;
  
  if ((fp = fopen(MYFILE, "r")) == NULL)
  {
    perror (MYFILE);
    return (EXIT_FAILURE);
  }
  
  i = 0;

  while (fgets(buf, sizeof(buf), fp) != NULL)
  {
    printf ("Line %4d: %s", i, buf);
    i++;
  }
  
  fclose(fp);
    
  return(0);
}

When this is run against an empty file (0 bytes), it will not print anything.Here are some other read functions being used to control loops:
total = 0;
  
while (fscanf(fp, "%d", &num) == 1)
{
  total += num;
}

printf ("Total is %d\n", total);

int c;
  
while ((c = fgetc(fp)) != EOF)
{
  putchar (c);
}
Source

No comments:

Post a Comment