CS50 Week 1: Intro to C

C is an older language than JavaScript. It is simpler, but more tedious to code in, because it is stricter: for example, every variable must be declared with a type, and every statement must end with a semicolon. However, C has found lasting use in areas that were formerly coded in assembly language, such as operating systems, because it maps efficiently to machine instructions.

 

Hello World

We start off with a simple task: printing “hello, world”.

In C, the code looks like this:

#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
}

#include tells the compiler (specifically, the preprocessor) to include the contents of another file in our program.

The angle brackets <> are used for system header files: the preprocessor searches for the file named inside them in a standard list of system directories.

stdio.h is the header for the standard input/output library, which declares the printf function.

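int main(void): main is the function the operating system calls when the user runs the program, so it is where our code starts executing. Functions can accept inputs via parameters, which will be explained later; void means main accepts no input here. The int means main returns an integer to the operating system, conventionally 0 for success.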

printf, as you might have guessed, prints formatted data to stdout (standard output). A format specifier starts with % and ends with a conversion character, such as i for integer or f for floating point. Each specifier is replaced by the next argument after the format string, printed in that format. For example:

printf ("floats: %4.2f %+.0e %E \n", 3.1416, 3.1416, 3.1416);

\n indicates a new line.

 

Loops

There are three loops in C programming:

  1. For loop
  2. While loop
  3. Do…while loop

For loop

To repeat something a fixed number of times (here, 50):

for (int i = 0; i < 50; i++)
{
    printf("hello, world\n");
}

 

While loop

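To repeat something as long as a condition is true. Here the condition is always true, so the loop runs forever. (Using true requires #include <stdbool.h>.)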

while (true)
{
    printf("hello, world\n");
}

 

Do…while loop

Do the action once first, then repeat it as long as the condition is true:

do
{
   printf("hello, world\n");
}
while (true);

Functions

printf is a function declared in the stdio.h header. Functions are reusable blocks of code that take arguments (inputs or information), perform a computation, and usually return a new piece of information.

Other libraries provide more functions. The CS50 library (cs50.h) includes several for getting input from the user:

  • get_char – gets a character from the user
  • get_double
  • get_float
  • get_int
  • get_long_long
  • get_string – gets a string of characters from the user

The others get integers or floating-point numbers of differing ranges (see the ranges for each type: https://msdn.microsoft.com/en-us/library/cc953fe1.aspx).

Types have different ranges because each one is stored in a fixed number of bytes, and memory is limited: more bytes can represent a wider range of values. This leads us to our next topic:

 

Overflow

Because memory is limited, C allocates a fixed number of bytes to each type of data. For example, every int takes up exactly 4 bytes in the CS50 IDE.

One problem that pops up is integer overflow. Imagine a binary number with 8 bits:

1 1 1 1 1 1 1 0

If we add 1 to that, it becomes ‘11111111’. If we add another 1, the carry ripples through every bit, giving ‘1 00000000’. That leading 1 would need a ninth bit, which we do not have, so it is lost and the value wraps around to ‘00000000’.

This kind of limit is popularly blamed for a famous game bug in Civilization called ‘Nuclear Gandhi’. According to the story, a civilization that adopts ‘democracy’ has its leader's aggression reduced by 2. Gandhi's aggression starts at 1, so instead of dropping to -1, it wraps around to the maximum value of 255.

We can see integer overflow in this code:

// Integer overflow

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    // Iteratively double i
    for (int i = 1; ; i *= 2)
    {
        printf("%i\n", i);
        sleep(1);
    }
}

If we compile and run it, we will see:

1
2
4
…
1073741824
overflow.c:9:25: runtime error: signed integer overflow: 1073741824 * 2 cannot be represented in type 'int'
-2147483648
0
0

Another bug can arise out of floating-point imprecision.

#include <stdio.h>

int main(void)
{
    printf("%.55f\n", 1.0 / 10.0);
}

If you compile and run the above code, you get:

0.100000000000000000555111512312578…

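That is because floating-point numbers have a finite number of bits, and 0.1 cannot be represented exactly in binary: just as 1/3 is an infinitely repeating decimal, 1/10 is an infinitely repeating binary fraction. What gets printed is the closest value the computer can store.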

These limits can cause subtle and dangerous bugs, so we need to anticipate and handle them.

Compiling

Computers only execute machine code (binary), so source code (a .c file) needs to be converted into machine code that can be executed, such as a .exe file on Windows. The program that does this conversion is called a compiler.

 

For CS50, we are all using the same IDE, at cs50.io. The IDE already includes a compiler.

To compile the ‘hello, world’ code above, save it in the workspace as ‘hello.c’.

Then type the following into the terminal. The ~/workspace/$ prompt is displayed by the terminal, so do not type it:

~/workspace/$ clang hello.c

You will notice a new file, ‘a.out’, which contains our program's machine code. We can't run it by clicking on it; instead, type into the terminal:

~/workspace/$ ./a.out

In later exercises, we will use a tool called ‘make’ to compile our code. It calls the clang compiler with options that can be configured for a project. Note that we give make the program's name, not the name of the .c file:

~/workspace/$ make program

There are other commands in the environment we can use:

ls lists the files in the current directory.

cd changes the directory (as in cd pset1), and cd .. goes up one level of the directory tree.

rmdir removes empty directories.

 

Compiling process

Compiling actually refers to a process that is made of four separate steps:

    • preprocessing

      • In C, preprocessing handles the lines that start with #, for example by replacing each #include line with the contents of the named file.
    • compiling

      • The compiler then takes the complete source code and converts it to assembly code:
        main:                               #   @main
            .cfi_startproc
        # BB#0:
            pushq   %rbp
        .Ltmp0:
            .cfi_def_cfa_offset 16
        .Ltmp1:
            .cfi_offset %rbp, -16
            movq %rsp, %rbp
        .Ltmp2:
            .cfi_def_cfa_register %rbp
            subq $16, %rsp
            movabsq $.L.str, %rdi
            movb $0, %al
            callq   printf
        ...

        These are low-level instructions, such as single-step arithmetic and moving data between registers and memory, that the CPU can execute.

    • assembling

      • The assembler then converts these lines of assembly into 0s and 1s (machine code) that the CPU can directly understand.
    • linking

      • Finally, the linker combines our program's machine code with the compiled code of the libraries we call functions from, such as the standard I/O library. Recall that we only included stdio.h, which is just a header file that declares functions like printf; the code that implements them lives in the library itself.

       

Splitting compilation into separate stages makes each layer easier to debug and work with. As a result, the more complicated systems we build on top of them can be cleaner, more secure, and better designed.

 

Strings, Arrays

In C, strings are just arrays of characters stored in consecutive bytes, which can be represented like so:

[Figure: memory drawn as a row of numbered boxes, one byte each, holding the characters of a string]

Each box is numbered, from 0 up to billions (depending on how much memory we have). Each character occupies one byte, and a special null character is stored at the end of the string so the program knows where the string ends.

This null character, written '\0', is literally the number 0, not the character ‘0’ (which is 48 in ASCII).

For other data types in C, a fixed number of bytes is allocated for them, so they do not need a terminating character.

 

Command Line Arguments

You can use the command line to accept arguments, which can simplify your program’s UI and make it faster to use.

Recall that we said that main accepts inputs. We will be replacing void with the syntax needed to accept command line arguments:

int main(int argc, char *argv[])

argv is the argument vector: the variable that holds the arguments passed to the program on the command line. Its type, char *argv[], makes it an array of pointers to characters. Each pointer points to the first character of a string, and each string is terminated by a null character.

So if you have an argv with 3 values, including the program name, it will look like this:

 argv
  ---         ---       --------------------
 |  -|---->  |  -|---->| p | r | o | g | \0 |
  ---        |---|      --------------------
             |  -|---->| 2 | \0 |
             |---|      --------
             |  -|---->| o | u | t | p | u | t | \0 |
              ---       ----------------------------

The * in char *argv[] makes each element a pointer (basically an ‘address’ of the byte in memory where the data is stored). More on that later (https://stackoverflow.com/questions/4955198/what-does-dereferencing-a-pointer-mean).

argc is the argument count: an integer giving the number of white space separated strings on the command line, including the program's name. White space refers to space and tab characters. argc is useful for checking how many arguments were supplied, for example by returning 0 (for success) only if the count is correct.

A simple program can be written as such:

#include <cs50.h>
#include <stdio.h>

// Says hello to the user if they type their name after the program's name

int main(int argc, string argv[])
{
    if (argc == 2)
    {
        printf("hello, %s\n", argv[1]);
    }
    else
    {
        printf("hello, world\n");
    }
}

Note that accessing an element beyond the end of an array is undefined behavior in C and may crash the program with a segmentation fault (e.g. trying to access argv[100] when only two arguments were given).
