Introduction to C, by The Linux Information Project (LINFO)

C is in many ways the most important of the hundreds of programming languages that have been developed in the world to date.

It is by far the most frequently used language for writing system software (i.e., operating systems, other programming languages and compilers), and it is also widely employed for writing application programs. C remains particularly popular in the world of Unix-like operating systems, and, for example, most of the Linux kernel (i.e., the core of the operating system) is written in C. Moreover, it is one of the most frequently studied languages in computer science classes.

C is important to study not only because of its own usefulness but also because many other programming languages are based on it. Among the most popular of them are C++ (an object-oriented language that is used mainly for the development of application programs), Java (an object-oriented language that improves upon C++ and is the main language for enterprise-class, networked applications) and Perl (a scripting language that is commonly used for administrative and text processing tasks).

C is a traditional, procedural language, that is, one that requires the programmer to provide step-by-step instructions for the CPU (central processing unit) or other processor (i.e., logic chip). Object-oriented languages, which became popular in the 1990s, make extensive use of objects, which are software packets that contain a collection of related data and procedures for operating on that data. A scripting language is one that is designed to be simple to use and does not need to be compiled in advance.

The great success of C is due to its simplicity, efficiency, flexibility and small memory requirements. It is also due to the portability of programs that are written in it, that is, the ability to be easily adapted to new platforms (i.e., operating systems and processors).

C's great portability is in very large part a result of the fact that compilers and libraries are available for numerous platforms. A compiler is a specialized program that converts source code (i.e., the original, human-readable form of a program typed into a computer by a programmer in a programming language) into another language, usually machine code (also called machine language), so that it can be directly understood by processors. A library is a collection of routines (also called subprograms, procedures or functions) that perform operations which are commonly required by programs.

And C has maintained its popularity despite its age (more than 30 years old, which is very old in the computer field) and the proliferation of other languages that are intended to be easier and more convenient to use, namely object-oriented and scripting languages.

C is a relatively minimalist programming language and is a lower-level language (i.e., closer to assembly languages) than most others. Even though it is sometimes referred to as a high-level language (i.e., one that embodies much abstraction and is thus much more efficient to use for writing source code), it is really high-level only in comparison to assembly languages.

An assembly language is essentially a machine language that has had its sequences of bits (i.e., zeros and ones) replaced by letters and numbers so that it is much easier for humans to read and write. A machine language is a mere pattern of instructions consisting of bits that can be read directly by a computer's processor. In contrast to higher-level languages (including C), there is essentially a one-to-one correspondence between the instructions of an assembly language and those of the machine language for the same processor.

C differs significantly from assembly languages in that it is much easier to read and write programs in it, particularly lengthy ones, because its syntax and vocabulary are much closer to those of the English language. Also, programs written in an assembly language are usually applicable to only a single processor type, whereas C programs can generally be easily ported to (i.e., modified for) any processor for which a C compiler and any required libraries exist (which is the great majority of processors).

Systems software such as operating system kernels (which are generally written in C) may contain fragments of assembly language where performance is particularly critical, as is the case with the Linux kernel. However, the machine code generated by modern C compilers for today's complex processors is often as fast as, or even faster than, that attained with hand-written assembly language.

Early History

C was developed by Dennis Ritchie at Bell Labs in the early 1970s for systems programming on the PDP-11 computer. A graduate of Harvard University with degrees in physics and applied mathematics, the modest Ritchie said that he wrote it because "it looked like a good thing to do" and that anyone else at the same time and in the same place would have done something similar.

Bell Labs was established in 1925 by AT&T (The American Telephone and Telegraph Company), the former U.S. telecommunications monopoly, and it subsequently became one of the most prolific sources of innovation that has ever existed. Launched in 1970, the PDP-11 was an innovative and successful model in Digital Equipment Corporation's (DEC) popular and influential PDP series of minicomputers.

C grew out of, and acquired its name from, an earlier language called B, which was written by Ken Thompson. Also employed by Bell Labs, Thompson wrote the original version of the UNIX operating system there in 1969. B was derived largely from BCPL, and its name is most likely a contraction of BCPL, although another account traces it to Bon, a still earlier language by Thompson that is said to have been named after his wife, Bonnie.

C was designed specifically as a powerful but minimalist language suitable for use in writing operating systems that were easy to adapt to various processors, and it was developed particularly with UNIX in mind. By 1973, it had become sufficiently powerful that it was used to rewrite most of the UNIX kernel, which had originally been written in PDP-11/20 assembly language. This resulted in the UNIX kernel becoming one of the first operating system kernels implemented in a language other than an assembly language.

Among the advantages of rewriting UNIX in C were that the code was more compact and that it was easier to port to other processors. This, together with AT&T's providing UNIX to universities, businesses and government agencies, led to an acceleration in the development of that operating system and contributed to a surge in the popularity of C. Nevertheless, B continued to be employed into the 1990s on Honeywell mainframes and for some embedded applications.

K&R C

Ritchie and Brian Kernighan (who was one of the co-developers of the awk programming language) published the first edition of their classic book The C Programming Language in 1978. Known to C programmers as K&R, for many years this book served as an informal specification for the language, and the version of C that it describes is usually referred to as K&R C.

K&R C is often considered to be the most basic subset of the C language that a C compiler must support. Even after the introduction of ANSI C, it continued to be regarded as the lowest common denominator to which C programmers adhered when maximum portability was desired. This is because it took some time for most compilers to be updated to fully support ANSI C, and because well-written K&R C code is also legal ANSI C.

ANSI C

In 1983, the American National Standards Institute (ANSI) formed a committee to establish a standard specification for C. The standard was finally completed in 1989 and ratified as ANSI X3.159-1989 Programming Language C, which is commonly referred to as ANSI C.

One of the goals of this standardization was to produce a superset of K&R C that incorporated many of the unofficial features that had subsequently been introduced into the language. The committee also added several additional features. ANSI C is now supported by most of the commonly used C compilers, and most C code written today is based on ANSI C.

Thereafter, the C language specification remained relatively static for some time, whereas C++ continued to evolve and Java was introduced (in 1995) as an attempt to further improve on C++. Finally, in 1999, a new standard, ISO/IEC 9899:1999, commonly referred to as C99, was published, and it was adopted as an ANSI standard in March 2000. C compilers have subsequently been moving towards compliance with C99, although support remains incomplete in some cases.

GNU C

GNU C refers to the superset of K&R C and ANSI C that can be compiled by the GCC. Originally an acronym for GNU C Compiler, GCC now stands for GNU Compiler Collection, because it has been expanded to include compilers for additional programming languages.

The GCC was developed as part of the GNU project, which was begun in 1984 for the purpose of developing a complete, high performance, UNIX-compatible and totally free (i.e., for anyone to obtain at no monetary cost and to use for any purpose) operating system.

The GCC has been ported to more processors and operating systems than any other compiler, and it currently runs on more than 60 platforms. This, together with its excellent performance and the facts that it is free and is typically found on Unix-like operating systems, has contributed to its increasingly important role in the development of C programs and in maintaining the popularity of the C language.

Numerous excellent and free resources are available on the Internet for learning more about C. Among them is How to Create a First C Program on Linux, a tutorial designed by The Linux Information Project to be suitable for people with absolutely no programming experience.

Created June 29, 2004. Last updated June 28, 2006.
Copyright © 2004 - 2006 The Linux Information Project. All Rights Reserved.