Binary and text files
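Computer files can be divided into two broad categories: binary and text. The distinction is subtle, because to a computer every file is simply a sequence of bytes. Whether those bytes are meant to be read as characters or as some other kind of data depends on how a program interprets them.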
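The interpretation lives in the program, not in the file. The Python sketch below reads the same four bytes once as ASCII text and once as a 32-bit unsigned integer in each byte order. The byte values are arbitrary and chosen only for illustration.

```python
import struct

# Four arbitrary bytes as they might sit on disk; nothing in them says "text" or "number".
data = bytes([0x41, 0x42, 0x43, 0x44])

# Interpreted as ASCII text, each byte maps to one character.
as_text = data.decode("ascii")

# Interpreted as a 32-bit unsigned integer, in little-endian and big-endian order.
as_int_le = struct.unpack("<I", data)[0]
as_int_be = struct.unpack(">I", data)[0]

print(as_text)    # ABCD
print(as_int_le)  # 1145258561
print(as_int_be)  # 1094861636
```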
Text files (plain text files) are files whose bytes represent ordinary readable characters, such as letters and digits, according to some character encoding. In simple encodings such as ASCII, each byte corresponds to exactly one character, so any simple program that displays a file's contents makes it human-readable. Text files generally contain printable characters plus a few control characters such as tabs, line feeds and carriage returns, without embedded information such as font information, hyperlinks or inline images. They can also contain characters outside ASCII. East Asian text, for example, may be stored in a legacy encoding such as Shift JIS (SJIS), or in Unicode, in which case an encoding form such as UTF-8 defines how each character is represented as one or more bytes.

Although text files are meant to be human-readable, programs also use them for data storage. This avoids problems that can arise with binary files, such as differences in endianness or in the byte length of integers between platforms.
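The following Python sketch illustrates two of these points. First, a non-ASCII character takes a different number of bytes depending on the encoding. Second, an integer written in binary depends on byte order, while the same number written as text does not. The character 日 and the value 1000 are arbitrary examples.

```python
import struct

# ASCII characters keep the same single-byte value in UTF-8.
assert "A".encode("ascii") == "A".encode("utf-8") == b"A"

# A non-ASCII character needs more than one byte, and the bytes depend on the encoding.
ch = "日"
sjis = ch.encode("shift_jis")  # double-byte in Shift JIS
utf8 = ch.encode("utf-8")      # three bytes in UTF-8
print(len(sjis), len(utf8))    # 2 3

# Storing the integer 1000 in binary: reader and writer must agree on byte order.
little = struct.pack("<i", 1000)
big = struct.pack(">i", 1000)
print(little.hex(), big.hex())  # e8030000 000003e8

# Storing it as text sidesteps both byte order and integer width.
as_text = str(1000).encode("ascii")
print(as_text, int(as_text.decode("ascii")))  # b'1000' 1000
```

The cost of the text form is size and parsing time. A large integer written as decimal digits may take more bytes than its fixed-width binary form, and it must be parsed back into a number.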
Note that a web page with formatted text, as displayed in a browser, is not plain text, but its HTML source is. Whether a file counts as plain text can therefore depend on the level at which one considers it.
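In the HTML source, the formatting is expressed as ordinary characters (the tags), which is why the source is plain text. The Python sketch below uses the standard library's `html.parser` to drop the tags and keep only the character data. This is a crude stand-in for what a browser displays and is not a real renderer. The class name `TextExtractor` is hypothetical.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Hypothetical helper: collects character data and discards tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves never reach here.
        self.parts.append(data)

source = "<p>Hello, <b>world</b>!</p>"  # the source is itself plain text
parser = TextExtractor()
parser.feed(source)
parser.close()
visible = "".join(parser.parts)
print(visible)  # Hello, world!
```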
Text files can have the MIME type "text/plain", often with a charset parameter indicating the encoding, as in "text/plain; charset=utf-8". Common encodings for plain text include Unicode UTF-8, Unicode UTF-16, ISO 8859 and ASCII.
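The charset parameter tells a receiver how to turn the bytes back into characters. The Python sketch below reads it from a header with the standard library's `email.message.Message`, then shows how many bytes the same word takes in several of the encodings listed above. The header value and the word "café" are illustrative assumptions.

```python
from email.message import Message

# A Content-Type header as it might accompany a plain-text file (illustrative value).
msg = Message()
msg["Content-Type"] = "text/plain; charset=utf-8"

charset = msg.get_content_charset()
print(msg.get_content_type(), charset)  # text/plain utf-8

# Decoding with the declared charset recovers the original text.
payload = "café".encode(charset)
print(payload.decode(charset))  # café

# The same text occupies different numbers of bytes in different encodings.
sizes = {}
for enc in ("ascii", "latin-1", "utf-8", "utf-16"):
    try:
        sizes[enc] = len("café".encode(enc))
    except UnicodeEncodeError:
        sizes[enc] = None  # ASCII has no 'é'
print(sizes)  # {'ascii': None, 'latin-1': 4, 'utf-8': 5, 'utf-16': 10}
```

Here "latin-1" is Python's name for ISO 8859-1. Python's "utf-16" codec prepends a 2-byte byte order mark, which is why four characters take 10 bytes rather than 8.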
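Transferring text files between Unix, Macintosh and Microsoft Windows or DOS computers can be problematic, because each platform has traditionally used a different convention to mark a line break. Unix uses a line feed (LF), Windows and DOS use a carriage return followed by a line feed (CR LF), and classic Mac OS used a carriage return (CR) alone. The Unix-based macOS uses LF. See newline for a discussion of this confusion.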
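A common fix is to normalize line endings when reading a file. The Python sketch below converts CR LF and lone CR to LF. It works on raw bytes and assumes an ASCII-compatible encoding such as ASCII, ISO 8859 or UTF-8. It would corrupt UTF-16 text, where each line break occupies two bytes. The function name `normalize_newlines` is hypothetical.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Hypothetical helper: convert CR LF (Windows/DOS) and CR (classic Mac OS) to LF (Unix).

    Assumes an ASCII-compatible encoding, where CR is the single byte 0x0D.
    """
    # Replace CR LF first, so its CR is not turned into a second LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

windows = b"line one\r\nline two\r\n"
classic_mac = b"line one\rline two\r"
unix = b"line one\nline two\n"

print(normalize_newlines(windows) == normalize_newlines(classic_mac) == unix)  # True
```

In practice, Python's `open()` in text mode with the default `newline=None` already performs this translation on read. An explicit helper like this one is mainly useful when handling raw bytes.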
Binary files, in turn, usually contain bytes that do not correspond to printable characters, and they store data in general rather than plain text. Compiled computer programs are typical examples. As a result, compiled applications are often simply called binaries, as opposed to their source code, which is stored in text files. Binary files can also be image files, sound files or compressed files; in principle, any file other than a text file is binary. The specification of a binary file's file format usually indicates how to interpret its contents.
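Because every file is just bytes, software usually has to guess which kind it is dealing with. The Python sketch below uses one common heuristic. It treats data as binary if the first few thousand bytes contain a NUL byte or are not valid UTF-8. Tools such as Git use a similar NUL-byte check. The sketch also shows how a format specification helps: PNG files begin with a fixed 8-byte signature. The 8000-byte sample size and the function names `looks_binary` and `is_png` are assumptions for illustration. The heuristic misclassifies some genuine text, such as UTF-16 (which contains NUL bytes) or Shift JIS (which is usually not valid UTF-8).

```python
import codecs

# The fixed signature that the PNG specification requires at the start of every file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Hypothetical heuristic: binary if the sample has a NUL byte or is not valid UTF-8."""
    sample = data[:sample_size]
    if b"\x00" in sample:
        return True
    # An incremental decoder with final=False tolerates a multi-byte character
    # cut off at the end of the sample, instead of reporting an error.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False

def is_png(data: bytes) -> bool:
    """Hypothetical helper: check for the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)

png_start = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"  # first bytes of a PNG file
print(looks_binary(b"plain text\n"), looks_binary(png_start))  # False True
print(is_png(png_start), is_png(b"plain text\n"))              # True False
```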
Binary files can be encoded as plain text, using encoding schemes such as Base64, so that they survive transfer through systems designed to handle only text, such as email.
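The Python sketch below round-trips a few arbitrary non-text bytes through Base64 using the standard `base64` module. The encoded form uses only printable ASCII characters. The trade-off is size: Base64 turns every 3 input bytes into 4 output characters.

```python
import base64

binary = bytes([0x00, 0xFF, 0x10, 0x80])  # arbitrary bytes that are not valid text
encoded = base64.b64encode(binary)        # printable ASCII only
print(encoded.decode("ascii"))            # AP8QgA==

decoded = base64.b64decode(encoded)
print(decoded == binary)                  # True

# Size overhead: 300 bytes of input become 400 bytes of Base64 text.
print(len(base64.b64encode(bytes(300))))  # 400
```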