CLOC -- Count Lines of Code

Overview
Latest version: 1.72, released 2017-01-14, on GitHub
SourceForge Project Page
License
Why Use cloc?
Other Counters
Basic Use
Building a Windows Executable
Options
Recognized Languages
How it Works
Advanced Use
Limitations
How to Request Support for Additional Languages
Author
Acknowledgments
Copyright
License

Overview

[Translations: Belarussian, Bulgarian, Russian, Serbo-Croatian, Slovakian, Ukrainian]

cloc counts blank lines, comment lines, and physical lines of source code in many programming languages. Given two versions of a code base, cloc can compute differences in blank, comment, and source lines. It is written entirely in Perl with no dependencies outside the standard distribution of Perl v5.6 and higher (code from some external modules is embedded within cloc) and so is quite portable. cloc is known to run on many flavors of Linux, FreeBSD, NetBSD, OpenBSD, Mac OS X, AIX, HP-UX, Solaris, IRIX, z/OS, and Windows. (To run the Perl source version of cloc on Windows one needs ActiveState Perl 5.6.1 or higher, Strawberry Perl, Cygwin, or MobaXTerm with the Perl plug-in installed. Alternatively one can use the Windows binary of cloc generated with PAR::Packer to run on Windows computers that have neither Perl nor Cygwin.)

cloc contains code from David Wheeler's SLOCCount, Damian Conway and Abigail's Perl module Regexp::Common, Sean M. Burke's Perl module Win32::Autoglob, and Tye McQueen's Perl module Algorithm::Diff. Language scale factors were derived from the Mayes Consulting, LLC web site http://softwareestimator.com/IndustryData2.htm.

Install via package manager

Depending on your operating system, one of these installation methods may work for you:

npm install -g cloc              # https://www.npmjs.com/package/cloc
sudo apt-get install cloc        # Debian, Ubuntu
sudo yum install cloc            # Red Hat, Fedora
sudo pacman -S cloc              # Arch
sudo pkg install cloc            # FreeBSD
sudo port install cloc           # Mac OS X with MacPorts

Download stable release

The source code, release notes, Windows executable, and Unix package for the current stable release can be found at http://sourceforge.net/projects/cloc/files/cloc/v1.64/.

Download development version

Source code for the latest Subversion commit can be found at http://sourceforge.net/p/cloc/code/HEAD/tree/trunk/cloc.

License

cloc is licensed under the GNU General Public License, v2, excluding portions which are copied from other sources. Code copied from the Regexp::Common, Win32::Autoglob, and Algorithm::Diff Perl modules is subject to the Artistic License.

Why Use cloc?

cloc has many features that make it easy to use, thorough, extensible, and portable:

Exists as a single, self-contained file that requires minimal installation effort---just download the file and run it.
Can read language comment definitions from a file and thus potentially work with computer languages that do not yet exist.
Allows results from multiple runs to be summed together by language and by project.
Can produce results in a variety of formats: plain text, SQL, XML, YAML, comma separated values.
Can count code within compressed archives (tar balls, Zip files, Java .ear files).
Has numerous troubleshooting options.
Handles file and directory names with spaces and other unusual characters.
Has no dependencies outside the standard Perl distribution.
Runs on Linux, FreeBSD, NetBSD, OpenBSD, Mac OS X, AIX, HP-UX, Solaris, IRIX, and z/OS systems that have Perl 5.6 or higher. The source version runs on Windows with either ActiveState Perl, Strawberry Perl, Cygwin, or MobaXTerm+Perl plugin. Alternatively on Windows one can run the Windows binary which has no dependencies.

Other Counters

If cloc does not suit your needs, here are other freely available counters to consider:

Other references:

QSM's directory of code counting tools.
The Wikipedia entry for source code line counts.

Regexp::Common, Digest::MD5, Win32::Autoglob, Algorithm::Diff

Although cloc does not need Perl modules outside those found in the standard distribution, cloc does rely on a few external modules. Code from three of these external modules (Regexp::Common, Win32::Autoglob, and Algorithm::Diff) is embedded within cloc. A fourth module, Digest::MD5, is used only if it is available. If cloc finds Regexp::Common or Algorithm::Diff installed locally it will use those installations. If it doesn't, cloc will install the parts of Regexp::Common and/or Algorithm::Diff it needs to temporary directories that are created at the start of a cloc run and removed when the run is complete. The necessary code from Regexp::Common v2.120 and Algorithm::Diff v1.1902 is embedded within the cloc source code (see subroutines Install_Regexp_Common() and Install_Algorithm_Diff()). Only three lines are needed from Win32::Autoglob and these are included directly in cloc.

Additionally, cloc will use Digest::MD5 to validate uniqueness among input files if Digest::MD5 is installed locally. If Digest::MD5 is not found the file uniqueness check is skipped.
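
The uniqueness check described above can be illustrated with a short sketch. This is not cloc's own code (cloc is written in Perl and uses Digest::MD5); it is a minimal Python illustration of the general idea, assuming that two files count as duplicates when the MD5 digests of their contents are equal. The function name unique_by_md5 and the sample file names are hypothetical.

```python
import hashlib


def unique_by_md5(files):
    """Keep the first file seen for each distinct MD5 digest of its contents.

    `files` maps a file name to its contents as bytes. The contents are held
    in memory here only so that the example is self-contained.
    """
    seen = {}  # hex digest -> first file name seen with that content
    for name, data in files.items():
        digest = hashlib.md5(data).hexdigest()
        seen.setdefault(digest, name)
    return sorted(seen.values())


# Hypothetical input: two of the three files have identical contents.
files = {
    "a/main.c": b"int main(void) { return 0; }\n",
    "b/main_copy.c": b"int main(void) { return 0; }\n",
    "c/util.c": b"int util(void) { return 1; }\n",
}
print(unique_by_md5(files))
```

With a check of this kind, duplicated files contribute to the totals only once; skipping it means every copy is counted.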

The Windows binary is built on a computer that has both Regexp::Common and Digest::MD5 installed locally.

Building a Windows Executable

The default Windows download, cloc-1.64.exe, was built with PAR::Packer on a Windows 7 computer with Strawberry Perl. Windows executables of cloc versions 1.60 and earlier were built with perl2exe on a 32-bit Windows XP computer. A small modification was made to the cloc source code before passing it to perl2exe; lines 87 and 88 were uncommented:

85  # Uncomment next two lines when building Windows executable with perl2exe
86  # or if running on a system that already has Regexp::Common.
87  #use Regexp::Common;
88  #$HAVE_Rexexp_Common = 1;

Why is the Windows executable so large?

Windows executables of cloc versions 1.60 and earlier, created with perl2exe as noted above, are about 1.6 MB, while newer versions, created with PAR::Packer, are 11 MB. Why are the newer executables so much larger? My theory is that perl2exe uses smarter tree pruning logic than PAR::Packer, but that's pure speculation.

Create your own executable

If you have access to perl2exe, you can use it to create a tight Windows executable. See lines 84-87 in the cloc source code for a minor code modification that is necessary when using perl2exe.

Otherwise, to build a Windows executable with pp from PAR::Packer, first install a Windows-based Perl distribution (for example Strawberry Perl or ActivePerl) following their instructions. Next, open a command prompt (also known as a DOS window) and install the PAR::Packer module. Finally, invoke the newly installed pp command with the cloc source code to create an .exe file:

C:> perl -MCPAN -e shell
cpan> install PAR::Packer
cpan> exit
C:> pp cloc-1.64.pl

As a variation on the above, if you installed the portable version of Strawberry Perl, you will need to run

portableshell.bat

first to properly set up your environment. The Strawberry Perl derived executable on the SourceForge download area was created with the portable version on a Windows 7 computer.

Basic Use

cloc is a command line program that takes file, directory, and/or archive names as inputs. Here's an example of running cloc against the Perl v5.10.0 source distribution:

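prompt> cloc perl-5.10.0.tar.gz
    4076 text files.
    3883 unique files.
    1521 files ignored.

http://cloc.sourceforge.net v 1.50  T=12.0 s (209.2 files/s, 70472.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Perl                          2052         110356         130018         292281
C                              135          18718          22862         140483
C/C++ Header                   147           7650          12093          44042
Bourne Shell                   116           3402           5789          36882
Lisp                             1            684           2242           7515
make                             7            498            473           2044
C++                             10            312            277           2000
XML                             26            231              0           1972
yacc                             2            128             97           1549
YAML                             2              2              0            489
DOS Batch                       11             85             50            322
HTML                             1             19              2             98
-------------------------------------------------------------------------------
SUM:                          2510         142085         173903         529677
-------------------------------------------------------------------------------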
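
If the plain-text report needs to be post-processed, its table can be read back with a few lines of code. The following Python sketch is an illustration only, not part of cloc; the function name parse_cloc_report is hypothetical, and it assumes each data row ends with the four numeric columns shown above (files, blank, comment, code). The machine-readable output formats listed under Options are likely a better fit for serious use.

```python
REPORT = """\
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Perl                          2052         110356         130018         292281
C                              135          18718          22862         140483
C/C++ Header                   147           7650          12093          44042
Bourne Shell                   116           3402           5789          36882
Lisp                             1            684           2242           7515
make                             7            498            473           2044
C++                             10            312            277           2000
XML                             26            231              0           1972
yacc                             2            128             97           1549
YAML                             2              2              0            489
DOS Batch                       11             85             50            322
HTML                             1             19              2             98
-------------------------------------------------------------------------------
SUM:                          2510         142085         173903         529677
"""


def parse_cloc_report(text):
    """Return {language: (files, blank, comment, code)} from a text report."""
    rows = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:1] == ["Language"]:
            in_table = True          # data rows start after the header line
            continue
        if not in_table or len(fields) < 5:
            continue                 # separator lines and preamble
        try:
            counts = tuple(int(f) for f in fields[-4:])
        except ValueError:
            continue
        # Language names may contain spaces, e.g. "C/C++ Header".
        rows[" ".join(fields[:-4])] = counts
    return rows


rows = parse_cloc_report(REPORT)
# Sum the code column over every row except the SUM: line itself.
total_code = sum(c[3] for lang, c in rows.items() if lang != "SUM:")
print(total_code == rows["SUM:"][3])
```

The final line checks that the per-language code counts add up to the value on the SUM: line of the report.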

To run cloc on Windows computers, one must first open up a command (aka DOS) window and invoke cloc.exe from the command line there.

Options

prompt> cloc

Usage: cloc [options] <file(s)/dir(s)> | <set 1> <set 2> | <report files>

Count, or compute differences of, physical lines of source code in the given files (may be archives such as compressed tarballs or zip files) and/or recursively below the given directories.

Input Options
   --extract-with=<cmd>
        This option is only needed if cloc is unable to figure out how to
        extract the contents of the input file(s) by itself. Use <cmd> to
        extract binary archive files (e.g.: .tar.gz, .zip, .Z). Use the
        literal '>FILE<' as a stand-in for the actual file(s) to be
        extracted. For example, to count lines of code in the input files
           gcc-4.2.tar.gz  perl-5.8.8.tar.gz
        on Unix use
           --extract-with='gzip -dc >FILE< | tar xf -'
        or, if you have GNU tar,
           --extract-with='tar zxf >FILE<'
        and on Windows use, for example:
           --extract-with="\"c:\Program Files\WinZip\WinZip32.exe\" -e -o >FILE< ."
        (if WinZip is installed there).
   --list-file=<file>
        Take the list of file and/or directory names to process from
        <file>, which has one file/directory name per line. Only exact
        matches are counted; relative path names will be resolved starting
        from the directory where cloc is invoked.
        See also --exclude-list-file.
   --unicode
        Check binary files to see if they contain Unicode expanded ASCII
        text. This causes performance to drop noticeably.
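
To make the role of the '>FILE<' stand-in concrete, here is a minimal Python sketch of how such a template could be turned into one command per archive. It is an illustration, not cloc's implementation: the function name build_extract_command is hypothetical, and quoting the archive name with shlex.quote is an assumption made here so that names containing spaces remain a single shell word.

```python
import shlex


def build_extract_command(template, archive):
    """Replace the literal >FILE< stand-in with a shell-quoted archive name."""
    return template.replace(">FILE<", shlex.quote(archive))


# The Unix template from the option description above.
template = "gzip -dc >FILE< | tar xf -"
for archive in ["gcc-4.2.tar.gz", "perl-5.8.8.tar.gz"]:
    print(build_extract_command(template, archive))
```

Each input archive yields its own command line, with the stand-in replaced by that archive's name.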

Processing Options
   --autoconf
        Count .in files (as processed by GNU autoconf) of recognized
        languages.
   --by-file
        Report results for every source file encountered.
   --by-file-by-lang
        Report results for every source file encountered in addition to
        reporting by language.
   --count-and-diff <set1> <set2>
        First perform direct code counts of source file(s) of <set1> and
        <set2> separately, then perform a diff of these. Inputs may be
        pairs of files, directories, or archives. See also --diff,
        --diff-alignment, --diff-timeout, --ignore-case,
        --ignore-whitespace.
   --diff <set1> <set2>
        Compute differences in code and comments between source file(s) of
        <set1> and <set2>. The inputs may be pairs of files, directories,
        or archives. Use --diff-alignment to generate a list showing which
        file pairs were compared. See also --count-and-diff,
        --diff-alignment, --diff-timeout, --ignore-case,
        --ignore-whitespace.
   --diff-timeout <N>
        Ignore files which take more than <N> seconds to process. Default
        is 10 seconds. (Large files with many repeated lines can cause
        Algorithm::Diff::sdiff() to take hours.)
   --follow-links
        [Unix only] Follow symbolic links to directories (sym links to
        files are always followed).
   --force-lang=<lang>[,<ext>]
        Process all files that have a <ext> extension with the counter for
        language <lang>. For example, to count all .f files with the
        Fortran 90 counter (which expects files to end with .f90) instead
        of the default Fortran 77 counter, use
           --force-lang="Fortran 90",f
        If <ext> is omitted, every file will be counted with the <lang>
        counter. This option can be specified multiple times (but that is
        only useful when <ext> is given each time). See also --script-lang,
        --lang-no-ext.
   --force-lang-def=<file>
        Load language processing filters from <file>, then use these
        filters instead of the built-in filters. Note: languages which map
        to the same file extension (for example: MATLAB/Objective
        C/MUMPS/Mercury; Pascal/PHP; Lisp/OpenCL; Lisp/Julia; Perl/Prolog)
        will be ignored as these require additional processing that is not
        expressed in language definition files. Use --read-lang-def to
        define new language filters without replacing built-in filters (see
        also --write-lang-def).
   --ignore-whitespace
        Ignore horizontal white space when comparing files with --diff.
        See also --ignore-case.
   --ignore-case
        Ignore changes in case; consider upper- and lower-case letters
        equivalent when comparing files with --diff. See also
        --ignore-whitespace.
   --lang-no-ext=<lang>
        Count files without extensions using the <lang> counter. This
        option overrides internal logic for files without extensions (where
        such files are checked against known scripting languages by
        examining the first line for #!). See also --force-lang,
        --script-lang.
   --max-file-size=<MB>
        Skip files larger than <MB> megabytes when traversing directories.
        By default, <MB>=100. cloc's memory requirement is roughly twenty
        times larger than the largest file, so running with files larger
        than 100 MB on a computer with less than 2 GB of memory will cause
        problems.
        Note: this check does not apply to files explicitly passed as
        command line arguments.
   --read-binary-files
        Process binary files in addition to text files. This is usually a
        bad idea and should only be attempted with text files that have
        embedded binary data.
   --read-lang-def=<file>
        Load new language processing filters from <file> and merge them
        with those already known to cloc.
        If <file> defines a language cloc already knows about, cloc's
        definition will take precedence.
        Use --force-lang-def to override cloc's definitions (see also
        --write-lang-def).
   --script-lang=<lang>,<s>
        Process all files that invoke <s> as a #! scripting language with
        the counter for language <lang>. For example, files that begin with
           #!/usr/local/bin/perl5.8.8
        will be counted with the Perl counter by using
           --script-lang=Perl,perl5.8.8
        The language name is case insensitive but the name of the script
        language executable, <s>, must have the right case. This option can
        be specified multiple times. See also --force-lang, --lang-no-ext.
   --sdir=<dir>
        Use <dir> as the scratch directory instead of letting File::Temp
        choose the location. Files written to this location are not removed
        at the end of the run (as they are with File::Temp).
   --skip-uniqueness
        Skip the file uniqueness check. This will give a performance boost
        at the expense of counting files with identical contents multiple
        times (if such duplicates exist).
   --stdin-name=<file>
        Give a file name to use to determine the language for standard
        input.
   --strip-comments=<ext>
        For each file processed, write to the current directory a version
        of the file which has blank lines and comments removed. The name of
        each stripped file is the original file name with .<ext> appended
        to it. It is written to the current directory unless --original-dir
        is on.
   --original-dir
        [Only effective in combination with --strip-comments] Write the
        stripped files to the same directory as the original files.
   --sum-reports
        Input arguments are report files previously created with the
        --report-file option. Makes a cumulative set of results containing
        the sum of data from the individual report files.
   --unix
        Override the operating system autodetection logic and run in UNIX
        mode. See also --windows, --show-os.
   --windows
        Override the operating system autodetection logic and run in
        Microsoft Windows mode. See also --unix, --show-os.
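
The #! lookup that --script-lang and --lang-no-ext refer to can be sketched as follows. This Python fragment is a simplified illustration and not cloc's actual logic: the names script_interpreter, language_for, and SCRIPT_LANG are hypothetical, and the special handling of "/usr/bin/env" is an assumption added here, not something the option text describes. The lookup is case sensitive on the executable name, matching the note above.

```python
import posixpath

# Hypothetical table, as it might look after --script-lang=Perl,perl5.8.8
SCRIPT_LANG = {"perl5.8.8": "Perl"}


def script_interpreter(first_line):
    """Return the interpreter name from a #! line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    # Assumption: for "#!/usr/bin/env perl" the interpreter follows "env".
    if posixpath.basename(parts[0]) == "env" and len(parts) > 1:
        return parts[1]
    return posixpath.basename(parts[0])


def language_for(first_line):
    """Map a file's first line to a language name (case-sensitive lookup)."""
    return SCRIPT_LANG.get(script_interpreter(first_line))


print(language_for("#!/usr/local/bin/perl5.8.8"))
print(language_for("#!/usr/local/bin/Perl5.8.8"))
```

The first call finds the Perl counter; the second does not, because the executable name differs in case.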

Filter Options
   --exclude-dir=<D1>[,D2,]
        Exclude the given comma separated directories D1, D2, D3, et
        cetera, from being scanned. For example
           --exclude-dir=.cache,test
        will skip all files that have /.cache/ or /test/ as part of their
        path. Directories named .bzr, .cvs, .hg, .git, and .svn are always
        excluded.
   --exclude-ext=<ext1>[,<ext2>[...]]
        Do not count files having the given file name extensions.
   --exclude-lang=<L1>[,L2,]
        Exclude the given comma separated languages L1, L2, L3, et cetera,
        from being counted.
   --exclude-list-file=<file>
        Ignore files and/or directories whose names appear in <file>.
        <file> should have one file name per line. Only exact matches are
        ignored; relative path names will be resolved starting from the
        directory where cloc is invoked.
        See also --list-file.
   --include-lang=<L1>[,L2,]
        Count only the given comma separated languages L1, L2, L3, et
        cetera.
   --match-d=<regex>
        Only count files in directories matching the Perl regex. For
        example
           --match-d='/(src|include)/'
        only counts files in directories containing /src/ or /include/.
   --not-match-d=<regex>
        Count all files except those in directories matching the Perl
        regex.
   --match-f=<regex>
        Only count files whose basenames match the Perl regex. For example
           --match-f='^[Ww]idget'
        only counts files that start with Widget or widget.
   --not-match-f=<regex>
        Count all files except those whose basenames match the Perl regex.
   --skip-archive=<regex>
        Ignore files that end with the given Perl regular expression. For
        example, if given
           --skip-archive='(zip|tar(\.(gz|Z|bz2|xz|7z))?)'
        the code will skip files that end with .zip, .tar, .tar.gz, .tar.Z,
        .tar.bz2, .tar.xz, and .tar.7z.
   --skip-win-hidden
        On Windows, ignore hidden files.
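
The regex-based filters above can be tried out with a small sketch. It uses Python's re module as a stand-in for Perl regular expressions (the example patterns shown above behave the same in both), and it is not cloc's implementation: the function name keep is hypothetical, matching --match-d against the directory part followed by a slash is a simplification, and anchoring the --skip-archive pattern at the end of the name is an assumption based on the phrase "end with".

```python
import posixpath
import re


def keep(path, match_d=None, match_f=None, skip_archive=None):
    """Return True if `path` passes the given regex filters."""
    directory, base = posixpath.split(path)
    # --match-d: the directory part must match the pattern.
    if match_d and not re.search(match_d, directory + "/"):
        return False
    # --match-f: the basename must match the pattern.
    if match_f and not re.search(match_f, base):
        return False
    # --skip-archive: drop names that end with the pattern.
    if skip_archive and re.search("(?:" + skip_archive + ")$", base):
        return False
    return True


ARCHIVES = r"(zip|tar(\.(gz|Z|bz2|xz|7z))?)"
print(keep("proj/src/Widget.c", match_d=r"/(src|include)/"))
print(keep("proj/doc/Widget.txt", match_d=r"/(src|include)/"))
print(keep("proj/src/main.c", match_f=r"^[Ww]idget"))
print(keep("dist/pkg.tar.gz", skip_archive=ARCHIVES))
```

Only the first call returns True; the other three paths are rejected by --match-d, --match-f, and --skip-archive respectively.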

Debug Options
   --categorized=<file>
        Save names of categorized files to <file>.
   --counted=<file>
        Save names of processed source files to <file>.
   --explain=<lang>
        Print the filters used to remove comments for language <lang> and
        exit. In some cases the filters refer to Perl subroutines rather
        than regular expressions. An examination of the source code may be
        needed for further explanation.
   --diff-alignment=<file>
        Write to <file> a list of files and file pairs showing which files
        were added, removed, and/or compared during a run with --diff. This
        switch forces the --diff mode on.
   --help
        Print this usage information and exit.
   --found=<file>
        Save names of every file found to <file>.
   --ignored=<file>
        Save names of ignored files and the reason they were ignored to
        <file>.
   --print-filter-stages
        Print processed source code before and after each filter is
        applied.
   --show-ext[=<ext>]
        Print information about all known (or just the given) file
        extensions and exit.
   --show-lang[=<lang>]
        Print information about all known (or just the given) languages and
        exit.
   --show-os
        Print the value of the operating system mode and exit. See also
        --unix, --windows.
   -v[=<n>]
        Verbose switch (optional numeric value).
   --version
        Print the version of this program and exit.
   --write-lang-def=<file>
        Writes to <file> the language processing filters then exits. Useful
        as a first step to creating custom language definitions (see also
        --force-lang-def, --read-lang-def).

Output Options
   --3
        Print third-generation language output. (This option can cause
        report summation to fail if some reports were produced with this
        option while others were produced without it.)
   --by-percent X
        Instead of comment and blank line counts, show these values as
        percentages based on the value of X in the denominator:
           X = 'c'   -> # lines of code
           X = 'cm'  -> # lines of code + comments
           X = 'cb'  -> # lines of code + blanks
           X = 'cmb' -> # lines of code + comments + blanks
        For example, if using method 'c' and your code has twice as many
        lines of comments as lines of code, the value in the comment column
        will be 200%. The code column remains a line count.
   --csv
        Write the results as comma separated values.
   --csv-delimiter=<C>
        Use the character <C> as the delimiter for comma separated files
        instead of ,. This switch forces --csv to be on.
   --out=<file>
        Synonym for --report-file=<file>.
   --progress-rate=<n>
        Show progress update after every <n> files are processed (default
        <n>=100). Set <n> to 0 to suppress progress output (useful when
        redirecting output to STDOUT).
   --quiet
        Suppress all information messages except for the final report.
   --report-file=<file>
        Write the results to <file> instead of STDOUT.
   --sql=<file>
        Write results as SQL create and insert statements which can be read
        by a database program such as SQLite. If <file> is -, output is
        sent to STDOUT.
   --sql-append
        Append SQL insert statements to the file specified by --sql and do
        not generate table creation statements. Only valid with the --sql
        option.
   --sql-project=<name>
        Use <name> as the project identifier for the current run. Only
        valid with the --sql option.
   --sql-style=