Import Text Data Files with Low-Level I/O - MATLAB & Simulink

Overview

Low-level file I/O functions allow the most control over reading or writing data to a file. However, these functions require that you specify more detailed information about your file than the easier-to-use high-level functions, such as importdata. For more information on the high-level functions that read text files, see Import Text Files.

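If the high-level functions cannot import your data, use fscanf to read data that follows a formatted pattern, or fgetl and fgets to read a file one line at a time. Both approaches are described in the sections that follow.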

Note

The low-level file I/O functions are based on functions in the ANSI® Standard C Library. However, MATLAB® includes vectorized versions of the functions, which read and write data in an array with minimal control loops.
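To make the vectorized behavior concrete, here is a minimal sketch (the temporary file from tempname and the matrix A are illustrative assumptions): a single fprintf call writes every row of a matrix, and a single fscanf call reads all the values back, with no loop over rows.

```matlab
% Illustrative data and a throwaway file name (assumptions for this sketch)
A = [1 2 3; 4 5 6];
fname = [tempname '.txt'];

% One call writes every row: fprintf cycles the format through A' in
% column order, which walks the rows of A
fid = fopen(fname, 'w');
fprintf(fid, '%d %d %d\n', A');
fclose(fid);

% One call reads every value: fscanf fills a 3-row array column by column,
% so transpose to recover the original orientation
fid = fopen(fname, 'r');
B = fscanf(fid, '%d', [3, Inf])';
fclose(fid);
delete(fname);
```

Because both functions walk their data in column order, the transposes on the way out and on the way back in are what keep the rows intact.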

Reading Data in a Formatted Pattern

To import text files that importdata and textscan cannot read, consider using fscanf. The fscanf function requires that you describe the format of your file, but includes many options for this format description.

For example, create a text file mymeas.dat as shown. The data in mymeas.dat includes repeated sets of times, dates, and measurements. The header text includes the number of sets of measurements, N:

Measurement Data
N=3

12:00:00
01-Jan-1977
4.21  6.55  6.78  6.55
9.15  0.35  7.57   NaN
7.92  8.49  7.43  7.06
9.59  9.33  3.92  0.31
09:10:02
23-Aug-1990
2.76  6.94  4.38  1.86
0.46  3.17   NaN  4.89
0.97  9.50  7.65  4.45
8.23  0.34  7.95  6.46
15:03:40
15-Apr-2003
7.09  6.55  9.59  7.51
7.54  1.62  3.40  2.55
 NaN  1.19  5.85  5.05
6.79  4.98  2.23  6.99
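If you want to follow along, this sketch writes the file above with fprintf. It assumes you are happy to create (or overwrite) mymeas.dat in the current folder; the variable names timeStrs, dateStrs, and blocks are illustrative.

```matlab
% Time stamps, dates, and 4-by-4 measurement blocks from the listing above
timeStrs = {'12:00:00', '09:10:02', '15:03:40'};
dateStrs = {'01-Jan-1977', '23-Aug-1990', '15-Apr-2003'};
blocks = cat(3, ...
    [4.21 6.55 6.78 6.55; 9.15 0.35 7.57 NaN; 7.92 8.49 7.43 7.06; 9.59 9.33 3.92 0.31], ...
    [2.76 6.94 4.38 1.86; 0.46 3.17 NaN 4.89; 0.97 9.50 7.65 4.45; 8.23 0.34 7.95 6.46], ...
    [7.09 6.55 9.59 7.51; 7.54 1.62 3.40 2.55; NaN 1.19 5.85 5.05; 6.79 4.98 2.23 6.99]);

fid = fopen('mymeas.dat', 'w');
% Header: two words, the count N, then a blank line
fprintf(fid, 'Measurement Data\nN=%d\n\n', numel(timeStrs));
for k = 1:numel(timeStrs)
    fprintf(fid, '%s\n%s\n', timeStrs{k}, dateStrs{k});
    % fprintf consumes its data in column order, so pass the transpose
    % to write each row of the block on its own line
    fprintf(fid, '%5.2f %5.2f %5.2f %5.2f\n', blocks(:, :, k)');
end
fclose(fid);
```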

Opening the File

As with any of the low-level I/O functions, before reading, open the file with fopen, and obtain a file identifier. By default, fopen opens files for read access, with a permission of 'r'.

When you finish processing the file, close it with fclose(fid).
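A minimal sketch of the open/close pattern, assuming a throwaway file created with tempname. fopen reports failure by returning -1 (with a message in its second output) instead of throwing an error, so it is worth checking fid before reading; errmsg, firstLine, and the other variable names are illustrative.

```matlab
% Create a small file to open (temporary name is an assumption of this sketch)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'hello\n');
fclose(fid);

% Open for reading; permission defaults to 'r'
[fid, errmsg] = fopen(fname);
if fid == -1
    error('Could not open %s: %s', fname, errmsg);
end
firstLine = fgetl(fid);
status = fclose(fid);      % 0 when the file closes successfully
delete(fname);

% A file that does not exist: fopen returns -1 and a nonempty message
[badFid, badMsg] = fopen([tempname '.dat']);
```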

Describing the Data

Describe the data in the file with format specifiers, such as '%s' for text, '%d' for an integer, or '%f' for a floating-point number. (For a complete list of specifiers, see the fscanf reference page.)

To skip literal characters in the file, include them in the format description. To skip a data field, use an asterisk ('*') in the specifier.

For example, consider the header lines of mymeas.dat:

Measurement Data      % skip the first 2 words, go to next line:  %*s %*s\n
N=3                   % ignore 'N=', read integer:  N=%d\n
                      % go to next line:  \n
12:00:00
01-Jan-1977
4.21  6.55  6.78  6.55
...

To read the headers and return the single value for N:

N = fscanf(fid, '%*s %*s\nN=%d\n\n', 1);
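The same format description can be tried without a file, because sscanf accepts the same format specifiers as fscanf. A minimal sketch, assuming the header text is held in a character vector (hdr and d are illustrative names):

```matlab
% Header text as it appears at the top of mymeas.dat
hdr = sprintf('Measurement Data\nN=3\n\n12:00:00\n');

% Skip two words, match the literal 'N=', and read one integer
N = sscanf(hdr, '%*s %*s\nN=%d\n\n', 1);

% An asterisk skips a field: drop the time, keep the date
d = sscanf('12:00:00 01-Jan-1977', '%*s %s');
```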

Specifying the Number of Values to Read

By default, fscanf reapplies your format description until it cannot match the description to the data, or it reaches the end of the file.

Optionally, specify the number of values to read, so that fscanf does not attempt to read the entire file. For example, in mymeas.dat, each set of measurements includes a fixed number of rows and columns:

measrows = 4;
meascols = 4;
meas = fscanf(fid, '%f', [measrows, meascols])';
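The effect of the size argument shows up clearly on text shaped like one measurement block followed by the next time stamp. This sketch uses sscanf on an in-memory character vector (txt, allVals, and meas are illustrative names) in place of the file:

```matlab
% One measurement block followed by the start of the next set
txt = sprintf(['4.21 6.55 6.78 6.55\n9.15 0.35 7.57 NaN\n' ...
               '7.92 8.49 7.43 7.06\n9.59 9.33 3.92 0.31\n09:10:02\n']);

% Without a size, '%f' is reapplied until the text stops matching,
% so it also picks up 9 from the leading "09" of the time stamp
allVals = sscanf(txt, '%f');

% With size [4, 4], exactly 16 values are read; they fill the array
% column by column, so transpose to match the row layout of the text
meas = sscanf(txt, '%f', [4, 4])';
```

Leaving the size off lets the read run past the block into the next time stamp, which is why the example fixes it at [measrows, meascols].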

Creating Variables in the Workspace

There are several ways to store mymeas.dat in the MATLAB workspace. In this case, read the values into a structure array. Each element of the structure array has three fields: mtime, mdate, and meas.

Note

fscanf fills arrays with numeric values in column order. To make the output array match the orientation of numeric data in a file, transpose the array.

filename = 'mymeas.dat';
measrows = 4;
meascols = 4;

% open the file
fid = fopen(filename);

% read the file headers, find N (one value)
N = fscanf(fid, '%*s %*s\nN=%d\n\n', 1);

% read each set of measurements
for n = 1:N
    mystruct(n).mtime = fscanf(fid, '%s', 1);
    mystruct(n).mdate = fscanf(fid, '%s', 1);

    % fscanf fills the array in column order,
    % so transpose the results
    mystruct(n).meas  = ...
      fscanf(fid, '%f', [measrows, meascols])';
end

% close the file
fclose(fid);
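To try the whole loop without creating mymeas.dat by hand, this sketch writes the same content to a temporary file (an assumption of this example) and reads it back with the same fscanf calls. Unlike the listing above, it also preallocates mystruct once N is known, which is an optional change.

```matlab
% Write the three-set example to a temporary stand-in for mymeas.dat
filename = [tempname '.dat'];
fid = fopen(filename, 'w');
fprintf(fid, ['Measurement Data\nN=3\n\n' ...
    '12:00:00\n01-Jan-1977\n4.21 6.55 6.78 6.55\n9.15 0.35 7.57 NaN\n' ...
    '7.92 8.49 7.43 7.06\n9.59 9.33 3.92 0.31\n' ...
    '09:10:02\n23-Aug-1990\n2.76 6.94 4.38 1.86\n0.46 3.17 NaN 4.89\n' ...
    '0.97 9.50 7.65 4.45\n8.23 0.34 7.95 6.46\n' ...
    '15:03:40\n15-Apr-2003\n7.09 6.55 9.59 7.51\n7.54 1.62 3.40 2.55\n' ...
    'NaN 1.19 5.85 5.05\n6.79 4.98 2.23 6.99\n']);
fclose(fid);

measrows = 4;
meascols = 4;
fid = fopen(filename);
N = fscanf(fid, '%*s %*s\nN=%d\n\n', 1);

% Preallocate the structure array now that N is known
mystruct = repmat(struct('mtime', '', 'mdate', '', 'meas', []), 1, N);
for n = 1:N
    mystruct(n).mtime = fscanf(fid, '%s', 1);
    mystruct(n).mdate = fscanf(fid, '%s', 1);
    % column-order fill, so transpose
    mystruct(n).meas  = fscanf(fid, '%f', [measrows, meascols])';
end
fclose(fid);
delete(filename);
```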

Reading Data Line-by-Line

MATLAB provides two functions that read lines from files and store them as character vectors: fgetl and fgets. The fgets function copies the line along with the newline character to the output, but fgetl does not.
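A minimal sketch of the difference, assuming a temporary two-line file (the names withoutNewline and withNewline are illustrative):

```matlab
% Two-line file in a temporary location (assumption for this sketch)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'first line\nsecond line\n');
fclose(fid);

fid = fopen(fname);
withoutNewline = fgetl(fid);   % newline removed
withNewline    = fgets(fid);   % newline (char(10)) kept at the end
fclose(fid);
delete(fname);
```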

The following example uses fgetl to read an entire file one line at a time. The function litcount determines whether a given character sequence (literal) appears in each line. If it does, the function prints the entire line preceded by the number of times the literal appears on the line.

function y = litcount(filename, literal)
% Count the number of times a given literal appears in each line.

fid = fopen(filename);
y = 0;
tline = fgetl(fid);
while ischar(tline)
   matches = strfind(tline, literal);
   num = length(matches);
   if num > 0
      y = y + num;
      fprintf(1,'%d:%s\n',num,tline);
   end
   tline = fgetl(fid);
end
fclose(fid);

Create an input data file called badpoem:

Oranges and lemons,
Pineapples and tea.
Orangutans and monkeys,
Dragonflys or fleas.
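One way to create badpoem from MATLAB is with fprintf. This sketch assumes it is acceptable to write (or overwrite) a file named badpoem in the current folder:

```matlab
% Write the four lines of the poem, each terminated by a newline
fid = fopen('badpoem', 'w');
fprintf(fid, ['Oranges and lemons,\nPineapples and tea.\n' ...
              'Orangutans and monkeys,\nDragonflys or fleas.\n']);
fclose(fid);
```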

To find out how many times 'an' appears in this file, call litcount:

litcount('badpoem','an')

This returns:

2:Oranges and lemons,
1:Pineapples and tea.
3:Orangutans and monkeys,

ans =

     6

Testing for End of File (EOF)

When you read a portion of your data at a time, you can use feof to check whether you have reached the end of the file. feof returns 1 if a previous read operation set the end-of-file indicator for the file. Otherwise, it returns 0.

Note

Opening an empty file does not move the file position indicator to the end of the file. Read operations, and the fseek and frewind functions, move the file position indicator.
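A minimal sketch of this behavior on an empty temporary file (created with tempname purely for illustration): feof reports 0 right after fopen and changes only once a read operation runs into the end of the file.

```matlab
% Create an empty file (temporary name is an assumption of this sketch)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fclose(fid);

fid = fopen(fname);
atStart = feof(fid);       % opening does not set the end-of-file indicator
data = fread(fid, 1);      % attempt to read one byte from the empty file
afterRead = feof(fid);     % the failed read sets the indicator
fclose(fid);
delete(fname);
```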

Testing for EOF with feof

When you use textscan, fscanf, or fread to read portions of data at a time, use feof to check whether you have reached the end of the file.

For example, suppose that the hypothetical file mymeas.dat has the following form, with no information about the number of measurement sets. Read the data into a structure with fields for mtime, mdate, and meas:

12:00:00
01-Jan-1977
4.21  6.55  6.78  6.55
9.15  0.35  7.57   NaN
7.92  8.49  7.43  7.06
9.59  9.33  3.92  0.31
09:10:02
23-Aug-1990
2.76  6.94  4.38  1.86
0.46  3.17   NaN  4.89
0.97  9.50  7.65  4.45
8.23  0.34  7.95  6.46

To read the file:

filename = 'mymeas.dat';
measrows = 4;
meascols = 4;

% open the file
fid = fopen(filename);

% make sure the file is not empty
finfo = dir(filename);
fsize = finfo.bytes;

if fsize > 0

    % read the file
    block = 1;
    while ~feof(fid)
        mystruct(block).mtime = fscanf(fid, '%s', 1);
        mystruct(block).mdate = fscanf(fid, '%s', 1);

        % fscanf fills the array in column order,
        % so transpose the results
        mystruct(block).meas  = ...
          fscanf(fid, '%f', [measrows, meascols])';

        block = block + 1;
    end

end

% close the file
fclose(fid);
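The loop above may take one extra pass if the file ends with whitespace after the last number, because feof can still be 0 at that point; that pass then stores empty fields. A hedged sketch of one guard, assuming a temporary headerless file with two measurement sets, stops as soon as fscanf finds no further time string:

```matlab
% Headerless two-set file in a temporary location (assumption for this sketch)
filename = [tempname '.dat'];
fid = fopen(filename, 'w');
fprintf(fid, ['12:00:00\n01-Jan-1977\n4.21 6.55 6.78 6.55\n9.15 0.35 7.57 NaN\n' ...
              '7.92 8.49 7.43 7.06\n9.59 9.33 3.92 0.31\n' ...
              '09:10:02\n23-Aug-1990\n2.76 6.94 4.38 1.86\n0.46 3.17 NaN 4.89\n' ...
              '0.97 9.50 7.65 4.45\n8.23 0.34 7.95 6.46\n']);
fclose(fid);

measrows = 4;
meascols = 4;
mystruct = struct('mtime', {}, 'mdate', {}, 'meas', {});   % empty, grows per set

fid = fopen(filename);
while ~feof(fid)
    mtime = fscanf(fid, '%s', 1);
    if isempty(mtime)
        break   % only trailing whitespace was left, so no further set
    end
    mystruct(end+1).mtime = mtime;
    mystruct(end).mdate = fscanf(fid, '%s', 1);
    mystruct(end).meas  = fscanf(fid, '%f', [measrows, meascols])';
end
fclose(fid);
delete(filename);
```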

Testing for EOF with fgetl and fgets

If you use fgetl or fgets in a control loop, feof is not always the best way to test for end of file. As an alternative, consider checking whether the value that fgetl or fgets returns is a character vector.

For example, the function litcount described in Reading Data Line-by-Line includes the following while loop and fgetl calls:

y = 0;
tline = fgetl(fid);
while ischar(tline)
   matches = strfind(tline, literal);
   num = length(matches);
   if num > 0
      y = y + num;
      fprintf(1,'%d:%s\n',num,tline);
   end
   tline = fgetl(fid);
end

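This approach is more robust than testing ~feof(fid). fgetl and fgets return a character vector whenever they find data and return the number -1 when no data remains, so ischar(tline) tests the outcome of each read directly, rather than the end-of-file indicator, whose state does not always line up with the last line that was returned.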
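A minimal sketch of the ischar test on its own, assuming a temporary two-line file (allLines and tline are illustrative names). After the loop, tline holds -1, the value fgetl returns once no lines remain:

```matlab
% Two-line file in a temporary location (assumption for this sketch)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '123\n456\n');
fclose(fid);

fid = fopen(fname);
allLines = {};
tline = fgetl(fid);
while ischar(tline)          % stops when fgetl returns the number -1
    allLines{end+1} = tline;
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```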

Opening Files with Different Character Encodings

Encoding schemes support the characters required for particular alphabets, such as those for Japanese or European languages. Common encoding schemes include US-ASCII and UTF-8.

If you do not specify an encoding scheme when opening a file for reading, fopen uses auto character-set detection to determine the encoding. If you do not specify an encoding scheme when opening a file for writing, fopen defaults to UTF-8, which provides interoperability between all platforms and locales without data loss or corruption.

To find out which encoding fopen is using for an open file, call fopen again with the file identifier:

[filename, permission, machineformat, encoding] = fopen(fid);

If you specify an encoding scheme when you open a file, the following functions apply that scheme: fscanf, fprintf, fgetl, fgets, fread, and fwrite.

For a complete list of supported encoding schemes, and the syntax for specifying the encoding, see the fopen reference page.
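A minimal sketch, assuming a temporary file and a word containing one non-ASCII character (built with char(233) so the script itself stays ASCII): write with an explicit 'UTF-8' encoding, reopen with the same encoding, and query the encoding that fopen reports. The names word, enc, txt, and info are illustrative.

```matlab
fname = [tempname '.txt'];               % temporary file (assumption)
word = ['caf' char(233)];                % 'cafe' with an accented final e

% 'n' selects the native machine format; the fourth input is the encoding
fid = fopen(fname, 'w', 'n', 'UTF-8');
fprintf(fid, '%s\n', word);
fclose(fid);

fid = fopen(fname, 'r', 'n', 'UTF-8');
[~, ~, ~, enc] = fopen(fid);             % encoding in effect for this fid
txt = fgetl(fid);                        % decoded with the UTF-8 scheme
fclose(fid);

info = dir(fname);                       % the accented character takes 2 bytes
delete(fname);
```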