Friday 7 December 2012

Matlab with large input data

Yesterday, I was dealing with a big file (about 600MB). At first, I tried to use "textread" to load it into MATLAB. Although my computer is equipped with 8GB of physical memory, MATLAB eventually failed with a long list of errors.

Then I had to use a much slower method: reading line by line with "fgetl" and "sscanf". The first reads one line as a string, and the second parses that string according to a given format, just like scanf in C. However, the resulting program ran very slowly, taking about 1440s to finish.
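For reference, a line-by-line loop of this kind might look roughly like the sketch below. It is only an illustration: the file layout (two integers and one floating-point number per line) and the temporary sample file are assumptions made so the snippet runs on its own, not the layout of the original 600MB file.

```matlab
% Write a small sample file (assumed layout: two integers and one float per line).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d %d %.2f\n', [1 2 0.5; 3 4 1.25]');
fclose(fid);

% Read it back one line at a time.
fid = fopen(fname, 'r');
data = zeros(0, 3);
tline = fgetl(fid);                 % fgetl returns -1 (not a char) at end of file
while ischar(tline)
    vals = sscanf(tline, '%d %d %f');   % parse the string like C's sscanf
    data(end+1, :) = vals.';            % growing the array row by row adds to the cost
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Every line costs one fgetl call, one sscanf call and one array resize, which is why this approach gets so slow on millions of lines.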

At last, I found another function, 'textscan', which is very similar to textread but more powerful. It can be used like this:

C = textscan(fid, 'format', N) 

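It reads the data block by block: each call applies the format N times (N lines, if the format describes one line) and continues from where the previous call stopped. This strikes a better balance between textread, which loads everything at once, and fgetl, which handles one line per call. Each call parses a whole block in one go, so the per-line overhead of the fgetl loop is avoided while memory use stays bounded, and the program runs much faster.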
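A block-wise reading loop could be sketched as follows. The block size of 4, the '%d %d %f' format and the small temporary file are all hypothetical choices for illustration; for a real 600MB file N would be much larger (tens or hundreds of thousands of lines), and what is done with each block depends on the task. Here each block is only summed, to show that the whole file never has to sit in memory.

```matlab
% Write a small sample file (assumed layout: two integers and one float per line).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d %d %.2f\n', [(1:10)' (11:20)' (1:10)'/4]');
fclose(fid);

blockSize = 4;                      % hypothetical N: lines read per textscan call
fid = fopen(fname, 'r');
total = 0;                          % running sum of the third column
nRows = 0;                          % number of lines processed so far
while ~feof(fid)
    % Apply the format blockSize times; the next call resumes at this position.
    C = textscan(fid, '%d %d %f', blockSize);
    if isempty(C{1})                % nothing left (e.g. only a trailing newline)
        break;
    end
    total = total + sum(C{3});      % process the block, then let it be discarded
    nRows = nRows + numel(C{1});
end
fclose(fid);
delete(fname);
```

Larger blocks mean fewer calls but more memory per call, so N is the knob that trades speed against memory.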

By the way, awk is a powerful tool for processing text.