 splitting a very large file based on characters in a record (performance)
I have a file with a large number of records (from 0.5 million to 1
million records). Each record is 250 characters long and has no
delimiters. I have to split each record into 2 separate records
(each of 125 characters), then split each 125-character record into
fields, and based on the value of one field write it to one of 2
different output files.

I would like to know:
1. For a file this large, should I split each record into 2 first
and then work on the halves, OR should I treat the whole 250-char
record as one and split it straight into fields? Performance is
important here.

2. Is it preferable to use ksh/awk over Perl here?

3. Is it OK to use the "cut" command here?
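
To make it concrete, here is the rough shape of what I mean in Perl
(the field positions and widths are invented just for illustration,
since I haven't shown the real layout):

  use strict;
  use warnings;

  # One 250-character record (a stand-in here), split into two
  # 125-character records, then into fixed-width fields.
  my $record = 'A' x 250;
  my ($first, $second) = unpack 'a125 a125', $record;

  # Invented layout: a 10-char key, a 1-char type code, and the
  # remaining 114 characters of the half.
  my ($key, $type, $rest) = unpack 'a10 a1 a114', $first;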

Regards,
Billy



 Mon, 07 Nov 2005 22:55:58 GMT   
 splitting a very large file based on characters in a record (performance)

On question 1: I honestly don't think you'll see a performance
difference unless your machine is so slow as to make everything
noticeably slow (e.g., my laptop, where the HDD has no cache).

On question 2 (ksh/awk over Perl?): Eeeeeekkkk!! ;)
My reasoning: Perl can do everything awk can do, plus everything ksh
can do non-interactively, so my choice would be Perl.

On question 3: there should be no reason not to, but if you decide to
use awk or Perl you probably won't need cut.

--
----- stephan beal
Registered Linux User #71917 http://counter.li.org
I speak for myself, not my employer. Contents may
be hot. Slippery when wet. Reading disclaimers makes
you go blind. Writing them is worse. You have been Warned.



 Mon, 07 Nov 2005 23:11:13 GMT   
 splitting a very large file based on characters in a record (performance)

Billy wrote in message
news:a8eb837.0305220655.5dc41f71@posting.google.com...

There's unlikely to be a significant difference.  The only way to be sure is
to write a benchmarking script -- but the time it takes you to write that
would probably be greater than the time saved by employing the results
thereof.
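
If you do end up measuring, Perl's standard Benchmark module keeps it
cheap. A minimal sketch, using a dummy record (cmpthese runs each sub
for at least 2 CPU seconds and prints a rate comparison):

  use strict;
  use warnings;
  use Benchmark qw(cmpthese);

  my $record = 'x' x 250;    # dummy 250-character record

  # Compare two ways of pulling the two halves apart.
  cmpthese(-2, {
      'substr' => sub {
          my $first  = substr($record, 0, 125);
          my $second = substr($record, 125, 125);
      },
      'unpack' => sub {
          my ($first, $second) = unpack 'a125 a125', $record;
      },
  });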

Well, if you cross-post to 3 different lists, you're likely to get 3
different answers.  But since Perl is more flexible than shell and awk, try
it in Perl.



 Tue, 08 Nov 2005 07:49:59 GMT   
 splitting a very large file based on characters in a record (performance)
In article <a8eb837.0305220655.5dc41...@posting.google.com>, Billy wrote:

> I have a file with a large number of records (from 0.5 million to 1
> million records). Each record is 250 characters long and has no
> delimiters. I have to split each record into 2 separate records
> (each of 125 characters), then split each 125-character record into
> fields, and based on the value of one field write it to one of 2
> different output files.
>
> I would like to know:
> 1. For a file this large, should I split each record into 2 first
> and then work on the halves, OR should I treat the whole 250-char
> record as one and split it straight into fields? Performance is
> important here.

One pass is going to be faster than reading the data twice and writing
it twice. The in-memory work is trivial (performance-wise) compared to
reading and writing the data.
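
A minimal one-pass sketch in Perl (the routing field at offset 0, its
10-character width, and the /^A/ test are placeholders, since the real
layout and rules weren't posted):

  use strict;
  use warnings;

  open my $in,   '<', 'input.dat'   or die "input.dat: $!";
  open my $out1, '>', 'output1.dat' or die "output1.dat: $!";
  open my $out2, '>', 'output2.dat' or die "output2.dat: $!";

  while (my $line = <$in>) {
      chomp $line;
      # Split the 250-char record into its two 125-char records.
      for my $rec (unpack 'a125 a125', $line) {
          my $field = substr($rec, 0, 10);   # routing field (assumed)
          # Route on the field value; the /^A/ test is a placeholder.
          print { $field =~ /^A/ ? $out1 : $out2 } $rec, "\n";
      }
  }
  close $_ for $in, $out1, $out2;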

> 2. Is it preferable to use ksh/awk over Perl here?

While I use awk for most of my text manipulation work, I would say from
personal experience that Perl will be much faster than awk. My example
is that I had an awk script and a Perl script that read sendmail files
to generate a mail summary of who sent each message and its subject
line. The Perl script was about twice as fast. So while I still use awk
for quick-and-dirty throwaway scripts and for scripts that won't
process much data, I will use Perl if the amount of data to be
processed is large and I will need to run the script more than once or
twice.

> 3. Is it OK to use the "cut" command here?

Without seeing the data and the rules for separating the data, it is
difficult to say if cut would be a better choice.

> Regards,
> Billy



 Tue, 08 Nov 2005 11:00:44 GMT   
 



 