 File parsing

I have a quick request. I have a large file, about 150MB in size, which
needs to be parsed and split into separate files based on a field. The
file could look like this:

f1 f2 f3 f4
0  aa bb cc
3  ax bb cc
1  ab ab aj
2  ab ab aj
3  db dc dd

Here, based on field f3, the data should go into separate files,
namely file_bb.txt, file_ab.txt, file_dc.txt and so on.

I was working on a 'while read' construct with the 'cut' command.

Any pointers?


 Mon, 11 Sep 2006 02:58:06 GMT   
 File parsing

Use something like this (untested):

gawk 'NR > 1 {
        outfile = "file_" $3 ".txt"   # output name comes from field 3
        print >> outfile              # append the current line
        close(outfile)                # keep only one file open at a time
      }' infile

Chuck Demas

  Eat Healthy        |   _ _   | Nothing would be done at all,
  Stay Fit           |   @ @   | If a man waited to do it so well,
  Die Anyway         |    v    | That no one could find fault with it.
                     |  \___/  |

 Mon, 11 Sep 2006 03:08:45 GMT   
 File parsing

Assuming the heading line above is just for the benefit of this
posting and isn't really present in the data file (if it is, just prefix
the action with "NR>1"), and that by "file_" you really mean "the input
file name followed by an underscore":

        awk '{print > (FILENAME "_" $3 ".txt")}' file
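
If the heading line really is present in the data, the "NR>1" prefix mentioned above can be sketched as follows. The file name infile and the sample rows are only assumptions for illustration (copied from the original question), and the output name is parenthesized because some awks parse an unparenthesized concatenation after ">" differently.

```shell
#!/bin/sh
# Sketch only: "infile" and its contents are illustrative sample data.
cat > infile <<'EOF'
f1 f2 f3 f4
0  aa bb cc
3  ax bb cc
1  ab ab aj
2  ab ab aj
3  db dc dd
EOF

# NR > 1 skips the heading line; each remaining line goes to a file
# named after the input file and the third field.
awk 'NR > 1 { print > (FILENAME "_" $3 ".txt") }' infile
```

This should leave infile_bb.txt, infile_ab.txt and infile_dc.txt in the current directory, with no file created for the heading line.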




 Mon, 11 Sep 2006 03:41:20 GMT   
 File parsing

One possibility:

while read line;do
  set -- $line
  printf "%s\n" "$line" >> file_${3}.txt
done < dat

The Dutchman still wears wooden shoes, his cap and coat are patched
with the love that Margaret sewed there...
  - Steve Goodman

 Mon, 11 Sep 2006 03:49:59 GMT   
 File parsing
2004-03-24, 19:49(+00), j...@invalid.address:

#! /usr/bin/env bash

Most Unix systems don't have bash in /bin, many don't have bash at all,
and even some Linux systems only have bash in /usr/bin.

while IFS= read -r line; do

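Plain read is meant for reading input from the user: it strips leading
and trailing whitespace and treats backslashes specially, which is not
suitable for text processing. IFS= and -r turn both of those off.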

You first need to disable filename generation if you want to use
variables unquoted for word splitting (put "set -f" at the
beginning of the script).

   printf "%s\n" "$line" >> "file_${3}.txt"

It's always safer to quote variables (even if, in this case, once
you've disabled filename generation, it shouldn't be a problem).
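
Putting these corrections together, the loop might look like the sketch below. The file name dat comes from the loop being discussed; the sample rows, the explicit cleanup of old output files and the check for short lines are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch only: sample data written to "dat" for illustration.
cat > dat <<'EOF'
0  aa bb cc
3  ax bb cc
1  ab ab aj
2  ab ab aj
3  db dc dd
EOF

# The loop appends, so remove output left over from an earlier run.
rm -f file_bb.txt file_ab.txt file_dc.txt

set -f                          # disable filename generation before splitting unquoted

while IFS= read -r line; do     # read the line verbatim
    set -- $line                # split it into $1, $2, $3, ... on whitespace
    [ "$#" -ge 3 ] || continue  # skip lines that have no third field
    printf '%s\n' "$line" >> "file_${3}.txt"
done < dat
```

Each output file is reopened for every input line, which is one reason the awk version is so much faster on a 150MB file.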

Note that the awk solution would be many times more efficient
and is the most obvious way to do it.

awk '{print > ("file_" $3 ".txt")}' < dat

Note that at least GNU awk closes files when needed (and reopens
them with O_APPEND), so you won't get the "too many open files"
error. If your awk is not GNU awk and you do see that error,
you'll need:
awk '{o="file_" $3 ".txt"; print >> o; close(o)}' < dat

Stéphane                      ["Stephane.Chazelas" at ""]

 Mon, 11 Sep 2006 04:54:56 GMT   
 File parsing

If you don't have GNU awk, then you would be better off sorting the file
on the third field, after possibly adding a 'line number' in order to be
able to recover the original order of the data, and then only having one
file open at a time. You may even be better off by writing a program to
write a program which makes multiple passes over the datafile, pulling out
as much data as it can each time.
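
A minimal sketch of the sort-based approach, assuming whitespace-separated fields, a data file named dat and file_<field>.txt output names (all taken from earlier in the thread; the sample rows are illustrative). Line numbers are prepended so that the original order survives the sort, and only one output file is open at any time.

```shell
#!/bin/sh
# Sketch only: sample data written to "dat" for illustration.
cat > dat <<'EOF'
0  aa bb cc
3  ax bb cc
1  ab ab aj
2  ab ab aj
3  db dc dd
EOF

# 1. Prefix every line with its line number and a tab.
# 2. Sort on the original third field (now field 4), then by line number,
#    so lines with the same key are adjacent and keep their original order.
# 3. Write each group to its own file, closing the previous file first.
awk '{ print NR "\t" $0 }' dat |
LC_ALL=C sort -b -k4,4 -k1,1n |
awk '{
    key = $4
    if (key != prev) {
        if (out != "") close(out)   # finished with the previous group
        out = "file_" key ".txt"
        prev = key
    }
    sub(/^[0-9]+\t/, "")            # strip the line-number prefix again
    print > out
}'
```

Because each key forms one contiguous run after sorting, every output file is opened exactly once, so this should work with any awk regardless of its open-file limit.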
