 Grabbing a small piece from a huge file
Let's say I receive a file that looks like the following every day.  It's a
huge file that loops numbers 1 - 5 over and over again.  Here's a small piece
of it below:

1.  Vendor: Alpo Incorporated
2.  Date:  4/22/99
3.  PO number:  TEST
4.  Product bought:  2 bags of Alpo
5.  Purchaser:  Richard Jackson
1.  Vendor:  King Foods
2.  Date:  4/22/99
3.  PO number:  TEST2
4.  Product bought: 3 packs Potato Chips
5.  Purchaser:  Joe Richardson

Please remember that this file is normally huge.  Now, let's say someone asks
me about PO number TEST2.  I need to be able to search this huge file for the
PO number, but also grab lines 1 - 5 (Vendor - Purchaser) that are associated
with that PO number and copy them out of this huge file, putting them in a
separate file.  How can I copy that small piece out of my huge file?



 Wed, 10 Oct 2001 03:00:00 GMT   
In article <19990424095317.24951.00000...@ng-fu1.aol.com>,

If you have gnu grep try:

        grep --before-context=2 --after-context=2 TEST2 filename
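
A minimal end-to-end sketch of the same idea, assuming a grep that supports context options and the five-line layout from the question. The file names orders.txt and po_TEST2.txt are made up for illustration, and the pattern is anchored with `$` so that TEST2 does not also match something like TEST20:

```shell
#!/bin/sh
# Build a small sample file in the layout shown in the question
# (orders.txt is a hypothetical name).
cat > orders.txt <<'EOF'
1.  Vendor: Alpo Incorporated
2.  Date:  4/22/99
3.  PO number:  TEST
4.  Product bought:  2 bags of Alpo
5.  Purchaser:  Richard Jackson
1.  Vendor:  King Foods
2.  Date:  4/22/99
3.  PO number:  TEST2
4.  Product bought: 3 packs Potato Chips
5.  Purchaser:  Joe Richardson
EOF

# -B 2 / -A 2 are the short forms of --before-context / --after-context.
# The PO line is line 3 of each record, so two lines of context on
# each side cover the whole record.
grep -B 2 -A 2 'PO number:  TEST2$' orders.txt > po_TEST2.txt
cat po_TEST2.txt
```

The redirect puts the extracted record in its own file, which is what the original question asked for.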

Al

--
#  Al Bolduc - abol...@mediaone.net - ka...@amsat.org - ka...@arrl.net



 Wed, 10 Oct 2001 03:00:00 GMT   
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

    OhOhTrubba> Let's say I receive a file that looks like the
    OhOhTrubba> following every day.  It's a huge file that loops
    OhOhTrubba> numbers 1 - 5 over and over again.  Here's a small
    OhOhTrubba> piece of it below:

    OhOhTrubba> 1.  Vendor: Alpo Incorporated 2.  Date: 4/22/99 3.  PO
    OhOhTrubba> number: TEST 4.  Product bought: 2 bags of Alpo 5.
    OhOhTrubba> Purchaser: Richard Jackson 1.  Vendor: King Foods 2.
    OhOhTrubba> Date: 4/22/99 3.  PO number: TEST2 4.  Product bought:
    OhOhTrubba> 3 packs Potato Chips 5.  Purchaser: Joe Richardson

    OhOhTrubba> Please remember that normally this file is huge.  Now,
    OhOhTrubba> let's say someone asks me about po number TEST2.  I
    OhOhTrubba> need to be able to search this huge file for the po
    OhOhTrubba> number, but also grab numbers 1 - 5 (Vendor -
    OhOhTrubba> Purchaser) that are associated with that PO number and
    OhOhTrubba> copy it out of this huge file, putting it in a
    OhOhTrubba> separate file by itself.  How can I copy that small
    OhOhTrubba> piece away from my huge file?

grep -B 2 -A 2 'PO number:  *TEST2$' test.dat

mp

- --
                                      powered by GNU/linux since Sept 1997
           mich...@trollope.org    http://www.trollope.org
Michael Powe                                          Portland, Oregon USA
  "Would John the Baptist have lost his head if his name was Steve?"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v0.9.0 (GNU/Linux)
Comment: Encrypted with Mailcrypt 3.5.1 and GNU Privacy Guard

iD8DBQE3IeSu755rgEMD+T8RAn7aAKCtXKZkC3sGJgc/agA/667MUV0cEQCgsAsd
RcSLcGd6M2ILIiGsnbIXxw0=
=8SDO
-----END PGP SIGNATURE-----



 Wed, 10 Oct 2001 03:00:00 GMT   

Use grep.  Assuming that the line numbers are in fact part of the file, here
is a small shell script that will accomplish it:

  #!/bin/sh
  #
  # ponumber  -- select five lines associated with the PO number
  #              designated in arg1 from file designated in arg2
  #

  if [ "${2}x" = "x" ]
  then
     echo "Usage:  $(basename "${0}") number datafile" >&2
     exit 1
  fi

  grep -A2 -B2 "^3\.  PO number:  ${1}\$" "${2}"

  # end of script

  Floyd

--
Floyd L. Davidson                                fl...@ptialaska.net
Ukpeagvik (Barrow, Alaska)                       fl...@barrow.com
     North Slope images: <http://www.ptialaska.net/~floyd>



 Wed, 10 Oct 2001 03:00:00 GMT   
In article <19990424095317.24951.00000...@ng-fu1.aol.com> in
comp.unix.programmer of Sat, 24 Apr 1999:13:53:17 , OhOhTrubba
<ohohtru...@aol.com> writes

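I apologise if I miss the point. I saw various suggestions advocating
non-standard extensions to grep. If you have them, why not use them?
Without them, I can't see what is wrong with the following (if the "3.  "
line numbers really are part of the file, include them in ponumber):
ponumber='PO number:  TEST2'
sed -n -e 'N;N;N;N' -e "/\n$ponumber\n/{p;q;}" hugefile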

The obvious difficulty with that is that it relies on exact formatting
of the data. I would be inclined to convert the input file to a single
case file with each record consisting of a single line when the data
comes in. e.g. something like
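sed -n -e '
# Report (with its line number) any record that does not start with a Vendor line
/^Vendor: [A-Z]/!{
=
s/$/ expecting Vendor line/p
d
}
# Drop the first six characters of the Vendor line
s/......//
# Similar treatment for other 4 lines
# Reduce 5 lines to 1
N;N;N;N
s/\n/ /g
# Single case the line
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
w outputfile
' hugefile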

etc.
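
As a rough sketch of the one-record-per-line idea using a different tool than the sed script above: POSIX paste can join every five lines into one, assuming each record is exactly five lines long and no field contains a tab. The file names orders.txt, orders.flat and po_TEST2.txt are hypothetical:

```shell
#!/bin/sh
# Sample data in the layout from the question (orders.txt is a made-up name).
cat > orders.txt <<'EOF'
1.  Vendor: Alpo Incorporated
2.  Date:  4/22/99
3.  PO number:  TEST
4.  Product bought:  2 bags of Alpo
5.  Purchaser:  Richard Jackson
1.  Vendor:  King Foods
2.  Date:  4/22/99
3.  PO number:  TEST2
4.  Product bought: 3 packs Potato Chips
5.  Purchaser:  Joe Richardson
EOF

# Five "-" operands make paste read five lines from stdin per output
# line, joining them with tabs: one record per line.
paste - - - - - < orders.txt > orders.flat

# Each record is now a single line, so plain grep returns whole records.
# The tab after the PO number keeps TEST2 from matching TEST20.
tab=$(printf '\t')
grep "PO number:  TEST2${tab}" orders.flat | tr '\t' '\n' > po_TEST2.txt
cat po_TEST2.txt
```

The tr at the end turns the tabs back into newlines, so the extracted copy has the original five-line shape.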

Otherwise, store your information in a database table indexed on the PO
number. That will give you search times which grow proportionate to the
logarithm of the size of data. grep and sed search times grow
proportionately with the size of the data.
Otherwise, a programming language and a book of algorithms should give
you what you want. I don't know enough from your posting to analyse your
requirement. You may want to buy a solution if the simple things I
suggest do not prove adequate.
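
Short of a real database, the indexing idea can be roughed out in plain shell. This is only a sketch under assumptions: the line numbers are part of the file, every record is five lines with the PO number on line 3, and the names orders.txt, orders.idx and po_TEST2.txt are invented for the example. The lookup in the index is still a linear scan, not the logarithmic search a database index gives, but the huge file is read in full only once, when the index is built:

```shell
#!/bin/sh
# Sample data in the layout from the question (orders.txt is a made-up name).
cat > orders.txt <<'EOF'
1.  Vendor: Alpo Incorporated
2.  Date:  4/22/99
3.  PO number:  TEST
4.  Product bought:  2 bags of Alpo
5.  Purchaser:  Richard Jackson
1.  Vendor:  King Foods
2.  Date:  4/22/99
3.  PO number:  TEST2
4.  Product bought: 3 packs Potato Chips
5.  Purchaser:  Joe Richardson
EOF

# Build the index once per incoming file: "PO number <TAB> line number".
awk '/^3\.  PO number:/ {
        sub(/^3\.  PO number: */, "")
        print $0 "\t" NR
     }' orders.txt > orders.idx

# Lookup: exact match on the PO number in the (much smaller) index.
po=TEST2
line=$(awk -F '\t' -v po="$po" '$1 == po { print $2; exit }' orders.idx)
[ -n "$line" ] || { echo "PO $po not found" >&2; exit 1; }

# Print the two lines before and after the PO line, then quit so sed
# does not read the rest of the big file.
start=$((line - 2))
end=$((line + 2))
sed -n "${start},${end}p;${end}q" orders.txt > po_TEST2.txt
cat po_TEST2.txt
```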
--
Walter Briscoe



 Thu, 11 Oct 2001 03:00:00 GMT   

If you have an Oracle database server on site, then I would suggest loading
all this information into a database daily. You can write a small Pro*C
program to handle the load. Once your data is in Oracle, you can access the
information using SQL*Plus.

Good Luck,



