 Script to extract portions of text from a text file
2004-12-14, 16:10(-08), KP Bhat:

It's generally a bad idea to write a while read loop. It is best to
use text processing tools to process text. Shells are not meant
to be text stream processing tools.

[...]

awk '
  BEGIN {
    beginPattern = ARGV[2]
    endPattern = ARGV[3]
    ARGC = 2
  }
  {
     if ($0 ~ beginPattern)
       doPrint = 1
     else
       if ($0 ~ endPattern)
         doPrint = 0
  }
  doPrint != 0 {print $0}' "$1" "$2" "$3"

Beware that awk patterns (use /usr/xpg4/bin/awk on Solaris) are
extended regular expressions, except with GNU awk, where you
need to pass -W re-interval or -W posix for it to recognize
interval expressions written in braces.
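
As a usage sketch of the awk program above: the wrapper name "extract", the sample file and the START/END markers are all made up for illustration. With this input it should print the begin line and the body, but not the end line, because doPrint is cleared before the print rule runs.

```shell
# extract FILE BEGIN END: hypothetical wrapper around the awk program above.
extract() {
  awk '
    BEGIN {
      beginPattern = ARGV[2]
      endPattern = ARGV[3]
      ARGC = 2            # only ARGV[1] is treated as an input file
    }
    {
      if ($0 ~ beginPattern)
        doPrint = 1
      else if ($0 ~ endPattern)
        doPrint = 0
    }
    doPrint != 0 { print $0 }' "$1" "$2" "$3"
}

# Made-up sample data, only for illustration.
sample=$(mktemp)
printf '%s\n' 'header' 'START one' 'body 1' 'END one' 'footer' > "$sample"
extract "$sample" '^START' '^END'
rm -f "$sample"
```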

--
Stephane



 Sun, 03 Jun 2007 16:24:01 GMT   

You could do that in 'sed'.
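
A minimal sketch of what the sed approach might look like, assuming the patterns are basic regular expressions that contain no "/" characters. The helper name extract_sed and the sample markers are hypothetical. Unlike a flag-based loop that clears its flag on the end line, a sed address range includes the end line in the output.

```shell
# extract_sed BEGIN END: filter stdin, printing each BEGIN..END block.
# Assumption: neither pattern contains a "/" (it would end the sed address early).
extract_sed() {
  sed -n "/$1/,/$2/p"
}

# Made-up sample data, only for illustration.
printf '%s\n' 'header' 'START' 'body' 'END' 'footer' | extract_sed '^START' '^END'
```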

Nick.
--
http://www.nick-andrew.net/                     http://aus.news-admin.org/
I prefer USENET replies. Don't send email copies. Drop the spamtrap to reply.



 Sun, 03 Jun 2007 16:39:47 GMT   
In article <slrncrvt51.4eq.stephane.chaze...@spam.is.invalid>,
Stephane CHAZELAS  <this.addr...@is.invalid> wrote:

Truer words have never been spoken!

[...]

... etc ...

Better:

    /beginPattern/,/endPattern/
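
A sketch of the range pattern with the two patterns passed in from the shell. The function name extract_range and the variable names b and e are assumptions for illustration. Note that awk processes backslash escapes in -v values, and that a range pattern prints the end line as well.

```shell
# extract_range BEGIN END: filter stdin using an awk range pattern.
# b and e are hypothetical variable names holding the two regular expressions.
extract_range() {
  awk -v b="$1" -v e="$2" '$0 ~ b, $0 ~ e'
}

# Made-up sample data, only for illustration.
printf '%s\n' 'header' 'START' 'body' 'END' 'footer' | extract_range '^START' '^END'
```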



 Sun, 03 Jun 2007 21:39:23 GMT   
2004-12-15, 07:25(-08), dfre...@mtxia.com:
[...]

???

A shell is a tool to run other commands. Those other commands
can be text processing tools, web browsers, text editors...

Additionally, a shell can take the list of commands from a
file instead of from the user. That's what we call a script.

It's different, perl is intended to be a programming language,
not a shell.

--
Stephane



 Sun, 03 Jun 2007 23:42:06 GMT   

    From its inception, the shell has also been designed as a
    programming language.

    As Stephen Bourne said in his introduction to the shell (1978?),
    "The shell is both a command language and a programming language
    that provides an interface to the UNIX operating system."

    As the Korn shell info page says, "The KornShell language is also
    a complete, powerful, high-level programming language for writing
    applications, often more easily and quickly than with other
    high-level languages."

    As the bash manual states, "A Unix shell is both a command
    interpreter, which provides the user interface to the rich set of
    GNU utilities, and a programming language, allowing these
    utilities to be combined."

    The shell is a very powerful programming language. It has its
    quirks, and there are places where it is not the best language to
    use (when dealing with binary data, for example). For some tasks
    it is too slow (e.g., very large files). But for a very wide range
    of applications it is more than adequate, and often the most
    efficient language for the task at hand.

--
    Chris F.A. Johnson                  http://cfaj.freeshell.org/shell
    ===================================================================
    My code (if any) in this post is copyright 2004, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License



 Mon, 04 Jun 2007 00:28:42 GMT   
2004-12-15, 08:04(-08), dfre...@mtxia.com:
[...]

I call sed, awk, cut text processing programs. I call a shell a
command line interpreter, a shell.

There are many differences, but that was not my point.

The processing done by the shell differs depending on whether the
input is a file, a tty, or neither, and on whether interactive
mode is on or off.

Among the differences are how much is read at a time, the alias,
history expansion, the key handling, how some special builtins
behave...

[...]
[...]

The reason why one shouldn't use "while read" loop is because
shells were not designed to do that, while tools like awk are
designed to do that.

Look at the ugly things ksh93 had to resort to in order to work
around the fact that it is a shell while pretending to be a
programming language (reading chunks and lseeking back, random
optimizations that even change the meaning of things, building in
all the utilities)...

If read was supposed to be used that way, it wouldn't strip
backslashes and leading/trailing blanks by default. Because it's
a shell, it has to read one character at a time.
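
To make the default behaviour concrete, here is a small sketch comparing plain "read" with "IFS= read -r" on the same made-up input line. The helper names show_default and show_raw are hypothetical.

```shell
# Each helper reads one line from stdin and prints it between brackets.
show_default() { read line; printf '[%s]\n' "$line"; }
show_raw()     { IFS= read -r line; printf '[%s]\n' "$line"; }

# The input is two spaces, a, backslash, t, b, two spaces (no real tab).
printf '%s\n' '  a\tb  ' | show_default   # prints [atb]
printf '%s\n' '  a\tb  ' | show_raw       # prints [  a\tb  ]
```

The first call loses both the surrounding blanks and the backslash; the second keeps the line byte for byte.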

--
Stephane



 Mon, 04 Jun 2007 00:42:23 GMT   
In article <32b71qF3ko4j...@individual.net>,
Chris F.A. Johnson <cfajohn...@gmail.com> wrote:

Stephane is right about shells and their place in the scheme of things.
The rest of you are just blowing smoke.

And the documentation for Microsoft Windows says things like "Windows
version XYZ is a modern, stable, secure, useful operating system."

Saying stuff don't make it so.  The world would be a much nicer place if it
did...



 Mon, 04 Jun 2007 00:57:59 GMT   

    There are thousands of shell applications that prove otherwise.

    No, but as soon as I find a match and light my pipe I will be. ;)

--
    Chris F.A. Johnson                  http://cfaj.freeshell.org/shell
    ===================================================================
    My code (if any) in this post is copyright 2004, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License



 Mon, 04 Jun 2007 01:21:45 GMT   

    Of course shells were designed to do that. The only difference
    between a shell and awk is that it is explicit (and therefore
    slower) in the shell.

    On the contrary; stripping leading and trailing whitespace is a
    useful feature that can save a lot of work. On older systems there
    was a 'line' command if you needed absolutely everything on the
    line. It has been superseded by 'IFS= read -r'.

    The ability to continue a line by ending it with a backslash
    simplifies writing data files that would otherwise require very
    long lines.
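
    As an illustration of that continuation behaviour, a sketch with
    made-up input (the helper names join_continued and keep_raw are
    hypothetical):

```shell
# Without -r, a trailing backslash joins the next line onto the current one.
join_continued() { while read line; do printf '[%s]\n' "$line"; done; }
# With -r, the backslash is ordinary data and every physical line is kept.
keep_raw() { while IFS= read -r line; do printf '[%s]\n' "$line"; done; }

printf '%s\n' 'one \' 'two' 'three' | join_continued   # prints [one two] then [three]
printf '%s\n' 'one \' 'two' 'three' | keep_raw         # prints [one \] [two] [three], one per line
```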

    That looks like a _non sequitur_.

--
    Chris F.A. Johnson                  http://cfaj.freeshell.org/shell
    ===================================================================
    My code (if any) in this post is copyright 2004, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License



 Mon, 04 Jun 2007 01:34:14 GMT   
2004-12-15, 09:18(-08), dfre...@mtxia.com:
[...]

Some examples:

set -o emacs
echi^Ho foo

(where ^H is a BS character) is read differently from the tty
and from a script

exec 3> /
behaves differently in a script and at the prompt

Same for

echo > *

[...]

I don't follow, could you please come with an example?

[...]
[...]

cmd1 | while IFS= read -r line
do
  expr -- "$line" : ...
done

"read" has to read one character at a time until it finds a \n ,
because otherwise, expr would not be able to get the second
line of input (if it wanted to; it doesn't, but read does not
know that).
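
A small sketch of why that matters, using made-up input on a pipe: "read" must stop exactly after the first newline so that the next command sharing the same stdin still sees the remaining lines. The function name first_then_rest is hypothetical.

```shell
# read takes exactly one line; cat then receives whatever is left on stdin.
first_then_rest() {
  IFS= read -r line
  printf 'read got: %s\n' "$line"
  cat   # would print nothing if read had consumed a whole buffer
}

printf '%s\n' first second third | first_then_rest
```

If read had buffered ahead on the pipe, "second" and "third" would be lost to cat.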

Moreover, the fact that the above code is illegible is a strong
indication that I'm not doing it the right way.

The right way being:

cmd1 | awk '{ print substr($1, ...) }'

awk reads a whole buffer at a time, does the processing
internally.

--
Stephane



 Mon, 04 Jun 2007 01:41:31 GMT   
2004-12-15, 17:34(+00), Chris F.A. Johnson:
[...]

The obvious way, if read was meant to read input for text
processing would have been:

read line # read a line without extra processing
read -s line # read line AND strip spaces (but best would have
             # been to provide an additional command/operator to
             # do so).
read -l line # read logical line continued with backslashes.

But it's broken. It also strips backslashes in the middle of the
line, while the obvious way is to treat only the last one
specially, as is done in most other tools and in most text
formats that follow that convention (which means you can't
process those formats (C, awk, rc...) with read).

And that feature is not useful for text processing but for
reading from the user.

--
Stephane



 Mon, 04 Jun 2007 02:04:23 GMT   

Just because you can't afford Windows and quality hardware to run it on,
that doesn't mean Microsoft has to spoon feed you and wipe your arse.  
Lay off on Windows.  It's good for what it was designed for.  

Just because you can't program in shell, that doesn't mean Bash isn't
good for programming.



 Mon, 04 Jun 2007 02:19:55 GMT   

    That's what the "line" command was for.

   It is not broken; its behaviour is just not what YOU want. I have
   no problem with it.

    Most text files do NOT have backslashes; that's why it was chosen
    as the escape character. There are specialized text formats that
    do, and they must be handled differently.

    In my experience, it is exactly the opposite.

    I have never used it for reading from the user; I have used it
    many times for reading from a file.

--
    Chris F.A. Johnson                  http://cfaj.freeshell.org/shell
    ===================================================================
    My code (if any) in this post is copyright 2004, Chris F.A. Johnson
    and may be copied under the terms of the GNU General Public License



 Mon, 04 Jun 2007 02:22:00 GMT   

But your example is not comparing the same command.  At the tty, the ^H
is interpreted by the tty driver, not the shell, whereas if you embed a
literal ^H into a command in a script, it fails because the command
"echi^Ho" does not exist.  In fact, if you escape the ^H at the command
line, it is processed exactly as it would be in a script: with
an embedded ^H, they both fail.

I saw no difference in behavior between the command line and script in
ksh93.  Others should respond regarding other shells.

I'll concede this one; however, it is extremely unlikely that this command
would actually be used from the command line or from a script.

# explicit "while read" in shell followed
# by an implied "while read" in awk
while IFS=":" read U J J J J H J
do
print "${U} ${H}"
done < /etc/passwd  | awk '{ print $2 " is the home dir of " $1 }'
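
For comparison, and assuming the goal is only the final "home dir" lines, the same report can be sketched with awk alone by splitting on ":" directly. The function name home_dirs is made up for illustration; the field numbers follow the usual seven-field passwd layout.

```shell
# home_dirs: read passwd-format lines on stdin; field 1 is the user, field 6 the home dir.
home_dirs() {
  awk -F: '{ print $6 " is the home dir of " $1 }'
}

home_dirs < /etc/passwd
```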

How is the shell using a \n to designate the end of a line different
from awk using the same character to designate the end of a line?

Most code is illegible if you don't know the syntax, that is why it's
called code.

Why is this code more legible than the previous code?

The shell generated the buffer for awk to read from.  Sounds like an
extra step to me.

The original argument was that a "while read" loop is a bad idea, as
though there was some fundamental problem with it.  If you are
processing large text files, it is generally faster to use "awk", but
you have not presented any reason why performing a shell "while read"
is a bad idea.

--
Dana French



 Mon, 04 Jun 2007 03:27:16 GMT   
I had a need to extract portions of text (delimited by fixed patterns)
from a large text file.  Here's a shell script that I wrote for this
purpose.  Kindly suggest a more "elegant" way to do the same:

Thanks,
Bhat

#!/bin/ksh
# Script to extract portions of text from a text file
#
#
#set -x

if [ "$#" -lt "3" ]
then
echo "usage $0: <input-file> <begin-pattern> <end-pattern>"
exit 1
fi

if [ ! -f "$1" ]
then
echo "File $1 does not exist"
exit 2
fi

if [ ! -r "$1" ]
then
echo "Error reading file $1"
exit 3
fi

printFlag=false

function printLines
{
  if [ "$printFlag" == "true" ]
  then
    echo "$*"
  fi
}

while read line
do
  val=`echo "$line" | grep "$2"`
  if [ "$val" != "" ]
  then
    printFlag=true
  else
    val=`echo "$line" | grep "$3"`
    if [ "$val" != "" ]
    then
      printFlag=false
      echo "\n\n\n"
    fi
  fi
  printLines "$line"
done < "$1"

exit 0



 Sun, 03 Jun 2007 08:10:04 GMT   
 