 sed expression for truncating line length
Hello, regex[p] and sed gurus.....

I thought I was becoming knowledgeable in this
area, but I'm running into a few problems.  I'm
doing a recursive diff on 2 directories, and I want
to truncate the output to (say) 60 characters.  In
troubleshooting my sed expression, I confirmed that
it follows the regexp(5) rules, which are the ones I
am using.  Here is my command:

    diff -r B noB | sed -e 's+^.\(\{60\}\).*+\1+' | less

The round brackets group the first 60 characters
in the line, and the substitution replaces all the
characters in the line with the first 60 characters.

But the pattern doesn't match anything.  If I leave
out the round brackets, it matches, but I can't
refer to the 1st 60 characters for the substitution.

Why doesn't the expression match?  Thanks.

Fred

P.S.  I realize that there are probably commands
to truncate the line, but I can't find one using apropos,
and with my newfound sed knowledge, I ideally
shouldn't have to remember all sorts of rarely
used commands.  Sed should be enough, and I
use it often enough that I won't forget.

--
Fred Ma, f...@doe.carleton.ca
Carleton University, Dept. of Electronics
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6



 Sat, 09 Jul 2005 13:00:29 GMT   
Shing-Fat Fred Ma <f...@doe.carleton.ca> writes:

Put the "." inside the parentheses, like this:

sed -e 's+^\(.\{60\}\).*+\1+'

BTW, you might consider using a character other than "+" for your
delimiter.  Some regular expression libraries use "+" to indicate
"one or more," and using "+" as the delimiter could cause an error.
This may not be a problem on your system, but it could become a
problem if you port your script to another system.  A common delimiter
that shouldn't cause problems is the forward slash (/):

sed -e 's/^\(.\{60\}\).*/\1/'

I'd use "cut -c1-60", which is simpler and easier to understand.
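
A quick way to check that the sed and cut forms agree is sketched below. This is not from the original posts: the function names truncate_sed and truncate_cut and the generated test lines are made up for illustration, and it assumes single-byte characters.

```shell
# Hypothetical wrappers around the two commands suggested above.
truncate_sed() { sed -e 's/^\(.\{60\}\).*/\1/'; }
truncate_cut() { cut -c1-60; }

# One 70-character line and one 10-character line as test input.
long=$(printf '%070d' 0)
short=$(printf '%010d' 0)

# Print the length of each output line for both variants.
printf '%s\n%s\n' "$long" "$short" | truncate_sed | awk '{ print length($0) }'
printf '%s\n%s\n' "$long" "$short" | truncate_cut | awk '{ print length($0) }'
```

Lines shorter than 60 characters pass through both commands unchanged: the sed pattern simply fails to match them, and cut has nothing to remove.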

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/



 Sat, 09 Jul 2005 14:06:07 GMT   

You are right - the better command (much lighter) is cut.  Try
$ diff -r B noB | cut -c1-60 | less

Yuan Liu



 Sat, 09 Jul 2005 15:28:55 GMT   
In message <3E2CF82B.9030...@stemnet.nf.ca.remove_this> of Tue, 21 Jan 2003
07:28:55 in comp.unix.questions, Dr. Yuan Liu
<y...@stemnet.nf.ca.remove_this> writes
It depends what you mean by "better". I think it is arguable that sed is
better because it is a general purpose tool and increasing the number of
tools one uses is "worse". Anyway, I will get back to Fred's problem.

In message <3E2CD3EA.5090...@doe.carleton.ca> of Tue, 21 Jan 2003
05:00:29 in comp.unix.questions, Shing-Fat Fred Ma
<f...@doe.carleton.ca> writes
[snip]

When developing a pipeline like this, it is better to test each
component in turn when a problem arises. The following is done in a
Windows cmd.exe "shell".
C:\WINNT\Temp\sedread> nl data
     1   2345678 1 2345678 2 2345678 3 2345678 4 2345678 5 2345678 6 2345678 7
     2   2345678 1 2345678 2 2345678 3 2345678 4 2345678 5 2345678 6
     3   2345678 1 2345678 2 2345678 3 2345678 4 2345678 5 23456789

C:\WINNT\Temp\sedread> sed -e "s+^.\(\{60\}\).*+\1+" < data
sed: garbled command s+^.\(\{60\}\).*+\1+

C:\WINNT\Temp\sedread> :: I am surprised Fred got no error message.
C:\WINNT\Temp\sedread> :: I interpret his RE as
C:\WINNT\Temp\sedread> :: ^    start of line
C:\WINNT\Temp\sedread> :: .    Any character other than \n
C:\WINNT\Temp\sedread> :: \(   Start remembering
C:\WINNT\Temp\sedread> :: \{   Count previous token
C:\WINNT\Temp\sedread> :: Error: \{ has nothing before it inside the group to count
C:\WINNT\Temp\sedread>
C:\WINNT\Temp\sedread> :: The following works for me
C:\WINNT\Temp\sedread> sed "s/\(.\{60\}\).*/\1/" < data
 2345678 1 2345678 2 2345678 3 2345678 4 2345678 5 2345678 6
 2345678 1 2345678 2 2345678 3 2345678 4 2345678 5 2345678 6
 2345678 1 2345678 2 2345678 3 2345678 4 2345678 5 23456789

C:\WINNT\Temp\sedread>
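
The same stage-by-stage check can be sketched for a POSIX shell. This is an illustration only: make_data, stage, and max_len are hypothetical names, and the test lines are generated rather than taken from the data file above.

```shell
# Three test lines of 70, 60 and 59 characters, straddling the 60-column limit.
make_data() {
    printf '%070d\n' 0
    printf '%060d\n' 0
    printf '%059d\n' 0
}

# Stage under test: the sed command that worked in the transcript above.
stage() { sed 's/\(.\{60\}\).*/\1/'; }

# Report the length of the longest line read from standard input.
max_len() { awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }'; }

make_data | max_len          # longest line before the stage
make_data | stage | max_len  # longest line after the stage
```

Running each stage on small input with known lengths shows which component misbehaves before the whole pipeline is piped through less.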

A relevant extract from IEEE Std 1003.1-2001 is:

    When a BRE matching a single character, a subexpression, or a
    back-reference is followed by an interval expression of the format
    "\{m\}", "\{m,\}", or "\{m,n\}", together with that interval
    expression it shall match what repeated consecutive occurrences of
    the BRE would match.

Another relevant extract is:

    A subexpression can be defined within a BRE by enclosing it between
    the character pairs "\(" and "\)". Such a subexpression shall match
    whatever it would have matched without the "\(" and "\)", except
    that anchoring within subexpressions is optional behavior.
A quick reading of the BNF presentation of the BRE syntax suggests to me
that Fred's syntax is wrong. I do not feel like making the effort to
prove it :)
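
The three cases named in the first extract (single character, subexpression, back-reference) can be tried with grep, which uses BREs by default. This is a sketch with a hypothetical helper, bre_matches; the patterns and test strings are made up for illustration.

```shell
# bre_matches PATTERN STRING: succeeds if STRING matches the BRE PATTERN.
bre_matches() { printf '%s\n' "$2" | grep -q -e "$1"; }

# Single character followed by an interval: "a" three times.
bre_matches '^a\{3\}$' 'aaa' && echo 'single character + interval: match'

# Subexpression followed by an interval: "ab" twice.
bre_matches '^\(ab\)\{2\}$' 'abab' && echo 'subexpression + interval: match'

# Back-reference followed by an interval: "x", then \1 twice more.
bre_matches '^\(x\)\1\{2\}$' 'xxx' && echo 'back-reference + interval: match'

# A single "ab" does not satisfy the two-repetition pattern.
bre_matches '^\(ab\)\{2\}$' 'ab' || echo 'only one "ab": no match'
```

In each working pattern the interval follows something it can repeat. In Fred's original expression the "\{60\}" comes first inside the "\(", so there is nothing in the group for it to apply to.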
--
Walter Briscoe



 Sat, 09 Jul 2005 19:01:45 GMT   
Thanks, everyone, especially for
your professionalism in responding to
what was a syntax error.  I'd say
something self-deprecating, but I've
been doing that a bit much lately.  Bad
for esteem, etc.

And no, I didn't get a syntax error.
I think that's because the \{60\} was
referring to the last matched
character, which is whatever the
"." matched at the start of the line.
At least on Solaris 8.

The cut command is handy.  I'll
keep it in mind.

Fred

--
Fred Ma, f...@doe.carleton.ca
Carleton University, Dept. of Electronics
1125 Colonel By Drive, Ottawa, Ontario
Canada, K1S 5B6


