 sed on LONG lines

I have a large number of data files that have lost their newlines.  
The result is files between 200K and 1M that are all one line.  There
is a distinctive character pattern ("015     ") that is supposed to
start each line, however I haven't been able to get sed to add a
newline before the 015.  I've tried various permutations, but at best
I get an error message stating "Output line too long".  Any
suggestions as to how to do this?  Is it possible to do this with sed?  
If not, any ideas about what can handle it?

Thanks.

Steve



 Wed, 14 Feb 2001 03:00:00 GMT   
 sed on LONG lines
| I have a large number of data files that have lost their newlines.  
| The result is files between 200K and 1M that are all one line.  There
| is a distinctive character pattern ("015     ") that is supposed to
| start each line, however I haven't been able to get sed to add a
| newline before the 015.  I've tried various permutations, but at best
| I get an error message stating "Output line too long".  Any
| suggestions as to how to do this?  Is it possible to do this with sed?  
| If not, any ideas about what can handle it?

That 015 could actually be a single character - octal for the CR
carriage return character, so many applications will display it
as "\015" or something like that.

In this case, "tr '\r' '\n' < bad > good" should solve yer problem
in a jiffy.

In general, sed's not so good for this.  I'd probably write a C
program, or whatever language you have at your disposal that can
read characters of data, rather than lines - most general purpose
languages have no problem with this.  I'll append a Python program
for the algorithm.

        Donn Cave, University Computing Services, University of Washington
        d...@u.washington.edu
-----------------------------
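#  http://www.python.org/
import sys

# Literal text to break on; each occurrence is replaced by a newline.
SEP = '015'

head = ''
while True:
        block = sys.stdin.read(8192)
        if not block:
                break
        # The last piece may end mid-separator, so carry it into the next read.
        lines = (head + block).split(SEP)
        for line in lines[:-1]:
                sys.stdout.write(line + '\n')
        head = lines[-1]

sys.stdout.write(head)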
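
A variant of the program above, offered only as a minimal Python 3 sketch. It assumes the marker is the full 8-character string "015     " described in the original question, and that the marker should stay at the start of each output line rather than being dropped. The name `split_on_marker` and the `chunk_size` default are illustrative choices, not anything from the original post.

```python
MARKER = "015     "  # assumed marker: "015" followed by five spaces


def split_on_marker(src, dst, marker=MARKER, chunk_size=8192):
    """Copy src to dst, starting a new line before every marker."""
    tail = ""          # text after the last complete marker seen so far
    wrote_any = False  # becomes True once anything has been written
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        pieces = (tail + block).split(marker)
        # The final piece may end with a partial marker, so hold it back.
        tail = pieces.pop()
        for piece in pieces:
            # Each remaining piece was followed by a marker in the input.
            dst.write(piece)
            if wrote_any or piece:
                dst.write("\n")
            dst.write(marker)
            wrote_any = True
    dst.write(tail)
    if wrote_any or tail:
        dst.write("\n")
```

Wired to the standard streams, this would be called as `split_on_marker(sys.stdin, sys.stdout)` after `import sys`. Memory use stays bounded by the longest stretch between two markers, not by the file size.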



 Wed, 14 Feb 2001 03:00:00 GMT   
 sed on LONG lines

Actually the 015 isn't part of an octal character.  The data format
for this particular filetype calls for each line to begin with the 8
character string "015     ".  I was just trying to use that as a way
to locate where the newlines go.  I guess I'll have to break down and
write a little code (I don't use python) unless somebody else knows a
quicker way.  Thanks.

Steve



 Wed, 14 Feb 2001 03:00:00 GMT   
 sed on LONG lines
In article <6s7tde$...@news.Hawaii.Edu>,

try using gawk:

gawk 'BEGIN{RS="015     ";ORS="\n"}{print }' infile > outfile

or

gawk 'BEGIN{RS="015     ";ORS="\n015     "}{print }' infile > outfile

if you want to save the "015     "

man gawk

use gawk rather than awk: gawk accepts a multi-character RS, whereas
many traditional awks use only the first character of RS.

Chuck Demas
Needham, Mass.

--
  Eat Healthy    |   _ _   | Nothing would be done at all,
  Stay Fit       |   @ @   | If a man waited to do it so well,
  Die Anyway     |    v    | That no one could find fault with it.
  de...@tiac.net |  \___/  | http://www.tiac.net/users/demas



 Wed, 14 Feb 2001 03:00:00 GMT   
 sed on LONG lines
On 29 Aug 1998 00:33:54 GMT, <ssw...@soest.ignorethispart.hawaii.edu> wrote:

Well, as long as you have enough virtual memory on your system,
GNU sed (3.02) should handle it fine with:
sed 's/015     /\
/g
$a\
'

You can use perl to process one line at a time and avoid slurping
the whole file into memory:
  perl -l012 -pe 'BEGIN{$/="015     "}'

Or, for good measure, (mildly) obfuscated C ("cb" should de-obfuscate):
#include <stdio.h>
const char match[]="015     ";int main(void){const char*s=match;int c;
while((c=getchar())!=EOF){if(c==*s){if(!*++s){putchar('\n');s=match;}}
else{if(match<s){fwrite(match,s-match,1,stdout);s=match;}if(c==*s)++s;
else putchar(c);}}putchar('\n');return 0;}
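
For readers without "cb" at hand, here is a hedged, readable sketch of the same character-at-a-time matching idea, written in Python rather than C. The function name `newline_at_marker` is made up for illustration. The simple restart after a failed partial match is only safe because "0" appears nowhere in the marker except at its start; an arbitrary pattern would need a proper failure table. Unlike the C above, this sketch also writes out a partial match still pending at end of input.

```python
MATCH = "015     "  # "015" followed by five spaces


def newline_at_marker(src, dst, match=MATCH):
    """Replace every occurrence of match with a newline, reading one char at a time."""
    pos = 0  # how many characters of match have been seen so far
    while True:
        c = src.read(1)
        if not c:
            break
        if c == match[pos]:
            pos += 1
            if pos == len(match):
                # Full marker seen: emit a newline in its place.
                dst.write("\n")
                pos = 0
        else:
            if pos > 0:
                # Partial match failed: emit the characters held back.
                dst.write(match[:pos])
                pos = 0
            if c == match[0]:
                pos = 1  # this character may start a new match
            else:
                dst.write(c)
    # Flush a partial match left over at end of input, then end the line.
    dst.write(match[:pos])
    dst.write("\n")
```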

                --Ken Pizzini



 Thu, 15 Feb 2001 03:00:00 GMT   
 sed on LONG lines

On 1998-08-29 ssw...@soest.ignorethispart.hawaii.edu said:
   >I have a large number of data files that have lost their newlines.
   >The result is files between 200K and 1M that are all one line.
   >There is a distinctive character pattern ("015     ") that is
   >supposed to start each line, however I haven't been able to get sed
   >to add a newline before the 015.  I've tried various permutations,
   >but at best I get an error message stating "Output line too long".
   >Any suggestions as to how to do this?  Is it possible to do this
   >with sed? If not, any ideas about what can handle it?
   >Thanks.
   >Steve
Do you mean the actual characters "015", or a single CR character represented
as \015?  If it's the latter, you could use dos2unix or "tr '\015' '\012'"
(quoted, so the shell does not strip the backslashes).
If it's the former,

sed 's/015     /\
015     /g'

might do it if you don't get an "Input line too long" error.
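
One way to settle which of the two cases applies is simply to count both candidates in a sample file. This is a small sketch under the assumption that the files fit in memory (they were described as 200K to 1M); the function name `count_candidates` is hypothetical.

```python
def count_candidates(data, marker=b"015     "):
    """Count CR bytes and literal 8-character markers in a bytes object."""
    return {"cr": data.count(b"\r"), "marker": data.count(marker)}
```

It could be used as `count_candidates(open("infile", "rb").read())`, where "infile" stands for one of the damaged files. A large "marker" count with a "cr" count of zero would point to the literal-text case.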

Net-Tamer V 1.08X - Test Drive



 Fri, 16 Feb 2001 03:00:00 GMT   
 sed on LONG lines
: Is it possible to do this with sed?

        Not likely.  Sed really doesn't handle arbitrary data well at
        all.

: If not, any ideas about what can handle it?

        Install perl if you don't have it already.  Then do this:

        perl -pi.bak -e 's/(015     )/\n$1/g' filename

        This will modify the original file, leaving filename.bak as
        the original backup copy.  Perl won't care if your files are
        200 Gigs, so long as you have the memory.
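
        For comparison, a rough Python sketch of the same edit-in-place-with-backup
        idea.  It is not a translation of perl's -i switch, just an approximation:
        the name `add_newlines_in_place` and the `backup_suffix` parameter are
        assumptions for illustration, and like the one-liner it reads the whole
        file into memory and puts a newline before every marker, including one
        at the very start of the file.

```python
import os

MARKER = b"015     "  # "015" followed by five spaces, as bytes


def add_newlines_in_place(path, backup_suffix=".bak"):
    """Insert a newline before each marker, keeping the original as a backup."""
    with open(path, "rb") as f:
        data = f.read()
    # Move the original aside first, so it survives if the write fails.
    os.replace(path, path + backup_suffix)
    with open(path, "wb") as f:
        f.write(data.replace(MARKER, b"\n" + MARKER))
```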
--
-Zenin (ze...@archive.rhps.org)           From The Blue Camel we learn:
BSD:  A psychoactive drug, popular in the 80s, probably developed at UC
Berkeley or thereabouts.  Similar in many ways to the prescription-only
medication called "System V", but infinitely more useful. (Or, at least,
more fun.)  The full chemical name is "Berkeley Standard Distribution".



 Fri, 16 Feb 2001 03:00:00 GMT   
 sed on LONG lines
maybe bed (binary editor) can help.
bed, sed, etc.
are part of the ud (UnixDos) packages by the same author.
ud16 is 16 bit;
he also has a package for 32 bits.

see below

 ----------------------------------
to  master      regular  expressions
to  master      sed
join my         seders  informal  email  list
                        af...@torfree.net
 ----------------------------------

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                           SED RESOURCES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

                            sed web pages

                        "sed one liners"    by Eric Pement
                        "Doing It with sed" by sed stud Carlos
http://www.**-**.com/ ;seders grab bag (seders official web page)
http://www.**-**.com/ ~george/#seders  seder, engineer, Dr2b Yiorgos

                             THE SED FAQ                

latest version of the sed FAQ is usually at:            
    http://www.**-**.com/
    http://www.**-**.com/ ~george/sed/sedfaq.html  
    http://www.**-**.com/ ;                
    http://www.**-**.com/ ;      

                                ----

http://www.**-**.com/ ~guckes/sed/                 seder Herr Guckes
http://www.**-**.com/ ~leitner/sed/                      Herr Leitner
http://www.**-**.com/ , seder Simon Taylor
http://www.**-**.com/ ~gis84806/sed {*filter*} scripts.seder Yao-Jen Chang

                  sed/regular expressions tutorials/refs

1- http://www.**-**.com/ :457/OSUserG/BOOKCHAPTER-14.html
                                     Chapter 14, Manipulating text with sed
2- "Doing It with sed" by sed stud Carlos (see http's above)

3- SunOS Manual Pages
        http://www.**-**.com/
        http://www.**-**.com/

4- u-sedit2. has nice sed docs.
http://www.**-**.com/

5- dc UNIX stack-calculator, in sed, by sed stud  GREG UBBEN
        http://www.**-**.com/ ~george/sed/dc.sed.html

6- sierpinski gasket/triangle, in sed, by sed stud KEN PIZZINI
        posted by Al Aab, in July 1998 to
        alt.comp.editors.batch  &  comp.unix.shell
        search dejanews    

                                ----

                        some sed implementations

"official" release of GNU sed-3.01 is finally available from:
ftp://ftp.gnu.org/pub/gnu/sed-3.01.tar.gz                      

sedmod.zip   very extended/awkish   DOS    sed
ftp://ftp.adam.anet.cz/pub/cdrom3/fileutil/sedmod.zip
http://www.**-**.com/

MKS Toolkit, windows 32 http://www.**-**.com/

ftp://uiarchive.uiuc.edu/pub/systems/pc/simtelnet/win95/util/                  
ud32_v43.zip B  3381699  980512  UnixDos: Full Unix set: 65 progs +28 new
utils

ftp://uiarchive.uiuc.edu/pub/systems/pc/simtelnet/win3/util/                  
qud16_v42.zip B  4767550  980313  UnixDos: Full Unix set. 64 progs +28 new
utils

                                ----
                       my favourite DOS/UNIX sed :

 ftp://uiarchive.uiuc.edu/pub/systems/pc/simtelnet/msdos/txtutl/sed15.zip
 ftp://uiarchive.uiuc.edu/pub/systems/pc/simtelnet/msdos/txtutl/sed15x...

Directory: /pub/systems/pc/simtelnet/msdos/txtutl/
Filename    Type Length  Date    Description
sed15.zip    B    62082  910930  Unix-compatible streaming editor v1.5 TC src
sed15x.zip   B    20300  910930  Unix-compatible streaming editor v1.5 EXE/docs

              sed15.zip has C source, compilable for UNIX.

sed15.exe compiled with mingw32 for 32bit environments at:
http://www.**-**.com/ ~george/sed/sed15.exe        
                                ----
sed/batch/text                newsgroup:
                        alt.comp.editors.batch            
if your newsfeed does not carry it, search dejanews.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--
=-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
al aab, seders moderator                                      sed u soon
               it is not zat we do not see the  s o l u t i o n          
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+



 Sat, 17 Feb 2001 03:00:00 GMT   
 