 extracting text between patterns

I am looking for a way to extract the text between two patterns.
The patterns could be on the same line, or they could be on different lines.


and I only want:

something else
and I only want:

something else

If my patterns appear more than once (pattern1 followed by pattern2
more than once), is there a way to specify which pair to use?

 Sun, 24 Feb 2008 01:32:28 GMT   

One possible solution with sed:
sed -n ':A;/pattern1.*pattern2/b B;N;b A;:B;s/pattern1\(.*\)pattern2/\1/gp'
pgancarz, at, o2, pl

 Sun, 24 Feb 2008 01:51:14 GMT   

I don't know how robust you need this to be wrt handling pathological
cases (e.g. pattern2 is part of pattern1), but this will work for the
samples you posted:

awk '/pattern1/,/pattern2/{gsub(/pattern1|pattern2/,"");print}'

Sure, just define a counter, increment it when pattern2 is found, and
only print (or sub()) when the counter hits the magic number you care about.
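
The counter idea could look roughly like this. It is only a sketch: the function name nth_range is made up for illustration, the markers are the literal strings pattern1 and pattern2, and it assumes a line never holds more than one complete pair.

```sh
# Hypothetical helper: print only the Nth pattern1..pattern2 range.
# Usage: nth_range N [file...]   (reads standard input when no file is given)
nth_range() {
    want=$1
    shift
    awk -v want="$want" '
        /pattern1/,/pattern2/ {
            if (count + 1 == want) {          # inside the range we care about
                line = $0
                gsub(/pattern1|pattern2/, "", line)
                print line
            }
            if (/pattern2/) count++           # this range is finished
        }
    ' "$@"
}
```

For example, "nth_range 2 file" prints only the lines of the second range, with the markers stripped. The copy into line is there so that the test for pattern2 still sees the unmodified record.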


 Sun, 24 Feb 2008 01:50:03 GMT   

$ echo "pattern1sometextpattern2" | \
perl -l -00ne'@x = /(?<=pattern1)(.+?)(?=pattern2)/sg and print $x[0]'

$ echo "pattern1sometext

something else
somemorepattern2" | \
perl -l -00ne'@x = /(?<=pattern1)(.+?)(?=pattern2)/sg and print $x[0]'
something else

$ echo "pattern1sometextpattern2
something pattern1 else
somemorepattern2" | \
perl -l -00ne'@x = /(?<=pattern1)(.+?)(?=pattern2)/sg and print $x[1]'

use Perl;

 Sun, 24 Feb 2008 04:22:11 GMT   

- If your text is non-XML/HTML, then you can replace 'pattern1' and
  'pattern2' with '<tag>' and '</tag>', and run it through an XML parser.

- If regex is applicable, then you can try something like
  But, it will be slow, unless you can be more specific for '.*'.

- Otherwise, you need to split on 'pattern1', then split on 'pattern2'.
        a=`< file`
        match -2 "$a" pattern1 b
        match -2 "${b[1]}" pattern2 c
        echo ${c[0]}
  will give you the first content.  Repeat.

I used to have this feature in my patch for Bash, but removed it because
the usage was too complicated and difficult to remember.  If this
problem is important enough, then I can patch it to Bash. :-)

William Park <>, Toronto, Canada
ThinFlash: Linux thin-client on USB key (flash) drive
BashDiff: Super Bash shell

 Sun, 24 Feb 2008 04:57:51 GMT   

   No external command is necessary, if you are using bash or ksh93;
   you can load the entire file into a variable:

file=$( < "$FILENAME" ) ## This is assumed before all the following scripts.

   With other shells, an external command is necessary:

file=$( cat "$FILENAME" )

   To extract the first occurrence:


   To extract the last occurrence:


   To extract the Nth occurrence, use this N-1 times then the first
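
The three cases above can be sketched with plain parameter expansion. This is a minimal sketch and not necessarily the code that was originally posted: the function names between and last_between are made up for illustration, the markers are treated as literal strings rather than regular expressions, and every start marker is assumed to be followed by its end marker.

```sh
# Hypothetical helper: print the Nth block of text between two literal strings.
# Usage: between N start end text
between() {
    n=$1 start=$2 end=$3 rest=$4
    # Skip the first N-1 start/end pairs.
    while [ "$n" -gt 1 ]; do
        rest=${rest#*"$start"}
        rest=${rest#*"$end"}
        n=$((n - 1))
    done
    rest=${rest#*"$start"}      # drop everything through the next start marker
    result=${rest%%"$end"*}     # drop everything from the next end marker on
    printf '%s\n' "$result"
}

# Hypothetical helper: print the block after the last start marker.
# Usage: last_between start end text
last_between() {
    rest=${3##*"$1"}            # ## removes the longest prefix, through the final start
    result=${rest%%"$2"*}
    printf '%s\n' "$result"
}
```

With the file loaded as shown earlier, 'between 2 pattern1 pattern2 "$file"' prints the second occurrence and 'last_between pattern1 pattern2 "$file"' the last one. Quoting the marker inside the expansion keeps characters such as * from being read as glob patterns.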


    Chris F.A. Johnson                     <>
    Shell Scripting Recipes: A Problem-Solution Approach, 2005, Apress

 Sun, 24 Feb 2008 15:13:53 GMT   

Find the 2nd occurrence:

ruby -0777ne'$_=~/(?:.*?pattern1(.*?)pattern2){2}/m;puts $1' file

 Sun, 24 Feb 2008 18:57:18 GMT   