Search for specific words and print only those words in a separate file

Question

I have a .gz file on my Unix server. I want to search that file for two words, such as abc123 and def456, and if they are present, write only those words (just the two words, not the entire line) to a separate file.

I tried the grep command, but it prints the whole line from the file; I want only those two words, not the entire line that contains them. – Ramana Mahendrakar, Jun 30 '15 at 12:46
I suggest you edit the question with the command you used that didn't return the results you want. Someone will then be able to correct it for you. — Eric Hauenstein, Jun 30 '15 at 12:51
You should really show the command(s) that you've tried, explaining why they don't do what you want. Suppose the file wasn't compressed; what would you do to get the information you want from the non-compressed file? How do you see the decompressed contents of a file without actually decompressing the file? How do you combine these two operations? You say 'Unix'; which variant of Unix? Does it have GNU `grep` with the `-o` option? What should happen if the words you're after occur more than once each in the file? Does the order in which the words appear in the output matter? — Jonathan Leffler, Jun 30 '15 at 14:48

score 0 · Answer 1 · edited Jun 30 '15 at 14:58

You can try the following:

zcat f.xml.gz | awk '{\
{ \
if(index($0,str_1)) \
   cnt_1=1; \
if(index($0,str_2)) \
   cnt_2=1; \
if((cnt_1 + cnt_2) == 2) {\
   print str_1,str_2> "f_out.log"; exit;} \
} }' str_1="Keepout" str_2="LatLonList"

where

"f.xml.gz" is the input file
str_1 is the first word (your "abc123")
str_2 is the second word (your "def456")
"f_out.log" is the separate file in which the two words are written if found in the input file
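
The same idea can be written without the line-continuation backslashes and the doubled braces. The following is only a minimal sketch under a few assumptions: `sample.gz` and its contents are made up so the example is self-contained, `gzip -dc` stands in for `zcat` (on some systems `zcat` expects `.Z` files), and, unlike the command above, it writes whichever of the two words was found, one per line, rather than requiring both.

```shell
# Build a small compressed sample so the sketch is self-contained (hypothetical data).
printf 'foo abc123 bar\nnothing here\nbaz def456 qux\n' | gzip -c > sample.gz

# gzip -dc streams the decompressed text without unpacking the file on disk.
# -v sets the awk variables before any input is read.
gzip -dc sample.gz | awk -v str_1="abc123" -v str_2="def456" '
    index($0, str_1) { found_1 = 1 }   # substring test, not a whole-word test
    index($0, str_2) { found_2 = 1 }
    found_1 && found_2 { exit }        # both seen: stop reading early (END still runs)
    END {
        if (found_1) print str_1
        if (found_2) print str_2
    }
' > f_out.log

cat f_out.log
```

Because the shell redirection truncates `f_out.log` on every run, a leftover file from an earlier run cannot be mistaken for a fresh result.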

Hope this helps.

answered Jun 30 '15 at 13:24 by il_raffa · edited Jun 30 '15 at 14:58 by Jonathan Leffler

All those backslashes are unnecessary unless you're careless enough to use a C shell derivative instead of a Bourne shell derivative as your main shell. Sea shells belong on the sea shore, IMO. And in a Bourne-shell derivative, those backslashes would break the script. The opening `{ {` and matching closing `} }` is odd; what's the advantage of the double braces instead of just single braces? Why did you decide to use 'Keepout' and 'LatLonList' instead of `abc123` and `def456`? – Jonathan Leffler Jun 30 '15 at 14:55
The above snippet is not working: if I set str_1 to world, it still prints world, even though the word world is not present in my file. – Ramana Mahendrakar Jun 30 '15 at 16:34

score 0 · Answer 2 · edited May 23 '17 at 11:51

Your question has an answer in this SO post

You can run this command to achieve what you want

gzcat filename.gz | grep -oh "<Search pattern>"

For example:

gzcat filename.gz | grep -oh "abc123"

I do not have zgrep installed, but you can also try this:

zgrep -oh "<Search pattern>" filename.gz
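
To cover both words from the question and put the result in a separate file, something along these lines may work. It is a sketch, not a definitive recipe: it assumes a `grep` that supports `-o` and `-w` (GNU and BSD grep do), and the names `sample.gz` and `found_words.txt` are invented for the example. `grep` reads the decompressed text from the pipe, so it gets no file argument.

```shell
# Hypothetical sample data; the second line contains abc123 only inside a longer token.
printf 'foo abc123 bar\nxabc123x\ndef456 baz abc123\n' | gzip -c > sample.gz

# -o prints only the matched text, -w restricts matches to whole words,
# -E enables the | alternation. sort -u keeps one copy of each word found.
gzip -dc sample.gz | grep -owE 'abc123|def456' | sort -u > found_words.txt

cat found_words.txt
```

Drop `sort -u` if every occurrence should be kept, in the order it appears in the file.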

answered Jul 01 '15 at 04:12 by Biswajit_86 · edited May 23 '17 at 11:51 by Community

score 0 · Answer 3 · answered Apr 12 '18 at 14:49

`ripgrep`

Use ripgrep; it is written in Rust and is very fast, especially on large files. For example:

rg -zo "abc123|def456" *.gz

-z/--search-zip Search in compressed files (such as gz, bz2, xz, and lzma).

-o/--only-matching Print only the matched parts of a matching line.

score 0 · Answer 4 · edited Jun 20 '20 at 09:12

`grep`/`zgrep`/`zegrep`

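Use zgrep or zegrep to search for a pattern in compressed files using their uncompressed contents (available on both GNU/Linux and BSD/Unix).

On BSD-based systems, you can also use the BSD version of grep with -Z (or -z on macOS). GNU grep gives these flags different meanings (-z is --null-data and -Z is --null), so this does not carry over to GNU/Linux.

A few examples: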

zgrep -E "abc123|def456" *.gz
zegrep "abc123|def456" **/*.gz
grep -z -e "abc123" -e "def456" *.gz # BSD/Unix only.

Note: With the shell's recursive globbing option enabled (for example, globstar in Bash), ** matches files recursively; otherwise use -r.

-R/-r/--recursive Recursively search subdirectories listed.

-E/--extended-regexp Interpret pattern as an extended regular expression (like egrep).

-Z (BSD), -z/--decompress (BSD/macOS) Force grep to behave as zgrep.
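
The examples above print whole matching lines. To get only the words themselves into a separate file, as the question asks, `-o` can be added and the output redirected. This is a rough sketch that assumes your `zgrep` passes `-o` through to the underlying `grep` (the GNU gzip wrapper does); `sample.gz` and `matches.txt` are made-up names.

```shell
# Hypothetical sample data so the sketch runs on its own.
printf 'id=abc123 status=ok\nid=zzz999\nid=def456 id=abc123\n' | gzip -c > sample.gz

# -o prints only the matched text; -E enables the | alternation.
# With a single file operand, no file name prefix is added to the output.
zgrep -o -E 'abc123|def456' sample.gz > matches.txt

# Optional: count how often each word occurred.
sort matches.txt | uniq -c
```

Here `matches.txt` holds one line per occurrence, so a word that appears several times in the file is listed several times.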
