I have a .gz file on my Unix server. I want to search that file for two words, such as abc123 and def456, and if the file contains them, I want to write only those two words (not the entire line) to a separate file.
I tried the grep command, but it prints the whole line from the file; I want only those two words, not the entire line that contains them – Ramana Mahendrakar Jun 30 '15 at 12:46
I suggest you edit the question with the command you used that didn't return the results you want. Someone will then be able to correct it for you. – Eric Hauenstein Jun 30 '15 at 12:51
You should really show the command(s) that you've tried, explaining why they don't do what you want. Suppose the file wasn't compressed; what would you do to get the information you want from the non-compressed file? How do you see the decompressed contents of a file without actually decompressing the file? How do you combine these two operations? You say 'Unix'; which variant of Unix? Does it have GNU `grep` with the `-o` option? What should happen if the words you're after occur more than once each in the file? Does the order in which the words appear in the output matter? – Jonathan Leffler Jun 30 '15 at 14:48
4 Answers
You can try the following:
zcat f.xml.gz | awk '{\
  { \
    if(index($0,str_1)) \
      cnt_1=1; \
    if(index($0,str_2)) \
      cnt_2=1; \
    if((cnt_1 + cnt_2) == 2) {\
      print str_1,str_2> "f_out.log"; exit;} \
  } }' str_1="Keepout" str_2="LatLonList"
where
- "f.xml.gz" is the input file
- str_1 is the first word (your "abc123")
- str_2 is the second word (your "def456")
- "f_out.log" is the separate file in which the two words are written if found in the input file
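For a Bourne-style shell, the same idea can be sketched without the line-continuation backslashes. This is only an illustration under a few assumptions: the function name `find_both_words` and its variable names are hypothetical, `gzip -dc` stands in for `zcat`, and the words are passed with awk's `-v` option.

```shell
# Hypothetical helper: write both words to an output file only if each
# occurs somewhere in the decompressed contents of the .gz file.
find_both_words() {
    gz_file=$1 word_1=$2 word_2=$3 out_file=$4
    gzip -dc "$gz_file" | awk -v w1="$word_1" -v w2="$word_2" -v out="$out_file" '
        index($0, w1) { seen_1 = 1 }   # plain substring test, not a regex
        index($0, w2) { seen_2 = 1 }
        seen_1 && seen_2 { print w1, w2 > out; exit }   # stop at first success
    '
}
```

Called as `find_both_words f.xml.gz abc123 def456 f_out.log`, it creates the output file only when both words were seen. Because `index` tests for substrings, `abc123` would also count as found inside a longer token such as `abc1234`.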
Hope this helps.
All those backslashes are unnecessary unless you're careless enough to use a C shell derivative instead of a Bourne shell derivative as your main shell. Sea shells belong on the sea shore, IMO. And in a Bourne-shell derivative, those backslashes would break the script. The opening `{ {` and matching closing `} }` is odd; what's the advantage of the double braces instead of just single braces? Why did you decide to use 'Keepout' and 'LatLonList' instead of `abc123` and `def456`? – Jonathan Leffler Jun 30 '15 at 14:55
The above snippet is not working... if I set str_1 to world, it still prints world, even though the word world is not present in my file – Ramana Mahendrakar Jun 30 '15 at 16:34
Your question has an answer in this SO post
You can run this command to achieve what you want:
gzcat <filename.gz> | grep -oh "<Search pattern>"
For example:
gzcat <filename.gz> | grep -oh "abc123"
I do not have zgrep installed, but you can also try this:
zgrep -oh "<Search pattern>" <filename.gz>
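To cover both words from the question and send the result to a separate file, the pipeline could be extended roughly as follows. This is a sketch, not a tested recipe for every system: the function name `extract_words` is hypothetical, `gzip -dc` stands in for `gzcat`, and `-o` and `-w` are assumed to be available (they are in GNU and BSD grep, but are not required by POSIX).

```shell
# Hypothetical helper: write each search word that occurs in the
# decompressed file to out_file, one word per line, without duplicates.
extract_words() {
    gz_file=$1 out_file=$2
    # -o: print only the matched text; -w: match whole words only;
    # -E: extended regex, so that | means "or".
    gzip -dc "$gz_file" | grep -owE 'abc123|def456' | sort -u > "$out_file"
}
```

Running `extract_words filename.gz words.txt` leaves `words.txt` empty if neither word occurs.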
ripgrep
Use ripgrep. It is written in Rust and is fast, especially on large files. For example:
rg -zo "abc123|def456" *.gz
-z/--search-zip: Search in compressed files (such as gz, bz2, xz, and lzma).
-o/--only-matching: Print only the matched parts of a matching line.
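When the matches are redirected to a file, it may help to suppress the file name and line number prefixes as well. A minimal sketch, assuming ripgrep is installed as `rg`; the function name `rg_words` and the file names are hypothetical:

```shell
# Hypothetical helper: collect the distinct matched words from one .gz file.
rg_words() {
    gz_file=$1 out_file=$2
    # -z: search compressed files, -o: only the matched text,
    # -I: no file name prefix, -N: no line numbers.
    rg -zoIN 'abc123|def456' "$gz_file" | sort -u > "$out_file"
}
```

For example, `rg_words file.gz words.txt` writes each word that was found on its own line.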
grep/zgrep/zegrep
Use zgrep or zegrep to look for a pattern in compressed files by searching their uncompressed contents (available on both GNU/Linux and BSD/Unix).
On BSD systems, you can also use grep (the BSD version) with -Z; on macOS, -z works as well.
A few examples:
zgrep -E "abc123|def456" *.gz
zegrep "abc123|def456" **/*.gz
grep -z -e "abc123" -e "def456" *.gz # BSD/Unix only.
Note: When recursive globbing is enabled in your shell (for example, globstar in Bash), ** matches files recursively; otherwise use -r.
-R/-r/--recursive: Recursively search subdirectories listed.
-E/--extended-regexp: Interpret pattern as an extended regular expression (like egrep).
-Z (BSD), -z/--decompress (BSD/macOS): Force grep to behave as zgrep.
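The examples above print whole matching lines. To get only the words, and only when both are present as the question asks, something like the following might work. It is a sketch under assumptions: the function name `write_if_both` is hypothetical, and it relies on zgrep passing `-o` and `-h` through to grep, which GNU and BSD implementations are expected to do.

```shell
# Hypothetical helper: write the two words to out_file only if both
# occur somewhere in the compressed file.
write_if_both() {
    gz_file=$1 out_file=$2
    # -o: only the matched text, -h: no file name prefix.
    found=$(zgrep -ohE 'abc123|def456' "$gz_file" | sort -u)
    expected=$(printf 'abc123\ndef456')
    # After sort -u, both words were found exactly when the list is complete.
    if [ "$found" = "$expected" ]; then
        printf '%s\n' "$found" > "$out_file"
    fi
}
```

With `write_if_both file.gz words.txt`, no output file is created when either word is missing.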