grep — Defencia

The basic shape

Every grep command has the same three parts: the tool, a pattern to look for, and where to look.

$ grep "pattern" file.txt
  │     │         └── where to search (a file, or many files)
  │     └──────────── the pattern (plain text or a regex)
  └────────────────── the command

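Quote the pattern. Quoting stops the shell from splitting it on spaces or expanding special characters before grep sees it. Prefer single quotes when the pattern contains $, backticks or backslashes, because double quotes still let the shell interpret those.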
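To make the quoting rule concrete, here is a minimal sketch that searches a made-up line for the literal text `$name`. The variable `name` and the sample line are assumptions for illustration; `-F` is used so that `$` is not treated as a regex anchor.

```shell
# Hypothetical variable, set only so the double-quoted case has something to expand
name=root
line='user $name logged in'

# Single quotes: grep receives  user $name  exactly as typed
single=$(printf '%s\n' "$line" | grep -cF 'user $name' || true)

# Double quotes: the shell expands $name first, so grep receives  user root
double=$(printf '%s\n' "$line" | grep -cF "user $name" || true)

echo "single-quoted matches: $single"
echo "double-quoted matches: $double"
```

The `|| true` only keeps the sketch from aborting under `set -e`, since grep exits non-zero when nothing matches.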

grep also reads from a pipe, which is how it spends most of its life in DFIR, filtering the output of another command:

$ cat /var/log/auth.log | grep "Failed"
$ ps aux | grep -i nginx
$ dpkg -l | grep openssl

Anything that produces text can be narrowed by piping it into grep.

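Three names, one tool. grep uses basic regex, where + ? { } ( ) | are literal unless backslash-escaped; grep -E uses extended regex, where the same characters act as operators without backslashes; grep -F treats the pattern as a fixed string with no regex at all. The older names egrep and fgrep behave like grep -E and grep -F, but they are deprecated and recent GNU grep releases print a warning when you use them. Reach for grep -E by default: the extended syntax is the one most references assume.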
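The difference between the three modes is easiest to see by running one pattern against the same input. This is a small sketch with invented sample lines, assuming GNU grep behaviour for the basic-regex case.

```shell
# Three sample lines, made up for illustration
sample=$(mktemp)
printf '%s\n' 'a+b' 'aab' 'ab' > "$sample"

bre=$(grep -c 'a+b' "$sample")      # basic: + is a literal plus sign
ere=$(grep -cE 'a+b' "$sample")     # extended: + means "one or more a"
fixed=$(grep -cF 'a+b' "$sample")   # fixed string: no regex at all

echo "BRE: $bre  ERE: $ere  fixed: $fixed"   # prints: BRE: 1  ERE: 2  fixed: 1
rm -f "$sample"
```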

Regular expressions in 5 minutes

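A regular expression (regex) is a tiny language for describing text patterns. You do not need all of it: a handful of symbols covers almost everything an investigator asks for. The examples in the table use extended syntax, so run them with grep -E; in plain grep, + ? { } ( ) | must be backslash-escaped to act as operators.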

Symbol	Matches	Example
`.`	Any single character	`r..t` → root, rest, r4xt
`*`	Zero or more of the previous	`ab*c` → ac, abc, abbc
`+`	One or more of the previous	`ab+c` → abc, abbc (not ac)
`?`	Zero or one (optional)	`colou?r` → color, colour
`^`	Start of the line	`^Error` → lines beginning Error
`$`	End of the line	`failed$` → lines ending failed
`[ ]`	Any one character from the set	`[0-9]` → a digit
`[^ ]`	Any character not in the set	`[^0-9]` → a non-digit
`( )`	Group, often with `|`	`(GET|POST)`
`|`	OR: either side	`error|fail`
`{n}`	Exactly n of the previous	`[0-9]{3}` → three digits
`\`	Escape — treat a special char literally	`\.` → a literal dot
`\b`	Word boundary	`\bcat\b` → "cat", not "category"

The dot is not a dot. In regex, . means "any character", so 192.168.0.1 also matches 192x168y0z1. To match a literal dot — as in IPs, domains and file extensions — escape it: 192\.168\.0\.1. This trips up everyone at first.
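A quick way to convince yourself is to count matches with and without the escape. The two sample lines below are invented for illustration.

```shell
sample=$(mktemp)
printf '%s\n' '192.168.0.1' '192x168y0z1' > "$sample"

loose=$(grep -c '192.168.0.1' "$sample")       # unescaped dots match any character
strict=$(grep -c '192\.168\.0\.1' "$sample")   # escaped dots match only a dot
literal=$(grep -cF '192.168.0.1' "$sample")    # -F: the whole pattern is literal

echo "loose: $loose  strict: $strict  literal: $literal"   # prints: loose: 2  strict: 1  literal: 1
rm -f "$sample"
```

When the search term is a plain IP or domain with no regex in it, `-F` is often the simplest way to get literal dots.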

Character class shorthands (with grep -P or POSIX)

Shorthand	Means	POSIX equivalent
`\d`	A digit	`[[:digit:]]`
`\w`	A word character (letter, digit, _)	`[[:alnum:]_]`
`\s`	Whitespace	`[[:space:]]`

\d needs Perl mode (grep -P). GNU grep also accepts \w and \s with plain grep or grep -E, but only as non-standard extensions. For portable patterns, use the POSIX classes or explicit ranges like [0-9].
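As a small sketch of the portable route, the POSIX class and the explicit range below extract the same digits under grep -E. The sample sentence is made up.

```shell
text='ssh on port 22, https on port 443'

posix=$(printf '%s\n' "$text" | grep -oE '[[:digit:]]+')   # POSIX class
range=$(printf '%s\n' "$text" | grep -oE '[0-9]+')         # explicit range

echo "$posix"
[ "$posix" = "$range" ] && echo "both forms extracted the same numbers"
```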

The flags that matter

grep has dozens of options; these are the ones you will actually use. Most can be combined.

Matching behaviour

Flag	Effect
`-i`	Case-insensitive. `grep -i error` catches Error, ERROR, error.
`-v`	Invert — show lines that do not match. Great for filtering out noise.
`-w`	Whole word only. `grep -w cat` ignores "category".
`-x`	Whole line must match exactly.
`-E`	Extended regex (use `+ ? { } ( ) |` without backslashes).
`-P`	Perl-compatible regex: adds `\d`, lookarounds and other Perl syntax. A GNU grep feature that may be missing from other implementations.
`-F`	Fixed string — no regex, faster, and safe for patterns full of special characters.

Searching files and trees

Flag	Effect
`-r` / `-R`	Recursive: search every file under a directory. In GNU grep, `-r` follows symlinks only when they are named on the command line; `-R` follows every symlink it meets.
`-l`	List only the filenames that contain a match — not the lines.
`-L`	List files that do not contain the match.
`--include="*.log"`	Only search files matching a glob.
`--exclude-dir=node_modules`	Skip whole directories.
`-a`	Treat binary files as text — search inside them anyway.

Output and context

Flag	Effect
`-n`	Show the line number of each match — so you can jump straight to it.
`-c`	Count matching lines instead of printing them.
`-o`	Print only the matched text, not the whole line. Essential for extracting IOCs.
`-H`	Always show the filename (default when searching many files).
`-A 3`	Print 3 lines after each match.
`-B 3`	Print 3 lines before each match.
`-C 3`	Print 3 lines of context on both sides.
`--color=auto`	Highlight the match. Often on by default; worth knowing the name.

Combine freely. grep -rniE "pattern" /path = recursive, case-insensitive, with line numbers, using extended regex. This one combination handles a large share of real searches.
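The sketch below exercises a few of these flags, alone and combined, against a five-line sample log. The log contents are invented for illustration.

```shell
log=$(mktemp)
printf '%s\n' 'boot ok' 'ERROR disk full' 'retrying' 'error network down' 'shutdown' > "$log"

exact=$(grep -c 'error' "$log")       # case-sensitive count of matching lines
any_case=$(grep -ci 'error' "$log")   # -c and -i combined
numbered=$(grep -ni 'error' "$log")   # -n and -i combined: line number, then the line
after=$(grep -A1 'ERROR' "$log")      # the match plus one line after it
quiet=$(grep -vci 'error' "$log")     # count of lines that do NOT mention an error

echo "exact=$exact any_case=$any_case quiet=$quiet"   # prints: exact=1 any_case=2 quiet=3
echo "$numbered"
echo "$after"
rm -f "$log"
```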

DFIR examples

Where grep earns its keep. These are the searches you reach for during a real investigation — reading logs, hunting across a mounted image, and isolating indicators.

Reading authentication logs

# Every failed login attempt
$ grep "Failed password" /var/log/auth.log

# Successful logins — who got in
$ grep "Accepted" /var/log/auth.log

# Failed logins with 3 lines of context, line-numbered
$ grep -n -A3 "Failed password" /var/log/auth.log

# Count failures per source IP, ranked
$ grep "Failed password" /var/log/auth.log | grep -oE "[0-9]{1,3}(\.[0-9]{1,3}){3}" | sort | uniq -c | sort -rn

The last line is a complete brute-force triage: find failures, extract the IP with -oE, then count and rank. This pattern recurs constantly.
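To try the triage pipeline without touching a real log, the sketch below runs it on four invented sshd-style lines (documentation IP ranges, simplified format) and then keeps only the sources at or above a threshold. The `threshold` value and the awk step are assumptions added for illustration, not part of the pipeline above.

```shell
# Made-up sshd lines; real log formats vary by distribution
authlog=$(mktemp)
printf '%s\n' \
  'Jan 10 10:00:01 host sshd[101]: Failed password for root from 203.0.113.5 port 4242 ssh2' \
  'Jan 10 10:00:02 host sshd[102]: Failed password for invalid user admin from 203.0.113.5 port 4243 ssh2' \
  'Jan 10 10:00:03 host sshd[103]: Failed password for root from 198.51.100.7 port 5151 ssh2' \
  'Jan 10 10:00:04 host sshd[104]: Accepted password for alice from 192.0.2.10 port 6000 ssh2' \
  > "$authlog"

ip_re='[0-9]{1,3}(\.[0-9]{1,3}){3}'

# Same pipeline as above: find failures, extract the IP, count and rank
ranked=$(grep 'Failed password' "$authlog" | grep -oE "$ip_re" | sort | uniq -c | sort -rn)
echo "$ranked"

# Hypothetical threshold: keep only sources with at least 2 failures
threshold=2
noisy=$(printf '%s\n' "$ranked" | awk -v min="$threshold" '$1 >= min { print $2 }')
echo "sources at or above $threshold failures: $noisy"
rm -f "$authlog"
```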

Searching across a mounted image or directory tree

# Find every file mentioning a suspicious domain, just the filenames (-F keeps the dot literal)
$ grep -rlF "evil-c2.example" /mnt/evidence

# Hunt for hard-coded credentials across a codebase
$ grep -rniE "(password|passwd|api[_-]?key|secret)\s*=" /mnt/evidence 2>/dev/null

# Search only PHP files for a webshell signature
$ grep -rn --include="*.php" -E "(eval|base64_decode|system|shell_exec)\s*\(" /var/www

# Look inside a binary for embedded strings
$ grep -a "http" suspicious.bin

-rl gives you the list of files to examine next; -rni shows you exactly where. The webshell search is a classic first pass on a compromised web server.
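The two-pass workflow can be rehearsed on a throwaway directory. In this sketch the tree, file names and file contents are all invented, and `[[:space:]]` stands in for `\s` to stay portable.

```shell
# Temporary tree standing in for a mounted image
tree=$(mktemp -d)
printf '%s\n' '<?php eval($_POST["x"]); ?>'   > "$tree/a.php"
printf '%s\n' '<?php echo "hello"; ?>'        > "$tree/b.php"
printf '%s\n' 'remember to eval( the results' > "$tree/notes.txt"

sig='(eval|base64_decode|system|shell_exec)[[:space:]]*\('

# Pass 1: which PHP files are worth opening?
files=$(grep -rlE --include='*.php' "$sig" "$tree")

# Pass 2: where exactly, with line numbers
where=$(grep -rnE --include='*.php' "$sig" "$tree")

echo "$files"
echo "$where"
rm -rf "$tree"
```

Only `a.php` is reported: `b.php` has no matching call, and `notes.txt` is skipped by `--include` even though it contains `eval(`.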

Filtering out noise with -v

# Show running processes, minus the grep line itself
$ ps aux | grep ssh | grep -v grep

# Everything except successful requests and health checks (" 200 " targets the status field)
$ grep -v -e " 200 " -e "health" access.log

grep -v grep is a daily habit — it removes the grep command from its own results.
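A common alternative, shown here as a sketch, is to wrap one character of the pattern in brackets so that grep's own command line no longer contains the text it is searching for. The `ps` output below is simulated with invented lines so the result is repeatable.

```shell
# Simulated process listings (invented); the second line is the grep process itself
ps_classic=$(printf '%s\n' 'root  811 /usr/sbin/sshd -D' 'user 4242 grep ssh')
ps_bracket=$(printf '%s\n' 'root  811 /usr/sbin/sshd -D' 'user 4242 grep [s]sh')

# Classic habit: match, then throw away anything mentioning grep
classic=$(printf '%s\n' "$ps_classic" | grep ssh | grep -v grep)

# Bracket form: [s]sh matches "ssh" but not the literal text "[s]sh"
bracket=$(printf '%s\n' "$ps_bracket" | grep '[s]sh')

echo "$classic"
echo "$bracket"
```

Both forms leave only the sshd line. The bracket form avoids one side effect of `grep -v grep`, which also hides any genuine process whose command line happens to contain "grep".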

Extracting indicators with -o

# Pull every unique IP address out of a log
$ grep -oE "[0-9]{1,3}(\.[0-9]{1,3}){3}" access.log | sort -u

# Extract all email addresses from a file
$ grep -oE "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" dump.txt | sort -u

# Find URLs in a captured payload
$ grep -oE "https?://[^\"' ]+" payload.txt | sort -u

-o turns grep from a line-finder into an extractor — exactly what you want when building an IOC list. sort -u de-duplicates.
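The difference between counting lines and extracting matches shows up as soon as one line holds several indicators. A minimal sketch, using an invented sentence:

```shell
text='Contact alice@example.com or bob@example.org; alice@example.com again.'
email_re='[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

lines=$(printf '%s\n' "$text" | grep -cE "$email_re")                   # matching lines
hits=$(printf '%s\n' "$text" | grep -oE "$email_re" | wc -l | tr -d ' ') # individual matches
unique=$(printf '%s\n' "$text" | grep -oE "$email_re" | sort -u)         # de-duplicated list

echo "lines=$lines hits=$hits"   # prints: lines=1 hits=3
echo "$unique"
```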

Copy-ready IOC patterns

Common indicator patterns for use with grep -oE (extended regex). Pair each with | sort -u to get a clean, unique list.

Indicator	Pattern
IPv4 address	`[0-9]{1,3}(\.[0-9]{1,3}){3}`
Email address	`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`
URL (http/https)	`https?://[^\"' ]+`
Domain name	`([a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}`
MD5 hash	`\b[a-fA-F0-9]{32}\b`
SHA-1 hash	`\b[a-fA-F0-9]{40}\b`
SHA-256 hash	`\b[a-fA-F0-9]{64}\b`
Bitcoin address	`\b(bc1|[13])[a-zA-HJ-NP-Z0-9]{25,39}\b`
MAC address	`([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}`
Base64 blob (long)	`[A-Za-z0-9+/]{40,}={0,2}`

These match shape, not validity. The IPv4 pattern will happily match 999.999.999.999, and the hash patterns match any hex string of the right length. They are for fast extraction and triage — validate and correlate the hits before treating anything as a confirmed indicator.
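One possible first validation step for IPv4 hits is to drop anything with an octet above 255. The sketch below does that with awk on invented candidates; it is an illustration of the idea, not a complete validator.

```shell
# Extract by shape first, exactly as the table pattern does
candidates=$(printf '%s\n' '10.0.0.1' '999.999.999.999' '256.1.1.1' '192.168.1.255' |
  grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}')

# Then keep only addresses whose four octets are all 255 or less
valid=$(printf '%s\n' "$candidates" |
  awk -F. '$1 <= 255 && $2 <= 255 && $3 <= 255 && $4 <= 255')

echo "$valid"
```

All four candidates pass the shape check, but only `10.0.0.1` and `192.168.1.255` survive the range check.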

# Pull every SHA-256 hash from an incident report and sort uniquely
$ grep -oE "\b[a-fA-F0-9]{64}\b" report.txt | sort -u

# Build an IP watchlist from a week of logs, ranked by frequency
$ grep -hoE "[0-9]{1,3}(\.[0-9]{1,3}){3}" /var/log/nginx/*.log | sort | uniq -c | sort -rn

-h suppresses filenames when searching multiple files, keeping the extracted list clean.
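To see what -h changes, the sketch below counts the ranked rows with and without it across two small log files. The files and their contents are invented for illustration.

```shell
logs=$(mktemp -d)
printf '%s\n' 'GET / from 198.51.100.7' > "$logs/a.log"
printf '%s\n' 'GET / from 198.51.100.7' 'GET /x from 203.0.113.5' > "$logs/b.log"

ip_re='[0-9]{1,3}(\.[0-9]{1,3}){3}'

# Without -h every hit is prefixed "file:", so the same IP in two files counts as two entries
with_names=$(grep -oE "$ip_re" "$logs"/*.log | sort | uniq -c | wc -l | tr -d ' ')

# With -h only the IPs remain, so duplicates across files collapse
without_names=$(grep -hoE "$ip_re" "$logs"/*.log | sort | uniq -c | wc -l | tr -d ' ')
top=$(grep -hoE "$ip_re" "$logs"/*.log | sort | uniq -c | sort -rn | awk 'NR==1 { print $1, $2 }')

echo "rows without -h: $with_names, rows with -h: $without_names, top: $top"
rm -rf "$logs"
```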
