Power User Tools || Linux Mastery: From Zero to Hero || Day 3

raider070

In Linux, everything is treated as a file—not just text documents, but also hardware devices, and even the input and output of commands. A stream is a continuous flow of data from one of these “files” to another.

When a program runs, it starts with three standard streams already open, normally inherited from the shell that launched it, which it uses to communicate with the terminal (and the world). Understanding these is the key to unlocking the power of the shell.

Standard Input (stdin)

◦ Definition: The standard input is a data stream arranged as a continuous set of bytes from which many Linux commands can receive data. It is referenced by file descriptor 0.

◦ Default Source: By default, standard input receives data from the keyboard. When you type characters, they are placed in the standard input stream and directed to the Linux command.

◦ Redirection: You can redirect standard input to receive data from a file instead of the keyboard using the less-than sign (<) operator. For example, cat < myletter reads input from myletter instead of the keyboard.

◦ Commands: Commands such as cat, sort, uniq, and wc read from stdin when no filename argument is given. The read command reads a single line from standard input.
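The points above can be tried out in a throwaway directory. This is a minimal sketch: the file name `myletter` follows the earlier example, while its contents and the variable names are made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/myletter"

# sort gets no filename, so it reads stdin; the shell connects stdin to the file
sorted=$(sort < "$workdir/myletter")

# wc -l also reads stdin here, so it prints only the number (no filename)
line_count=$(wc -l < "$workdir/myletter")

# read consumes a single line from stdin
read -r first_line < "$workdir/myletter"

echo "$sorted"
echo "lines: $line_count, first line: $first_line"
rm -r "$workdir"
```

In each case the command itself never opens `myletter`; the shell opens the file and hands it over as file descriptor 0.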

Standard Output (stdout)

◦ Definition: The standard output is a data stream where a command or program places its output. It is referenced by file descriptor 1.

◦ Default Destination: The default destination for standard output is a device, specifically the screen. Data in the standard output stream is directed to the screen device, which then displays it.

◦ Redirection (Overwrite): You can redirect standard output to a file instead of the screen using the greater-than sign (>) operator. If the file does not exist, it is created; if it exists, its contents are overwritten. For instance, ls > filenames.txt saves the list of files to filenames.txt.

◦ Redirection (Append): To add standard output to the end of an existing file without overwriting it, use the double greater-than sign (>>) operator. For example, cat myletter >> alletters appends the content of myletter to alletters.

◦ Force Overwriting: In C shell, >! forces overwriting a file even if the noclobber option is set. In Korn shell and Bash, the equivalent operator is >|.

◦ Commands: Commands like ls, cat, echo, sort, uniq, wc, head, tail, and tee all send their results to standard output.
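Overwrite, append, and noclobber can be seen side by side in a short sketch. It assumes a POSIX-style shell such as Bash, where `>|` overrides noclobber; the file contents and variable names are illustrative only.

```shell
workdir=$(mktemp -d)
out="$workdir/filenames.txt"

echo "first"  > "$out"     # creates the file
echo "second" > "$out"     # overwrites: "first" is gone
echo "third" >> "$out"     # appends after "second"
contents=$(cat "$out")

# With noclobber set, a plain > refuses to overwrite an existing file
set -o noclobber
if (echo "blocked" > "$out") 2>/dev/null; then
    clobbered=yes
else
    clobbered=no
fi

# >| overrides noclobber for this one redirection
echo "forced" >| "$out"
set +o noclobber
forced=$(cat "$out")

echo "$contents"
echo "clobbered=$clobbered forced=$forced"
rm -r "$workdir"
```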

Standard Error (stderr)

◦ Definition: Standard error is a separate output data stream reserved for error messages and diagnostics. It is referenced by file descriptor 2.

◦ Default Destination: Like standard output, error messages are usually displayed on the screen by default, even if standard output has been redirected.

◦ Redirection (Overwrite): You can redirect standard error to a file using the 2> operator. For example, cat myintro 2> error_log.txt saves error messages to error_log.txt.

◦ Redirection (Append): To append standard error messages to a file, use the 2>> operator.
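To see that the two output streams really are separate, the sketch below runs cat on one file that exists and one that does not. The name `myintro` follows the example above; the missing file names and the variables are hypothetical.

```shell
workdir=$(mktemp -d)
echo "hello" > "$workdir/myintro"

# cat succeeds on myintro and fails on the missing file:
# the content goes to stdout, the complaint goes to stderr
status=0
cat "$workdir/myintro" "$workdir/no_such_file" \
    > "$workdir/out.txt" 2> "$workdir/error_log.txt" || status=$?

# 2>> appends a second error message instead of replacing the first
cat "$workdir/another_missing_file" 2>> "$workdir/error_log.txt" || true

out=$(cat "$workdir/out.txt")
error_lines=$(wc -l < "$workdir/error_log.txt")

echo "exit status: $status"
echo "stdout file: $out"
echo "error lines: $error_lines"
rm -r "$workdir"
```

The output file contains only the normal output, while both error messages end up in `error_log.txt`.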

Stream Name	File Descriptor	Description	Default Target
Standard Input (stdin)	`0`	The source of input for a program.	The keyboard.
Standard Output (stdout)	`1`	Where a program sends its normal output.	The terminal screen.
Standard Error (stderr)	`2`	Where a program sends its error messages.	The terminal screen.

Filtering and processing text
In Linux, grep, cut, sort, uniq, and wc are essential filter commands used for text processing and are often combined using pipes.

Filters take input, modify it, and then output the altered data. Most filter commands accept one or more filenames as arguments, and if no filename is specified, they read from standard input.
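As a preview of how these filters chain together, here is a small sketch of a pipeline. The log file, its three-column layout, and the user names are invented for the example.

```shell
workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
alice GET /index.html
bob POST /login
alice GET /about.html
carol GET /index.html
alice POST /login
bob GET /index.html
EOF

# Keep GET lines, take the user field, group duplicates, count them,
# then order by count (highest first)
report=$(grep "GET" "$workdir/access.log" \
    | cut -d ' ' -f 1 \
    | sort \
    | uniq -c \
    | sort -rn)

# How many GET requests were there in total?
get_total=$(grep "GET" "$workdir/access.log" | wc -l)

echo "$report"
echo "total GET requests: $get_total"
rm -r "$workdir"
```

Each command reads the previous command's standard output as its own standard input, so no intermediate files are needed.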

1. grep (Global Regular Expression Print)

The premier tool for filtering lines of text based on a pattern.

bash

# Basic syntax
grep [options] pattern [file]

# Search for the word "error" in a log file (case-sensitive)
$ grep "error" /var/log/syslog

# Search for "error" (case-INSensitive)
$ grep -i "error" /var/log/syslog

# Search for lines that do NOT contain "success"
$ grep -v "success" output.log

# Count how many lines contain the pattern
$ grep -c "GET" webserver.log

# Show the line number of matching lines
$ grep -n "username" /etc/passwd
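Since the log files above will differ from system to system, the following sketch runs the same options against a small made-up file (`app.log` and its contents are assumptions) so the results are predictable.

```shell
workdir=$(mktemp -d)
printf 'Error: disk full\nall good\nerror: timeout\nsuccess\n' > "$workdir/app.log"

sensitive=$(grep -c "error" "$workdir/app.log")       # matches only the lowercase line
insensitive=$(grep -ci "error" "$workdir/app.log")    # matches "Error" and "error"
without=$(grep -vc "success" "$workdir/app.log")      # lines NOT containing "success"
numbered=$(grep -n "timeout" "$workdir/app.log")      # prefixes the line number

echo "case-sensitive: $sensitive, case-insensitive: $insensitive"
echo "lines without 'success': $without"
echo "$numbered"
rm -r "$workdir"
```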

2. cut

Used to extract specific columns or fields from a structured text file (like CSV or /etc/passwd).

bash

# Basic syntax
cut [options] [file]

# Extract the first field (column) from a CSV, using a comma as the delimiter
$ cut -d ',' -f 1 employees.csv

# Extract fields 1 and 3 from /etc/passwd, using colon ':' as the delimiter
$ cut -d ':' -f 1,3 /etc/passwd

# Extract the first 5 characters from every line
$ cut -c 1-5 data.txt

-d: Sets the delimiter (default is tab).
-f: Specifies which field(s) to extract (e.g., 1, 1-3, 1,4).
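A quick way to check what cut returns is to run it on a tiny file of your own. The sketch below assumes a hypothetical `employees.csv` with three simple columns.

```shell
workdir=$(mktemp -d)
printf 'id,name,dept\n1,Ada,Engineering\n2,Linus,Kernel\n' > "$workdir/employees.csv"

names=$(cut -d ',' -f 2 "$workdir/employees.csv")        # a single field
id_dept=$(cut -d ',' -f 1,3 "$workdir/employees.csv")    # fields 1 and 3, delimiter kept
prefix=$(cut -c 1-4 "$workdir/employees.csv")            # first 4 characters of each line

echo "$names"
echo "$id_dept"
echo "$prefix"
rm -r "$workdir"
```

Note that cut splits on every delimiter character it sees. It does not understand quoted CSV fields, so a value like "Smith, John" would be split in two.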

3. sort (Sort Lines of Text)

Orders lines alphabetically or numerically.

bash

# Sort a file alphabetically (ascending order)
$ sort names.txt

# Sort in reverse (descending) order
$ sort -r names.txt

# Sort numerically (for numbers, not text)
$ sort -n numbers.txt

# Sort by a specific field (e.g., sort a CSV by its 3rd column, numerically)
# -t ',' sets the field separator; without it, sort splits fields on blanks
$ sort -t ',' -n -k 3,3 data.csv

-r: Reverse the sort order.
-n: Numerical sort (so 10 comes after 2).
-k: Sort by a specific key (column).
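The difference between text order and numeric order is easiest to see on a few numbers. This sketch uses made-up files (`numbers.txt`, `data.csv`) and assumes comma-separated fields for the keyed sort.

```shell
workdir=$(mktemp -d)
printf '10\n2\n33\n1\n' > "$workdir/numbers.txt"
printf 'pear,red,30\napple,green,4\nfig,purple,12\n' > "$workdir/data.csv"

# Default sort compares character by character, so "10" lands before "2"
text_order=$(sort "$workdir/numbers.txt")

# -n compares the values as numbers
numeric_order=$(sort -n "$workdir/numbers.txt")

# Sort on the 3rd comma-separated field only, numerically
by_third=$(sort -t ',' -n -k 3,3 "$workdir/data.csv")

echo "text order:";    echo "$text_order"
echo "numeric order:"; echo "$numeric_order"
echo "by third field:"; echo "$by_third"
rm -r "$workdir"
```

Writing the key as `3,3` limits the comparison to the third field; a bare `-k 3` would extend the key from field 3 to the end of the line.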

4. uniq (Report or Omit Repeated Lines)

Removes or finds duplicate lines. Crucial Note: uniq only removes adjacent duplicates. You must almost always sort a file first.

bash

# Basic syntax. Often used with a sorted input.
uniq [options] [input_file]

# First sort to group duplicates, then remove them
$ sort log.txt | uniq

# Count the number of times each line appears
$ sort log.txt | uniq -c

# Only show lines that are duplicates
$ sort log.txt | uniq -d

-c: Adds a count of occurrences to each line.
-d: Only show duplicated lines.
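The "adjacent duplicates only" rule is worth seeing once. In this sketch the contents of `log.txt` are invented so that some duplicates are adjacent and others are not.

```shell
workdir=$(mktemp -d)
printf 'ok\nfail\nok\nok\nfail\nwarn\n' > "$workdir/log.txt"

# Without sorting, only the adjacent "ok ok" pair collapses
unsorted=$(uniq "$workdir/log.txt" | wc -l)

# After sorting, every duplicate is adjacent, so each value appears once
distinct=$(sort "$workdir/log.txt" | uniq | wc -l)

# Values that occur more than once
repeated=$(sort "$workdir/log.txt" | uniq -d)

echo "lines after uniq alone: $unsorted"
echo "lines after sort | uniq: $distinct"
echo "repeated values:"; echo "$repeated"
rm -r "$workdir"
```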

5. wc (Word Count)

Prints counts of lines, words, and bytes.

bash

# Count lines, words, and bytes in a file
$ wc story.txt
  45  203 1160 story.txt
# (Lines: 45, Words: 203, Bytes: 1160)

# Count only the number of lines (great for counting list items)
$ wc -l < /etc/passwd

# Count the entries (files and directories, hidden ones excluded) in the current directory
$ ls -1 | wc -l

-l: Count only lines.
-w: Count only words.
-c: Count only bytes.
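The individual counters can be checked against a file small enough to count by hand. The sketch below uses a made-up `story.txt`, plus a second file containing one accented letter to show why `-c` is described as bytes rather than characters.

```shell
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/story.txt"

lines=$(wc -l < "$workdir/story.txt")    # two newline characters
words=$(wc -w < "$workdir/story.txt")    # one, two, three
bytes=$(wc -c < "$workdir/story.txt")    # 8 bytes + 6 bytes, newlines included

# The letter e-acute is two bytes in UTF-8 (octal 303 251), plus one newline
printf '\303\251\n' > "$workdir/accent.txt"
accent_bytes=$(wc -c < "$workdir/accent.txt")

echo "lines=$lines words=$words bytes=$bytes"
echo "accent file bytes=$accent_bytes"
rm -r "$workdir"
```

For a character count rather than a byte count, wc also has `-m`, whose result depends on the current locale.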

Finding files
You know a file is on your system, but you don’t remember where. Manually searching through directories is inefficient. Linux provides two powerful tools for this: the robust, real-time find and the lightning-fast locate.

`find`

The find command is one of the most powerful tools in the Linux arsenal.

Its basic syntax is:

find [starting/path] [options] [expression] -action

1. Finding Files by Name

bash

# Find a file named "config.txt" anywhere under the /home directory
$ find /home -name "config.txt"

# Use `-iname` for case-insensitive search
$ find /home -iname "config.txt"

# Use wildcards (* and ?). Always quote the pattern!
$ find /var -name "*.log"

2. Finding Files by Type

bash

# Find all directories named "bin" under /usr
$ find /usr -type d -name "bin"

# Find all regular files (not directories, links, etc.)
$ find /home -type f

# Find all symbolic links
$ find /usr -type l

3. Finding Files by Size

bash

# Find files larger than 100 Megabytes (MB)
$ find / -size +100M

# Find files smaller than 1 Kilobyte (KB)
# (GNU find rounds sizes up to whole units, so -size -1k would match only empty files)
$ find /home -size -1024c

# Find files exactly 1024 bytes
$ find /tmp -size 1024c

+ for larger than, - for smaller than, no sign for exact size.
Units: c (bytes), k (Kilobytes), M (Megabytes), G (Gigabytes).
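Size tests are easy to get subtly wrong, so it helps to try them on files of known size. This sketch creates four hypothetical files in a scratch directory and uses byte units (`c`) throughout, which avoids the rounding that larger units involve.

```shell
workdir=$(mktemp -d)
: > "$workdir/empty.dat"                         # 0 bytes
printf 'x' > "$workdir/tiny.dat"                 # 1 byte
head -c 1024 /dev/zero > "$workdir/exact.dat"    # exactly 1024 bytes
head -c 5000 /dev/zero > "$workdir/big.dat"      # 5000 bytes

exact=$(find "$workdir" -type f -size 1024c)                    # no sign: exact size
larger=$(find "$workdir" -type f -size +1024c)                  # +: strictly larger
smaller_count=$(find "$workdir" -type f -size -1024c | wc -l)   # -: strictly smaller

# With GNU find, sizes are rounded up to whole units before comparing,
# so -size -1k matches only empty files (tiny.dat counts as 1k)
rounded=$(find "$workdir" -type f -size -1k)

echo "exact:   $exact"
echo "larger:  $larger"
echo "smaller: $smaller_count file(s)"
echo "-1k:     $rounded"
rm -r "$workdir"
```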

4. Finding Files by Owner/Permissions

bash

# Find files owned by the user "www-data"
$ find /var -user www-data

# Find files with SUID permission set (dangerous!)
$ find /usr -perm /4000

# Find files that are executable by the world
$ find . -perm /o=x

5. Taking Action on Found Files (The Powerful -exec)

The real power of find is automating tasks on the files it finds. The {} is a placeholder for the found file, and \; marks the end of the command.

bash

# Delete all files ending with ".tmp" in the /tmp directory
# WARNING: Use -delete with extreme caution! Prefer -exec rm {} \;
$ find /tmp -name "*.tmp" -delete

# More verbose alternative: use -exec to run rm on each file
$ find /tmp -name "*.tmp" -exec rm {} \;

# Change ownership of all .html files to user "alice"
$ find /var/www -name "*.html" -exec chown alice {} \;

# Display found files with details (ls -l)
$ find /etc -name "*.conf" -exec ls -l {} \;
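One cautious habit with destructive actions is to run the search on its own first and only then add the action. The sketch below does this in a scratch directory with hypothetical file names; it also uses the `+` terminator, a standard alternative to `\;` that passes many files to a single command invocation.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/sub"
touch "$workdir/a.tmp" "$workdir/sub/b.tmp" "$workdir/keep.txt"

# Step 1: preview. With no action given, find just prints the matches
preview=$(find "$workdir" -type f -name "*.tmp" | sort)
preview_count=$(echo "$preview" | wc -l)

# Step 2: act. Ending -exec with + runs rm once with all matches as arguments
find "$workdir" -type f -name "*.tmp" -exec rm {} +

# Only keep.txt should be left
remaining=$(find "$workdir" -type f | wc -l)

echo "would delete:"; echo "$preview"
echo "files remaining: $remaining"
rm -r "$workdir"
```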

`locate`

The locate command provides a rapid database search for pathnames and outputs every name that matches a given substring.

bash

# Basic search for a file named "passwd"
$ locate passwd
/etc/passwd
/etc/passwd-
/usr/bin/passwd
/usr/share/doc/passwd
... (many more results)

# Use `-i` for case-insensitive search
$ locate -i "report2023"

# Use `-n` to limit the number of results
$ locate -n 5 ".log"

# Sometimes you need to update the database manually if the file is new
$ sudo updatedb

Important: If you just created a file and locate can’t find it, run sudo updatedb to refresh the database.

`find` vs. `locate`: A Quick Comparison

Feature	`find`	`locate`
Method	Searches the live filesystem in real-time.	Searches a pre-built database (for example `mlocate.db`; newer systems may use `plocate` instead).
Speed	Slower (scans actual directories).	Blazingly fast (searches a single index).
Completeness	Always 100% accurate and up-to-date.	May not show files created after the last database update.
Complexity	Very powerful with many options.	Simple, primarily for name-based search.