Text manipulation in Linux
Common commands
awk - pattern scanning and processing language
sed - stream editor
tac - concatenate and print files in reverse
sed
sed is a stream editor. It’s useful for manipulating text in files, or as part of a pipeline.
Replace a string in a file
sed -i 's/old/new/g' file.txt
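As a rough sketch of both uses mentioned above, the same substitution can be run on a stream or on a file. The sample text and the scratch directory are made up for illustration. Bare -i is the GNU sed form; BSD/macOS sed expects a backup suffix, so -i.bak (suffix attached to the flag) is used here because both accept it.

```shell
# In a pipeline: sed reads standard input and writes the result to standard output.
piped=$(printf 'old cat, old hat\n' | sed 's/old/new/g')
printf '%s\n' "$piped"

# In place, keeping a backup. file.txt lives in a hypothetical scratch directory.
workdir=$(mktemp -d)
printf 'old cat, old hat\n' > "$workdir/file.txt"
sed -i.bak 's/old/new/g' "$workdir/file.txt"

edited=$(cat "$workdir/file.txt")       # the rewritten file
backup=$(cat "$workdir/file.txt.bak")   # the untouched original
printf '%s\n' "$edited" "$backup"
rm -rf "$workdir"
```

The trailing g replaces every match on a line; without it only the first match on each line is replaced.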
Cookbook
Deduplicating lines in a file which have the same column value
Remove duplicate lines in a file, where values in one column are the same, whilst preserving order. For example, removing duplicate properties in a key/value property file (e.g. a Java .properties file). Either keep the first occurrence of the duplicate column line:
awk -F'=' '!seen[$1]++' foo.properties
…or keep the last occurrence of the duplicate column line:
tac foo.properties | awk -F'=' '!seen[$1]++' | tac
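To see how the two commands differ, here is a minimal sketch run against a made-up file. The file contents and variable names are assumptions for illustration, and tac is assumed to be installed (it ships with GNU coreutils).

```shell
# Hypothetical sample file; the keys and values are invented.
props=$(mktemp)
cat > "$props" <<'EOF'
name=first
colour=red
name=second
EOF

# seen[$1]++ is 0 (false) the first time a key appears, so !seen[$1]++ is
# true only for the first line with that key, and awk prints that line.
keep_first=$(awk -F'=' '!seen[$1]++' "$props")

# Reverse the file, keep the first occurrence, then restore the order.
keep_last=$(tac "$props" | awk -F'=' '!seen[$1]++' | tac)

printf 'first occurrence kept:\n%s\n' "$keep_first"
printf 'last occurrence kept:\n%s\n' "$keep_last"
rm -f "$props"
```

Keeping the first occurrence leaves name=first then colour=red. Keeping the last leaves colour=red then name=second, because the surviving line stays where the last occurrence was.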
Looping over a CSV and extracting some values
If you have a CSV and you want to use the fields in each line as arguments for a command, you can use the IFS (internal field separator) variable to split the line into fields:
cat > users.csv <<EOF
tom,henlo
david,cheese
EOF
while IFS=, read -r username password; do echo "$username has a password of $password"; done < users.csv
There’s an example of this in the Google Cloud page.
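One thing worth knowing about this approach: when a line has more fields than there are variables, read puts the rest of the line, separators included, into the last variable. A small sketch, using a made-up row with an extra field:

```shell
# Hypothetical row with three comma-separated fields but only two variables.
line='tom,henlo,extra'

# IFS=, applies only to this read; the heredoc feeds it the single line.
IFS=, read -r username password <<EOF
$line
EOF

printf 'username=%s password=%s\n' "$username" "$password"
```

Here password ends up as henlo,extra. This plain splitting also knows nothing about quoted CSV fields, so it only suits simple files where fields never contain the separator.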
Checking if the first part of an email address is longer than N
cat > emails.txt <<EOF
blah@example.com
EOF
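# Print only the part before the @ when it is longer than 10 characters
awk -F'@' '{ if (length($1) > 10) { print $1 } }' emails.txt
# Print the whole address when the part before the @ is longer than 10 characters
awk -F'@' '(length($1) > 10)' emails.txt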
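The commands above hard-code 10 as the limit. As a sketch of making N a parameter, the limit can be passed in with awk's -v option. The second sample address and the variable names are assumptions for illustration.

```shell
# Hypothetical sample addresses.
emails=$(mktemp)
cat > "$emails" <<'EOF'
blah@example.com
averylongname@example.com
EOF

n=10  # hypothetical limit on the length of the part before the @

# Print only the part before the @ when it is longer than n characters.
long_local_parts=$(awk -F'@' -v n="$n" 'length($1) > n { print $1 }' "$emails")

# Print the whole address when the part before the @ is longer than n characters.
long_addresses=$(awk -F'@' -v n="$n" 'length($1) > n' "$emails")

printf '%s\n' "$long_local_parts" "$long_addresses"
rm -f "$emails"
```

A pattern with no action, as in the second command, prints every line for which the pattern is true.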
Looping over a username:password list and creating a command for each
Useful for performing the same action repeatedly, e.g. making an HTTP request or testing an app.
If you have a list of usernames/passwords like this:
john:mypass
jane:password123
jacob:thankyougoodnight
You can use read and a while loop to generate some commands like this. This example runs a Playwright test for each user:
# Assumes LOGINS holds the username:password list above, one pair per line
while IFS=":" read -r username password
do
  APPLICATION_URL="https://$username.grafana.net" USERNAME="$username" PASSWORD="$password" npx playwright test my-app-test.spec.js
done <<< "$LOGINS"
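Before running anything for real, it can help to do a dry run that only prints the command for each user. The following is a minimal sketch: the LOGINS value is copied from the list above, print_commands is a hypothetical helper name, and a pipe replaces the here-string so that it also runs in shells without <<<.

```shell
# Hypothetical credentials, one username:password pair per line.
LOGINS='john:mypass
jane:password123
jacob:thankyougoodnight'

# Read username:password pairs from standard input and print, rather than
# run, the command that would be executed for each one.
print_commands() {
  while IFS=":" read -r username password
  do
    printf 'APPLICATION_URL="https://%s.grafana.net" USERNAME="%s" PASSWORD="%s" npx playwright test my-app-test.spec.js\n' \
      "$username" "$username" "$password"
  done
}

commands=$(printf '%s\n' "$LOGINS" | print_commands)
printf '%s\n' "$commands"
```

Because password is the last variable given to read, a password that itself contains a colon is kept whole.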