Streamlining text: How to merge file lines in pairs with a Bash script

Shell Scripting @ Freshers.in

Text processing is a common task in data analysis, where formatting and manipulating text files efficiently can often save a substantial amount of time. In this tutorial, we’ll focus on a frequent scenario where you need to merge lines of a text file in pairs. This can be especially handy when dealing with records, list items, or even code snippets that need to be consolidated. We will craft a Bash script that takes a filename as an argument and then merges its lines in pairs.

Setting Up Your Workspace

Open your preferred text editor and create a new file called merge_lines.sh. This will be your script file.

Script Content

Enter the following lines into merge_lines.sh:

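#!/bin/bash

# Require exactly one argument: the file to process
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <filename>" >&2
    exit 1
fi

# Make sure the argument names an existing regular file
if [ ! -f "$1" ]; then
    echo "Error: File not found." >&2
    exit 1
fi

# Merge lines in pairs: odd lines are printed with a trailing space and no
# newline; even lines are printed normally, which completes each pair.
# The END block adds the missing final newline when the line count is odd.
awk 'NR%2{printf "%s ",$0;next;}1; END{if (NR%2) print ""}' "$1"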
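If you only need the pairing and not the argument checks, the standard paste command can do the same job. The sketch below is an alternative technique, not part of the original script, and merge_pairs_paste is a hypothetical function name. Each - operand tells paste to read from standard input, so - - consumes two lines per output row and joins them with the delimiter given by -d. On a file with an odd number of lines, the last row may end with a trailing space.

```shell
#!/bin/bash
# Alternative sketch (not the article's script): pair lines with paste.
# Each "-" operand reads the next line from standard input, so "- -"
# takes two lines per output row and joins them with the -d delimiter.
merge_pairs_paste() {
    paste -d ' ' - - < "$1"
}

tmpfile=$(mktemp)
printf 'Line1\nLine2\nLine3\nLine4\n' > "$tmpfile"
merge_pairs_paste "$tmpfile"
rm -f "$tmpfile"
```

Running this prints Line1 Line2 and Line3 Line4 on separate lines.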

Making the Script Executable

Run the following command in your terminal to make the script executable:

chmod +x merge_lines.sh

Testing the Script with Sample Data

To test the script, create a sample text file named sample.txt containing the following lines:

Line1
Line2
Line3
Line4
Line5
Line6

Save and close sample.txt.
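If you prefer to create the test file from the command line, a printf one-liner produces the same six lines. This is only a convenience; creating the file in an editor works just as well.

```shell
#!/bin/bash
# printf reuses its format string for every remaining argument,
# so this writes Line1 through Line6, one per line, to sample.txt.
printf 'Line%s\n' 1 2 3 4 5 6 > sample.txt
cat sample.txt
```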

Now, run the script by passing sample.txt as an argument:

./merge_lines.sh sample.txt

The script should print the lines of sample.txt merged in pairs, with each pair joined by a single space:

Line1 Line2
Line3 Line4
Line5 Line6
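To avoid checking the output by eye, you can compare it against the expected result with diff. The sketch below is self-contained: it builds the sample and expected files in a temporary directory (the file names there are illustrative assumptions) and runs the same awk pairing rule the script uses, so it does not need merge_lines.sh to be present.

```shell
#!/bin/bash
# Sketch: compare the pairing output with the expected result using diff.
# File names inside the temporary directory are illustrative assumptions.
tmpdir=$(mktemp -d)
printf 'Line%s\n' 1 2 3 4 5 6 > "$tmpdir/sample.txt"
printf 'Line1 Line2\nLine3 Line4\nLine5 Line6\n' > "$tmpdir/expected.txt"

# Same awk pairing rule as merge_lines.sh
awk 'NR%2{printf "%s ",$0;next;}1' "$tmpdir/sample.txt" > "$tmpdir/actual.txt"

# diff exits with status 0 only when the two files are identical
if diff -u "$tmpdir/expected.txt" "$tmpdir/actual.txt"; then
    result="PASS"
else
    result="FAIL"
fi
rm -rf "$tmpdir"
echo "$result"
```

When the pairing is correct, diff prints nothing and the sketch prints PASS; otherwise diff shows the differing lines.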

How the Script Works

The script first checks that exactly one argument (the filename) was provided; if not, it prints a usage message and exits with status 1.

It then checks whether the file exists and is a regular file.
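To see both guard clauses in action, the sketch below copies them into a hypothetical check_args function that uses return instead of exit, so several cases can be tried in one shell session. Note that a directory exists but is still rejected, because -f only succeeds for regular files.

```shell
#!/bin/bash
# Hypothetical helper: the two checks from merge_lines.sh, using return
# instead of exit so that several cases can run in the same shell.
check_args() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: merge_lines.sh <filename>" >&2
        return 1
    fi
    if [ ! -f "$1" ]; then
        echo "Error: File not found." >&2
        return 1
    fi
    return 0
}

tmpdir=$(mktemp -d)   # an existing directory, not a regular file
tmpfile=$(mktemp)     # an existing, empty regular file

check_args
no_arg_status=$?
check_args one two
two_arg_status=$?
check_args "$tmpdir/missing.txt"
missing_status=$?
check_args "$tmpdir"
dir_status=$?
check_args "$tmpfile"
ok_status=$?

rm -rf "$tmpdir" "$tmpfile"
echo "no args: $no_arg_status, two args: $two_arg_status, missing: $missing_status, directory: $dir_status, valid file: $ok_status"
```

The error messages go to standard error, and the final line on standard output reads: no args: 1, two args: 1, missing: 1, directory: 1, valid file: 0.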

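The awk command does the actual merging. NR holds the current record (line) number, so NR%2 evaluates to 1 (true) on odd lines and 0 (false) on even lines. On an odd line, printf "%s " prints the line followed by a space but no newline, and next skips the remaining rules so awk moves straight on to the following line. On an even line, the first rule does not match, so the standalone 1, an always-true pattern with no action, triggers awk's default action: print the line followed by a newline. Together, these two rules turn every two input lines into one output line.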
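The awk program is compact enough that it helps to see it spread out with one comment per rule. The sketch below wraps it in a hypothetical merge_pairs function and feeds it five lines, so the odd-length case shows up. The END block is an addition for that case: without it, the last unpaired line would be printed with no terminating newline.

```shell
#!/bin/bash
# Sketch: the pairing awk program with one comment per rule.
merge_pairs() {
    awk '
        NR % 2 {              # odd record number: NR % 2 is 1 (true)
            printf "%s ", $0  # print the line plus a space, no newline
            next              # skip the remaining rules for this line
        }
        1                     # even line: always-true pattern, default action prints $0 and a newline
        END {
            if (NR % 2) print ""   # odd line count: terminate the unpaired last line
        }
    '
}

printf 'Line%s\n' 1 2 3 4 5 | merge_pairs
```

This prints Line1 Line2 and Line3 Line4 as pairs, then Line5 on its own line (followed by the space that printf added).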

Author: user