Using join in bash

Goal

We want to join two different files that contain one field in common into a single file

Description

Lets imagine we have two different files that contains one field in common and we want to join them together into a single file, using exactly that field

How to

In our example, we will assume we have the following simple files (each field is separated with a single tab):

File 1

1       Clockare
2       CVManager
3       Audima
4       ReC

File 2

1       Time Tracker
2       Career Manager
3       Autharquicals Manager
4       Hardware remote controller

And we want to achieve a final file containing the contents of both files, such as:

Result

1       Clockare       Time Tracker
2       CVManager       Career Manager
3       Audima       Autharquicals Manager
4       ReC       Hardware remote controller

The solution to the previous problem is as simple as running the following statement (the trick here is knowing how to write a tab for the field separator, which can be performed through the combination ctr+v + <tab>. By the way, be careful if you copy the code from this page as the tab may be turned into a set of spaces!):

join -t’ ‘ File1 File2

Pretty simple, right? Now, lets do the same exercise but with a second file that contains the common column in a different order, such as the following:

File3

Time Rracker    1
Career Manager  2
Autharquicals Manager   3
Hardware remote controller      4

The solution would still be simple and still using only join, as in (we simply state that, in the case of the second file (-2), we will use the second field number):

join -t’ ‘ -2 2 File1 File3

Finally, and to make things a little harder, lets imagine we have a file that is not sorted nor has the fields in the same order as the first file, such as:

Career Manager  2
Time tracker    1
Hardware remote controller      4
Autharquicals Manager   3

In this particular situation, we will have to “sort the file using the last field and join them using the second file’s second field”. So, lets try to translate the previous English sentence into some shell commands, as in:

awk ‘{print $NF”|”$0}’ File4 | sort -t’|’ | awk -F’|’ {‘print $NF’} > File4_sorted ; join -t’ ‘ -2 2 File1 File4_sorted ; rm File4_sorted

Lets try to dismistify the previous commands. The first awk statement will generate the contents of the original file but with the last field, followed by a ‘|’ in the beginning of the sentence, as in:

2|Career Manager        2
...

Then, the sort using ‘|’ as separator, sorts the lines by their ID number instead of their description. Next, we filter its result with awk again to remove the temporary part, i.e., the ID, followed by the ‘|’, and write the output to a new temporary file named File4_sorted. At this moment, we have exactly the same contents of the File3 and, in this case, we can issue exactly the same statement that we did for joining files 1 and 3. In the end, and just to clean things up, we remove the temporary file File4_sorted.

Explanations

Although join is a simple command, it is still useful and important. After all, it is one of the commands one must know for the LPIC 1 certification. Therefore, I hope this simple exercise can help you to better understand how to use it (it was useful for me, at least, as I could exercise what I have learned while studying for the certification)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s