Join Similar CSV Files Using Perl

When you export logs from Check Point using the Smart Log feature, the column order is sometimes randomized. If you have multiple files to join, especially files with different column orders, you don't want to do this by hand.

1 Million Record Limit

The Check Point Smart Log application limits each log export to 1 million records. Excel has a similar ceiling: a worksheet holds at most 1,048,576 rows. As a result, you often end up with several CSV files, stored as flat-text databases, that together hold more than 1 million records and still need to be parsed as one data set.

Text::CSV_XS to Join CSVs

Joining multiple CSV files by hand means opening each file, checking the column order, rearranging any columns that are out of place, and then pasting the rows together. Excel also stops at roughly 1 million rows, so what happens if your CSV file has 15 million records? This Perl script was written to automate this tedious puzzle without using Excel.

Join CSV Files with Different Columns

The first version of the script simply opened a set of files and appended them to a new file, skipping every header after the first. That only works when every file uses the same column order. This version uses a hash to map each file's column names to their positions, then rewrites the rows of every CSV file into the output column order you specify.
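Here is a minimal sketch of the column-mapping approach, assuming the first row of each input file is its header and that column names match exactly across exports. It is not the original script: `join_csv` and `@OUT_COLS` are hypothetical names, and the column list is a placeholder rather than Check Point's actual export headers. Each file gets a hash from column name to position, and every row is rewritten in the output order; columns a file lacks are written as empty fields.

```perl
#!/usr/bin/perl
# Sketch: join CSV files whose columns appear in different orders.
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical output layout; replace with the column names from your exports.
my @OUT_COLS = qw(Date Time Source Destination Action);

# Append every input file to $out_path, reordering each row into @$out_cols.
# Columns missing from an input file are written as empty fields.
# Returns the number of data rows written.
sub join_csv {
    my ($out_path, $out_cols, @in_paths) = @_;
    my $writer = Text::CSV_XS->new({ binary => 1, auto_diag => 1, eol => "\n" });

    open my $out, '>', $out_path or die "Cannot write $out_path: $!";
    $writer->print($out, $out_cols);    # one header row for the joined file
    my $rows = 0;

    for my $path (@in_paths) {
        # A fresh parser per file keeps EOF/error state from leaking between files.
        my $reader = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
        open my $in, '<', $path or die "Cannot read $path: $!";
        my $header = $reader->getline($in);
        unless ($header) { close $in; next; }    # skip empty files

        # Hash mapping this file's column names to their positions.
        my %pos;
        @pos{@$header} = (0 .. $#$header);

        # Stream row by row so memory use stays flat for very large exports.
        while (my $row = $reader->getline($in)) {
            my @mapped = map {
                defined $pos{$_} ? ($row->[ $pos{$_} ] // '') : ''
            } @$out_cols;
            $writer->print($out, \@mapped);
            $rows++;
        }
        close $in;
    }
    close $out or die "Cannot close $out_path: $!";
    return $rows;
}

# Usage: perl join_csv.pl joined.csv export1.csv export2.csv ...
if (@ARGV >= 2) {
    my ($out_path, @inputs) = @ARGV;
    my $n = join_csv($out_path, \@OUT_COLS, @inputs);
    print "Wrote $n rows to $out_path\n";
}
```

Because rows are streamed rather than loaded into memory, the same approach should cope with inputs far beyond Excel's row limit. If your exports start with a byte-order mark or have stray whitespace in header names, you would need to normalize the header before building the hash.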

Error Checking on CSV

This Perl script does not validate your CSV itself; it relies on Text::CSV_XS to do that job. If the files contain encoding errors, the script may fail. For joining multiple Check Point log exports into a single CSV, it performs very well. If needed, you could add a few lines to remove commas or otherwise massage the data; fortunately, that wasn't required for the data sets we were using.
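If your data does need cleaning, a small per-row hook is one way to do it. The sketch below is a hypothetical `massage_row` helper, not part of the original script, that removes embedded commas and trims surrounding whitespace. Note that Text::CSV_XS already quotes fields containing the separator when writing, so stripping commas is only worthwhile if a downstream tool cannot handle quoted fields.

```perl
use strict;
use warnings;

# Hypothetical clean-up step applied to each reordered row before it is written.
# Turns undefined fields into empty strings, removes commas, and trims whitespace.
sub massage_row {
    my @fields = @_;
    for my $f (@fields) {
        $f = '' unless defined $f;
        $f =~ tr/,//d;           # drop embedded commas
        $f =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    }
    return @fields;
}
```

In a join loop, you would call it on each reordered row just before printing it, for example `$writer->print($out, [ massage_row(@row) ]);`.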

Would you like custom Perl scripting for your projects? We can help you parse data and extract the information you want, just like this Perl script does.
