For some time I was validating CSV files manually. While the number of files was small, this didn’t cause any issues. The challenge came once the number of files started growing: every couple of months, a file or two would be added to the overall process. This meant more time spent checking. Simply put, more admin.
Like the next person, I don’t like extra admin. So I started looking into creating an automated CSV checking process using Python.

The goal of this blog post is to continue developing a project that I worked on. At the moment it works and does what is needed, but the code is not very readable and there is no documentation.

As a result, the long-term goal is to improve my project management skills using GitHub as a platform.

Also, it’s good to be back writing blog posts.

Goals that I set out for this project:

  • The script needs to pick up all the files from one folder
  • The error log file needs to include details for every row, to make it easy to find and fix CSV files
  • Checks on a row must stop as soon as an error is found in any of its values
  • At the end of the run, the script lists the files and their totals
  • Errored files also need to be listed with the count of wrong rows
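To make these goals a bit more concrete, here is a minimal sketch of how such a run could be structured. It is not the code from my project: the function names (`check_row`, `check_folder`, `summary_lines`), the log line layout, and the idea of passing in one check function per column are all assumptions made for illustration. It also assumes every CSV file has a header row and that only failing rows are written to the log.

```python
import csv
from pathlib import Path


def check_row(row, checks):
    """Return the first error message for a row, or None if every check passes."""
    for column, check in checks.items():
        error = check(row.get(column) or "")
        if error:
            # Stop at the first bad value; the rest of the row is not checked.
            return f"{column}: {error}"
    return None


def check_folder(folder, checks, log_path):
    """Check every *.csv file in `folder`; return {file name: (rows, bad rows)}."""
    totals = {}
    with open(log_path, "w", encoding="utf-8") as log:
        for path in sorted(Path(folder).glob("*.csv")):
            rows = bad_rows = 0
            with open(path, newline="", encoding="utf-8") as handle:
                reader = csv.DictReader(handle)  # first line is used as the header
                for row in reader:
                    rows += 1
                    error = check_row(row, checks)
                    if error:
                        bad_rows += 1
                        # reader.line_num is the line in the file just read
                        log.write(f"{path.name}, line {reader.line_num}: {error}\n")
            totals[path.name] = (rows, bad_rows)
    return totals


def summary_lines(totals):
    """Build the end-of-run report: every file first, then the files with bad rows."""
    lines = [f"{name}: {rows} rows checked" for name, (rows, _) in totals.items()]
    lines += [f"ERROR {name}: {bad} bad rows" for name, (_, bad) in totals.items() if bad]
    return lines
```

Each check is assumed to be a function that takes one value and returns an error message, or `None` when the value is fine. A run would then look something like `totals = check_folder("incoming", checks, "errors.log")` followed by `print("\n".join(summary_lines(totals)))`, where the folder and log names are placeholders.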

Checks that the CSV Check Python script must perform:

  • Date format
  • Number format
  • Retailer name check vs list in parameter folder
  • EAN check vs list provided in parameter folder
  • Product description check
  • Check all the rows for extra commas and other random symbols that could cause file loading to fail
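As a rough illustration of the checks listed above, each one can be written as a small function that returns an error message, or `None` when the value passes. Everything specific here is an assumption rather than what my script does: the date format (`%Y-%m-%d`), the set of characters allowed in a product description, the one-value-per-line layout of the reference files, and all the function names.

```python
import re
from datetime import datetime
from decimal import Decimal, InvalidOperation
from pathlib import Path

DATE_FORMAT = "%Y-%m-%d"  # assumption: the real files may use another format
ALLOWED_TEXT = re.compile(r"[\w \-.,&'/()%+]*")  # assumption: characters treated as safe


def load_reference_list(path):
    """Read a reference file (assumed: one value per line) into a set."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def check_date(value, date_format=DATE_FORMAT):
    try:
        datetime.strptime(value, date_format)
    except ValueError:
        return f"expected a date like {date_format}, got {value!r}"
    return None


def check_number(value):
    try:
        number = Decimal(value)
    except InvalidOperation:
        return f"not a number: {value!r}"
    # Decimal accepts "NaN" and "Infinity", which are not usable figures here.
    return None if number.is_finite() else f"not a finite number: {value!r}"


def check_in_list(value, allowed):
    """Used for both retailer names and EANs: the value must be in the reference set."""
    return None if value in allowed else f"{value!r} is not in the reference list"


def check_text(value):
    """Product description: must not be blank or contain unexpected symbols."""
    if not value.strip():
        return "empty value"
    if not ALLOWED_TEXT.fullmatch(value):
        return f"unexpected symbol in {value!r}"
    return None


def check_field_count(fields, expected):
    """An extra comma shows up as more fields in the row than the header has."""
    if len(fields) != expected:
        return f"expected {expected} fields, found {len(fields)}"
    return None
```

Keeping EANs as strings rather than numbers avoids losing leading zeros. The membership check only confirms that a value appears in the reference list; it does not validate an EAN check digit.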

I started by coding everything in one file. That proved hard to work with once all the check code had been added. I had not planned for this project to be more than one file, but splitting it up made the code much easier to work with.

Next steps by priority:

  • Rewrite code to make it easier to read and maintain
  • Improve the speed of the CSV check process
  • Add progress bar – nice to have
  • Documentation

Project GitHub: