Pride, Not Prejudice: Processing Your Product Photos

We needed to upload product photos to their respective listings in our catalog. We ran into some obstacles.

pride not prejudice logo, with an electrician's panel opened. Gears and wiring are exposed.

Your beautiful product photos were too big

Modern digital cameras can produce incredibly high-resolution photos. Such resolution is unnecessary for screen viewing, and the resulting files are much too large to load quickly on a website.

It turned out that there were other drawbacks to uploading extremely large photos to the product catalog. These photos absolutely crushed the server when they went through WordPress’ normal photo upload process. With a set of large photos (some came in at over 50 megabytes), this could be an inconveniently lengthy operation. We were limited in how many photos we could upload at once, and sometimes the upload even crashed the server.

I didn’t need to strain my production server; I could do all of the processing offline before uploading the photos. I used ImageMagick to process all vendor listing images, shrinking them proportionally to 1080 pixels wide. I then normalized all photos to the JPEG format for its good balance between compression and preserved detail, as well as its comprehensive browser support.
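As a rough illustration of the offline step, here is a minimal Python sketch that builds an ImageMagick command for one photo. The function names, the directory layout, and the choice to only shrink images wider than 1080 pixels (the `>` in the geometry) are my assumptions for this example, not the exact commands from the pipeline. It also assumes ImageMagick 7, where the entry point is `magick`; on ImageMagick 6 the command is `convert`.

```python
import subprocess
from pathlib import Path

TARGET_WIDTH = 1080  # pixels wide, as described above


def build_resize_command(source: Path, output_dir: Path) -> list[str]:
    """Build an ImageMagick command that shrinks one photo and saves it as JPEG.

    Hypothetical helper: names and layout are illustrative only.
    """
    # ImageMagick picks the output format from the destination extension.
    destination = output_dir / (source.stem + ".jpg")
    return [
        "magick",
        str(source),
        # "1080x>" = fit to 1080 px wide, keep the aspect ratio,
        # and only shrink images that are larger (assumption).
        "-resize", f"{TARGET_WIDTH}x>",
        str(destination),
    ]


def resize_photo(source: Path, output_dir: Path) -> None:
    """Run the command; requires ImageMagick to be installed locally."""
    output_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_resize_command(source, output_dir), check=True)
```

Keeping the command construction separate from the call that runs it makes the settings easy to inspect without touching any files.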

These steps ensured that photos would load quickly while shoppers were browsing the catalog.

404 – photos not found

I noticed that even with verified vendor listings data, some photos were often missing, or at least didn’t bear the same file name as specified in the spreadsheet.

This could be because of a difference in capitalization. It could also come from the confusion caused by Windows, and possibly Mac, hiding file extensions (.jpg, .png, etc.) by default. This second point led vendors to add an extension to file names that already had one. As a result, some photos were named necklace-black.jpg.jpg or necklace-white.png.jpg. And sometimes, the correct photo was simply not included.

I normalized the image file names, which eliminated a surprising number of issues in associating photo files with their respective vendor listings. The data processing pipeline now made the photo names lowercase and removed all but the last file extension at the end of the filename.
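The normalization can be sketched in a few lines of Python. This is an illustration rather than the pipeline's actual code: the function name and the set of recognized image extensions are my assumptions, and I only strip repeated extensions that look like image extensions so that a name such as ring.v2.png keeps its ".v2".

```python
import os

# Assumed set of extensions that vendors might stack by mistake.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def normalize_photo_name(name: str) -> str:
    """Lowercase a photo filename and keep only its last image extension."""
    name = name.strip().lower()
    stem, last_ext = os.path.splitext(name)
    if last_ext not in IMAGE_EXTENSIONS:
        # Not an image name we recognize; leave it alone apart from lowercasing.
        return name
    # Peel off any extra image extensions stacked before the last one.
    while True:
        inner_stem, inner_ext = os.path.splitext(stem)
        if inner_ext in IMAGE_EXTENSIONS and inner_stem:
            stem = inner_stem
        else:
            break
    return stem + last_ext


print(normalize_photo_name("Necklace-Black.JPG.jpg"))  # necklace-black.jpg
print(normalize_photo_name("necklace-white.png.jpg"))  # necklace-white.jpg
```

Applying the same function to both the names in the spreadsheet and the names of the files on disk means the two sides are compared on equal terms.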

While the photo name normalization eliminated a substantial number of mismatches, some photos listed in the vendor listings spreadsheet still could not be found. Sometimes, the vendor had included the photo in question, but under a slightly different name. To help with my manual inspection of the vendor submission, the missing photos report generated by the data processing pipeline now listed, next to each missing photo, the provided photos with similar filenames. To determine similarity, I computed the Levenshtein distance between the missing photo name and the filenames of the provided photos. This often helped me locate “missing” photos and align the vendor listing data accordingly.
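Here is a small Python sketch of that similarity lookup. The Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The function names and the cutoff of 3 edits are assumptions I picked for the example; the right threshold depends on how long your filenames are.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein (edit) distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def suggest_similar(missing: str, provided: list[str], max_distance: int = 3):
    """List provided filenames within max_distance edits, closest first.

    max_distance is a hypothetical cutoff chosen for illustration.
    """
    scored = sorted((levenshtein(missing, name), name) for name in provided)
    return [(name, dist) for dist, name in scored if dist <= max_distance]


provided = ["neckalce-black.jpg", "ring-gold.jpg", "necklace-blak.jpg"]
print(suggest_similar("necklace-black.jpg", provided))
# [('necklace-blak.jpg', 1), ('neckalce-black.jpg', 2)]
```

The suggestions are only hints for a human reviewer; a close name is not proof that it is the right photo.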

Product photos as part of the listings import process

I had squashed huge photos and substantially reduced the rate of mismatch between vendor listings data and provided product photos, but at this point I still hadn’t solved the problem of manually adding photos to listings.

I learned that the WooCommerce data importer could import images that were referenced by URL. We were thrilled: this was going to save so much time and eliminate many mistakes caused by weary humans with eyestrain. I updated the data processing pipeline to push all vendor listing photos to online storage, accessible by the WooCommerce data importer.
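To give a feel for what the importer consumes, here is a minimal Python sketch that turns listings into CSV rows with image URLs. The base URL, the function names, and the shape of the listing dictionaries are hypothetical. The SKU, Name, and Images column headers follow the WooCommerce product CSV format as I understand it, where Images holds a comma-separated list of URLs; check the importer's column mapping screen for your own version.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical location of the uploaded photos in online storage.
BASE_URL = "https://example.com/listing-photos/"


def photo_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build a public URL for one uploaded photo, escaping unsafe characters."""
    return base_url.rstrip("/") + "/" + quote(filename)


def listings_to_csv(listings: list[dict]) -> str:
    """Render listings as CSV text with an Images column of photo URLs."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["SKU", "Name", "Images"], lineterminator="\n"
    )
    writer.writeheader()
    for listing in listings:
        writer.writerow({
            "SKU": listing["sku"],
            "Name": listing["name"],
            # Several photos become one comma-separated cell.
            "Images": ", ".join(photo_url(p) for p in listing["photos"]),
        })
    return buffer.getvalue()


print(listings_to_csv([
    {"sku": "NB-1", "name": "Black Necklace", "photos": ["necklace-black.jpg"]},
]))
```

Because the csv module quotes any cell that contains a comma, a listing with several photos still produces a single well-formed Images value.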

Now the data processing pipeline was able to squash large product photos into a web-ready size, eliminate minor mismatch discrepancies between the data and the product photos, and facilitate automated uploading of said product photos to their respective listings in our product catalog.

Behind the scenes, we were now able to reap the benefits of automated data verification. This got me thinking: what if our vendors could too? This train of thought led to yet another drastic improvement in how we ran Pride, Not Prejudice.

Next in the series, Building A Data Validation Tool For Vendors