Alright, so the input pipeline in pretty much every framework is one of the least documented things out there. This article is about demystifying it. We can break the process down into a few key steps:
- Acquire & Label Data
- Process Label Files for Record Conversions
- Process Label Files for Training a Specific Network Interface
- Train the Specific Network Interface
This is part 1. We will focus on the 2nd item in this list: processing the label files so they are ready for conversion into TF Records. Note that you can find more associated code in the TensorFlow section of this git repository: https://github.com/drcrook1/CIFAR10
You will likely run into this at some point: you are reading data from somewhere and the label file uses relative paths. That does not always help you load the data, especially if you store your data and your code in separate locations (which is common), if you are sharing data with a team, or if your data simply lives somewhere totally different.
Anyway, this article shows how to convert a .csv label file with named labels into a label file that holds the full path to each image plus a numerical label, which is easier to one-hot encode during the reading process. Note that for deep learning this is often a two-step process. Step 1: convert from relative paths to full paths and numerical labels. Step 2: convert to the framework-specific storage format for the input reading pipeline (which varies from framework to framework). Here we cover Step 1. We will be using the CIFAR-10 data set, which can be downloaded from here: https://www.kaggle.com/c/cifar-10/data
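
To make Step 1 concrete, here is a minimal sketch using only the Python standard library. It is not the code from the repository. It assumes the label file has a header row with `id` and `label` columns and that each image is stored as `<id>.png` inside a single folder; the function name `convert_labels` and the file names in the usage note are hypothetical.

```python
import csv
import os


def convert_labels(src_csv, image_dir, dst_csv, ext=".png"):
    """Rewrite an (id, label) CSV as (absolute image path, integer label).

    Assumes src_csv has a header row with the columns `id` and `label`,
    and that each image lives at <image_dir>/<id><ext>.
    """
    image_dir = os.path.abspath(image_dir)

    with open(src_csv, newline="") as f:
        rows = list(csv.DictReader(f))

    # Sort the class names so the name -> integer mapping is reproducible.
    class_names = sorted({row["label"] for row in rows})
    label_to_index = {name: i for i, name in enumerate(class_names)}

    with open(dst_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            path = os.path.join(image_dir, row["id"] + ext)
            writer.writerow([path, label_to_index[row["label"]]])

    # Return the mapping so the numbers can be turned back into names later.
    return label_to_index
```

A call such as `convert_labels("trainLabels.csv", "train", "train_full_paths.csv")` would write one `path,label` line per image. Sorting the class names before numbering them is a design choice in this sketch: it keeps the mapping stable from run to run, so the integer labels can be one-hot encoded consistently at read time.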