October 25, 2019

TensorFlow Dataset API

Data is the centerpiece of any machine learning effort, and because datasets can be huge, ingesting them efficiently is paramount. In this article, we will dive into the Dataset API (tf.data) that ships with TensorFlow.

Mental model

Let’s begin by developing a mental model of how our data should be arranged. From experience, this helps when asserting the shapes of both features and labels. We’ll define our features and labels, then examine them visually.

 import tensorflow as tf
 X = tf.ones((10,1))
 y = tf.zeros((10,1))

This creates two tf.Tensor objects, X and y. As the numpy field in the output below shows, each one holds an ndarray of shape (10,1).

# X
<tf.Tensor: id=827, shape=(10, 1), dtype=float32, numpy=
array([[1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.],
       [1.]], dtype=float32)>
# y
<tf.Tensor: id=830, shape=(10, 1), dtype=float32, numpy=
array([[0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.],
       [0.]], dtype=float32)>
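
Since the point of the mental model is to be able to assert shapes, here is a minimal sketch of what such checks might look like. It assumes TensorFlow 2.x running in eager mode (which the output above suggests); the variable name X_np is introduced only for illustration.

```python
import tensorflow as tf

X = tf.ones((10, 1))   # features: 10 rows, 1 column
y = tf.zeros((10, 1))  # labels: 10 rows, 1 column

# The static shape is available directly on the tensor.
assert X.shape.as_list() == [10, 1]
assert y.shape.as_list() == [10, 1]

# Features and labels must agree on the first (sample) dimension.
assert X.shape[0] == y.shape[0]

# Assumption: eager mode, where .numpy() returns the values as an ndarray.
X_np = X.numpy()
assert X_np.shape == (10, 1)
```

Checks like these are cheap, and they catch a mismatch between features and labels before the data reaches a model.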

From a layman’s perspective, this is similar to a vertical list of ones next to a vertical list of zeros. That basic intuition helps us see how our data should look. A model built to consume this dataset will take one item from X and one from y at a time, so each (feature, label) pair holds a single feature value and a single label value, each a tensor of shape (1,). Let’s now create our dataset.

import tensorflow as tf
# Slice X and y along their first dimension into (feature, label) pairs.
dataset = tf.data.Dataset.from_tensor_slices((X, y))

This will create a TensorSliceDataset object.
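
To check that the dataset matches the mental model, we can inspect a few of its elements. The following is a rough sketch, assuming TensorFlow 2.x with eager execution; the names feature_spec, label_spec and pairs are illustrative only.

```python
import tensorflow as tf

X = tf.ones((10, 1))
y = tf.zeros((10, 1))
dataset = tf.data.Dataset.from_tensor_slices((X, y))

# element_spec describes a single element: a (feature, label) pair.
feature_spec, label_spec = dataset.element_spec

# Slicing drops the leading dimension, so each component has shape (1,).
assert feature_spec.shape.as_list() == [1]
assert label_spec.shape.as_list() == [1]

# Pull the first three pairs out as NumPy arrays for a visual check.
pairs = [(f.numpy(), l.numpy()) for f, l in dataset.take(3)]
for feature, label in pairs:
    print(feature, label)
```

Each element pairs one row of X with the matching row of y, which is the item-by-item pairing described earlier.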


© David Dexter 2022