Resizing in a mobilenet SSD [on hold] - python

Is there a resizing of the image that is given to a MobileNet (v2) SSD network? If so, at what level?
I know that if I run the following code:
tensorlist = [ for tensor in tf.get_default_graph().as_graph_def().node]
for tensor in tensorlist:
    raw = detection_graph.get_tensor_by_name(tensor + ':0')
    print(tensor, ":", raw.get_shape())
I get for the first layer:
image_tensor : (?, ?, ?, 3)
So the network takes an image as input regardless of size.
The final goal is to determine whether a small image activates fewer units in the network than a larger one.
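For what it's worth, the dynamic (?, ?, ?, 3) shape of image_tensor does not by itself rule out resizing: in the TF Object Detection API the preprocessor typically resizes the image internally (for SSD usually a fixed_shape_resizer from the pipeline config). That pattern can be sketched as follows; the 300x300 size is an assumption, the real value comes from the model's pipeline config:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    # Dynamically shaped input, like the SSD graph's image_tensor ...
    image_tensor = tf.placeholder(tf.uint8, shape=[None, None, None, 3],
                                  name='image_tensor')
    # ... followed by an internal resize to the fixed size the backbone
    # expects (300x300 is an assumption; the real value is set by the
    # fixed_shape_resizer in the pipeline config).
    resized = tf.image.resize_bilinear(tf.cast(image_tensor, tf.float32),
                                       [300, 300], name='Preprocessor/resize')

print(image_tensor.get_shape())  # dynamic in every spatial dimension
print(resized.get_shape())       # fixed 300x300 from here on
```

So the first placeholder accepting any size only means the resize happens inside the graph, not that the backbone sees images at their native resolution.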


How to pool different-shaped convolutional layer outputs to a fixed shape to pass to a fully connected layer

I have input images of different sizes, and I pass them through the Conv layers in a CNN, after which I need to connect the Conv outputs to a fully connected layer for classification.
Since the process has to be vectorised, the outputs have to be of the same shape so that a batch of images can be used for the forward pass; hence the problem of needing the same shape for all images at the input of the fully connected layer.
But since my input images are of different shapes, my final Conv layer gives different-shaped outputs. How do I pool/reshape the different-shaped outputs from the last Conv layer to a fixed shape so that they can be connected to the immediately following fully connected layer?
Also, I have considered resizing the images to a fixed size before processing, but that gives performance (accuracy) issues, as my input images vary greatly in size, so I am trying this approach instead.
If your inputs are consistent across examples (i.e. if inputs = image1, image2, then all your image1s are the same size and all your image2s are the same size, but image1.shape isn't necessarily the same as image2.shape), you could just flatten your final conv outputs and concatenate the results before passing to a dense layer.
conv1_out = conv_network1(image1)
conv2_out = conv_network2(image2) # could be same network
flat1 = tf.layers.flatten(conv1_out)
flat2 = tf.layers.flatten(conv2_out)
dense_in = tf.concat((flat1, flat2), axis=1)
dense_out = tf.layers.dense(dense_in, units)
Alternatively, if your images are different sizes across batches, or if you have a large number of spatial features, spatial pooling is another popular option.
flat1 = tf.reduce_mean(conv1_out, axis=(1, 2))
flat2 = tf.reduce_mean(conv2_out, axis=(1, 2))
You can also use max pooling, though the behaviour is subtly different.

Tensorflow: Concat different graphs for finetuning with external data

I want to implement this paper in TensorFlow. Its approach is to create two separate CNNs at first, and then concatenate them for finetuning (as you can see in figure 1a).
My current situation is: I have two pre-trained and saved models, each of them fed with a data queue input (so, no feed_dict and no Dataset API), and now I am off to finetuning. I want to restore them from disk and somehow concatenate them, so that I can define an optimizer which optimizes both networks.
This is my current approach:
# Build data input
aug_img, is_img, it_img = finetuning_read_training_images(trainset_size=num_steps,
# Load the graphs
print("Loading ECNN graph...")
ecnn_graph = tf.train.import_meta_graph(os.path.join(ecnn_path, "ecnn.meta"), clear_devices=True)
trained_target = tf.get_default_graph().get_tensor_by_name("E-CNN/ecnn_output/ecnn_output_convolution/BiasAdd:0")
augmented_icnn_input = tf.concat([is_img, trained_target], axis=2)
icnn_graph = tf.train.import_meta_graph(os.path.join(icnn_path, "icnn.meta"), clear_devices=True, input_map={"aug_img": augmented_icnn_input, "it_img": it_img, "is_img": is_img})
The data input function reads 3 batches. aug_img is the source image, which has reflections, e.g. when photographed through a glass panel, augmented with its edge map as a 4th color channel. The ECNN graph should predict the reflection-free edge map. Tensorflow should augment the plain source image, which is stored in the is_img variable with the predicted reflection-free edge map, which should happen in the lines beginning with trained_target and augmented_icnn_input. The augmented image is then fed to the ICNN graph which should create a reflection-free image then, so it is given the it_img which is the target image. It is fed the not-augmented source image again, only for tensorboard visualization.
But now I am unable to proceed further. I cannot concat the two tensors to create the augmented_icnn_input, because I get a ValueError: Tensor("E-CNN/ecnn_output/ecnn_output_convolution/BiasAdd:0", shape=(?, 224, 224, 1), dtype=float32) must be from the same graph as Tensor("batch:1", shape=(?, 224, 224, 3), dtype=float32).
Also, I seem not to understand the input_map in combination with the data input queue correctly: although I have defined aug_img and the other variables in the ICNN, and see them in TensorBoard, I do not see them in the variables collection and therefore cannot map them.
So I would like to know: Is this the correct approach to combine two subgraphs into a bigger graph? How can I solve the problem that I am unable to concat the two tensors in augmented_icnn_input? And why is my input_map not working?
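One way to get both tensors into the same graph (the usual fix for the "must be from the same graph" error) is to import both meta graphs into a single fresh graph and wire them together via input_map. A self-contained toy sketch of that pattern, with trivial stand-in networks in place of the real ECNN/ICNN (all names here are hypothetical):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Build and export two tiny stand-in graphs (stand-ins for ECNN and ICNN).
g1 = tf.Graph()
with g1.as_default():
    x = tf.placeholder(tf.float32, [None, 4], name='ecnn_input')
    tf.identity(x * 2.0, name='ecnn_output')
    meta1 = tf.train.export_meta_graph()

g2 = tf.Graph()
with g2.as_default():
    y = tf.placeholder(tf.float32, [None, 4], name='icnn_input')
    tf.identity(y + 1.0, name='icnn_output')
    meta2 = tf.train.export_meta_graph()

# Combine: import BOTH meta graphs into one fresh graph, so every tensor
# lives in the same graph, then feed the first network's output into the
# second via input_map.
combined = tf.Graph()
with combined.as_default():
    tf.train.import_meta_graph(meta1, import_scope='ecnn')
    ecnn_out = combined.get_tensor_by_name('ecnn/ecnn_output:0')
    tf.train.import_meta_graph(meta2, import_scope='icnn',
                               input_map={'icnn_input:0': ecnn_out})
    icnn_out = combined.get_tensor_by_name('icnn/icnn_output:0')
    with tf.Session() as sess:
        result = sess.run(icnn_out,
                          {combined.get_tensor_by_name('ecnn/ecnn_input:0'):
                           [[1.0, 2.0, 3.0, 4.0]]})
print(result)  # (x * 2) + 1 for each element
```

In the real setting the input_map value would be the concat of is_img and the ECNN output, and an optimizer defined in the combined graph would see both networks' variables.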

Tensorflow: Different activation values for same image

I'm trying to retrain (read finetune) a MobileNet image Classifier.
The retraining script provided by TensorFlow here (from the tutorial) updates only the weights of the newly added fully connected layer. I modified this script to update the weights of all layers of the pre-trained model. I'm using the MobileNet architecture with a depth multiplier of 0.25 and an input size of 128.
However, while retraining I observed a strange thing: if I give a particular image as input for inference in a batch with some other images, the activation values after some layers are different from those when the image is passed alone. Also, activation values for the same image from different batches are different. Example - for two batches:
batch_1: [img1, img2, img3]; batch_2: [img1, img4, img5]. The activations for img1 are different between the two batches.
Here is the code I use for inference -
with tf.Session(graph=tf.get_default_graph()) as sess:
    image_path = '/tmp/images/10dsf00003.jpg'
    id_ = gfile.FastGFile(image_path, 'rb').read()
    # The line below loads the jpeg using tf.decode_jpeg and does some preprocessing
    id =, {jpeg_data_tensor: id_})  # decoded_image_tensor: the preprocessing output tensor
    input_image_tensor = graph.get_tensor_by_name('input')
    layerXname = 'MobilenetV1/MobilenetV1/Conv2d_1_depthwise/Relu:0'  # Name of the layer whose activations to inspect.
    layerX = graph.get_tensor_by_name(layerXname)
    layerxactivations =, {input_image_tensor: id})
The above code is executed once as it is, and once with the following change in the last line:
    layerxactivations_batch =, {input_image_tensor: np.asarray([np.squeeze(id), np.squeeze(id), np.squeeze(id)])})
Following are some nodes in the graph :
[u'input', u'MobilenetV1/Conv2d_0/weights', u'MobilenetV1/Conv2d_0/weights/read', u'MobilenetV1/MobilenetV1/Conv2d_0/convolution', u'MobilenetV1/Conv2d_0/BatchNorm/beta', u'MobilenetV1/Conv2d_0/BatchNorm/beta/read', u'MobilenetV1/Conv2d_0/BatchNorm/gamma', u'MobilenetV1/Conv2d_0/BatchNorm/gamma/read', u'MobilenetV1/Conv2d_0/BatchNorm/moving_mean', u'MobilenetV1/Conv2d_0/BatchNorm/moving_mean/read', u'MobilenetV1/Conv2d_0/BatchNorm/moving_variance', u'MobilenetV1/Conv2d_0/BatchNorm/moving_variance/read', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add/y', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/Rsqrt', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_1', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/mul_2', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/sub', u'MobilenetV1/MobilenetV1/Conv2d_0/BatchNorm/batchnorm/add_1', u'MobilenetV1/MobilenetV1/Conv2d_0/Relu6', u'MobilenetV1/Conv2d_1_depthwise/depthwise_weights', u'MobilenetV1/Conv2d_1_depthwise/depthwise_weights/read', ... ...]
Now when layerXname = 'MobilenetV1/MobilenetV1/Conv2d_0/convolution',
the activations are the same in both of the above cases (i.e.
layerxactivations and layerxactivations_batch[0] are the same).
But after this layer, all layers have different activation values. I suspect that the batchNorm operations after the 'MobilenetV1/MobilenetV1/Conv2d_0/convolution' layer behave differently for batch inputs and a single image. Or is the issue caused by something else?
Any help/pointers would be appreciated.
When you build the mobilenet there is one parameter called is_training. If you don't set it to False, the dropout layer and the batch normalization layer will give you different results in different iterations. Batch normalization will probably change the values very little, but dropout will change them a lot, as it drops some input values.
Take a look at the signature of mobilenet_v1:
def mobilenet_v1(inputs,
  """Mobilenet v1 model for classification.

  Args:
    inputs: a tensor of shape [batch_size, height, width, channels].
    num_classes: number of predicted classes.
    dropout_keep_prob: the percentage of activation values that are retained.
    is_training: whether is training or not.
    min_depth: Minimum depth value (number of channels) for all convolution ops.
      Enforced when depth_multiplier < 1, and not an active constraint when
      depth_multiplier >= 1.
    depth_multiplier: Float multiplier for the depth (number of channels)
      for all convolution ops. The value must be greater than zero. Typical
      usage will be to set this value in (0, 1) to reduce the number of
      parameters or computation cost of the model.
    conv_defs: A list of ConvDef namedtuples specifying the net architecture.
    prediction_fn: a function to get predictions out of logits.
    spatial_squeeze: if True, logits is of shape [B, C], if false logits is
      of shape [B, 1, 1, C], where B is batch_size and C is number of classes.
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.

  Returns:
    logits: the pre-softmax activations, a tensor of size
      [batch_size, num_classes]
    end_points: a dictionary from components of the network to the corresponding
      activation.

  Raises:
    ValueError: Input rank is invalid.
  """
This is due to batch normalisation.
How are you running inference? Are you loading the model from the checkpoint files, or are you using a frozen protobuf model? If you use a frozen model you can expect similar results for different formats of inputs.
Check this out. A similar issue for a different application is raised here.
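The batch-statistics explanation can be sketched in NumPy: in training mode, batch norm normalises with the current batch's mean and variance, so the same image gets different activations depending on its batch-mates; in inference mode it uses stored moving averages, making each example independent of the batch (the activation vectors and moving averages below are illustrative stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
img1 = rng.normal(size=(8,))            # stand-in activations for img1 (8 features)
img4 = rng.normal(size=(8,)) + 3.0
img5 = rng.normal(size=(8,)) - 2.0

def batch_norm_train(batch, eps=1e-3):
    # Training mode: normalise with statistics of the *current batch*.
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    return (batch - mean) / np.sqrt(var + eps)

def batch_norm_infer(batch, moving_mean, moving_var, eps=1e-3):
    # Inference mode: normalise with the stored moving averages,
    # so each example is independent of its batch-mates.
    return (batch - moving_mean) / np.sqrt(moving_var + eps)

batch_a = np.stack([img1, img1, img1])
batch_b = np.stack([img1, img4, img5])

# Training-mode outputs for img1 depend on which batch it sits in:
out_a = batch_norm_train(batch_a)[0]
out_b = batch_norm_train(batch_b)[0]
print(np.allclose(out_a, out_b))  # False: same image, different activations

# Inference-mode outputs are identical regardless of the batch
# (stand-in moving averages):
mm, mv = np.zeros(8), np.ones(8)
print(np.allclose(batch_norm_infer(batch_a, mm, mv)[0],
                  batch_norm_infer(batch_b, mm, mv)[0]))  # True
```

This is exactly why setting is_training=False (or freezing the model, which bakes in the moving averages) makes the per-image activations reproducible.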

TensorFlow convolution with variable length examples

I am trying to design a convolutional neural network to be trained on preprocessed audio files. Because the audio files are of different lengths, the training examples are of variable size in one dimension. I have gotten the network to work by zero-padding all the examples to the same length, but I am worried that this will make the network less accurate. I am using a TensorFlow placeholder and tried setting its shape to [1, None, None] so that whatever the shape of each example is, it should fit. But this gives me an error when it gets to the convolution part:
conv = tf.layers.conv1d(
    inputs=input,
    filters=20,
    kernel_size=5,
    name="conv")
This gives the error:
ValueError: Shape of a new variable (conv/kernel) must be fully defined, but instead was (5, ?, 20).
So it seems that TensorFlow won't allow convolution with an unknown shape using placeholders. Is there a workaround for this to run each example independently without padding them to the same length?
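The error arises because conv1d builds a kernel of shape (kernel_size, in_channels, filters); with a placeholder shape of [1, None, None] the channel dimension is unknown, hence the (5, ?, 20) kernel cannot be created. Only the channel dimension needs to be fixed (e.g. shape=[1, None, n_channels]); the length can stay None, since the kernel does not depend on it. A NumPy sketch of that independence, assuming a single input channel:

```python
import numpy as np

def conv1d(x, kernel):
    """Valid 1-D convolution.

    x: (length, in_channels); kernel: (kernel_size, in_channels, filters).
    The kernel's shape depends only on in_channels and filters, never on length.
    """
    k, cin, f = kernel.shape
    out = np.empty((x.shape[0] - k + 1, f))
    for t in range(out.shape[0]):
        # Each output step contracts one (k, cin) window against the kernel.
        out[t] = np.tensordot(x[t:t + k], kernel, axes=([0, 1], [0, 1]))
    return out

rng = np.random.default_rng(0)
kernel = rng.normal(size=(5, 1, 20))    # (kernel_size, in_channels, filters)
short = rng.normal(size=(50, 1))        # two examples of different lengths,
long_ = rng.normal(size=(400, 1))       # same channel count

print(conv1d(short, kernel).shape)  # (46, 20)
print(conv1d(long_, kernel).shape)  # (396, 20)
```

The same fixed kernel handles both lengths, which is why fixing only the channel dimension in the placeholder is enough to avoid padding every example to one length.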

How to use max pooling to gather information from LSTM nodes

gru_out = Bidirectional(GRU(hiddenlayer_num, return_sequences=True))(embedded)
#Tensor("concat_v2_8:0", shape=(?, ?, 256), dtype=float32)
I use Keras to create a GRU model. I want to gather information from all the node vectors of the GRU model, instead of just the last node vector.
For example, I need to get the maximum value of each vector, as in the image description, but I have no idea how to do this.
One may use GlobalMaxPooling1D described here:
gru_out = Bidirectional(GRU(hiddenlayer_num, return_sequences=True))(embedded)
max_pooled = GlobalMaxPooling1D()(gru_out)
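What GlobalMaxPooling1D computes can be sketched in NumPy: the maximum over the time axis, per feature, so every node vector contributes (batch size, sequence length, and feature count below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the bidirectional GRU output: (batch, timesteps, 256)
gru_out = rng.normal(size=(4, 11, 256))

# Max pooling over the time axis keeps, for each of the 256 features,
# its maximum value across all node vectors.
pooled = gru_out.max(axis=1)
print(pooled.shape)  # (4, 256): one fixed-size vector per example
```

The result no longer has a time dimension, so it can feed a dense layer directly, regardless of sequence length.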