Thanks! Sorry for the late reply.
1. This is done to bring the center of the landmark coordinates to the origin before rotation, because a rotation matrix rotates points about the origin. The landmarks have been mapped to the range [0, 1], so 0.5 is the center value.
2. By dividing the landmark coordinates by the image dimensions (x by the width, y by the height), we map them to the range [0, 1]. This lets us use a sigmoid function at the output layer, since its output also lies between 0 and 1.
3. We normalize the image so that the pixel values lie in the range [-1, 1] or [0, 1]. This leads to more stable training.
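For point 1, here is a minimal sketch of the shift, rotate, shift-back idea. It assumes the landmarks are a list of (x, y) tuples already normalized to [0, 1]. The function name `rotate_landmarks` is hypothetical, and the angle convention is an assumption: with the standard rotation matrix and image coordinates (y pointing down), a positive angle here appears clockwise.

```python
import math


def rotate_landmarks(landmarks, angle_deg, center=0.5):
    """Rotate normalized (x, y) landmarks about the point (center, center).

    Assumes coordinates are already in [0, 1], so the image center is (0.5, 0.5).
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y in landmarks:
        # 1. Shift so the center of the [0, 1] range sits at the origin.
        x0, y0 = x - center, y - center
        # 2. Rotate about the origin with the standard 2D rotation matrix.
        xr = cos_t * x0 - sin_t * y0
        yr = sin_t * x0 + cos_t * y0
        # 3. Shift back into the [0, 1] coordinate frame.
        rotated.append((xr + center, yr + center))
    return rotated
```

For example, rotating `[(0.5, 0.5), (1.0, 0.5)]` by 90 degrees leaves the center point where it is and moves the right-edge point to roughly `(0.5, 1.0)`. Skipping the subtraction would rotate everything about the top-left corner (0, 0) instead, which pushes landmarks out of the frame.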
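For point 2, a small sketch of the normalization and its inverse, plus a numerically stable sigmoid to show why [0, 1] targets pair well with that output layer. The helper names are hypothetical, and the landmarks are assumed to be pixel (x, y) tuples.

```python
import math


def normalize_landmarks(landmarks, width, height):
    """Map pixel (x, y) landmarks into [0, 1] by dividing by the image size."""
    return [(x / width, y / height) for x, y in landmarks]


def denormalize_landmarks(landmarks, width, height):
    """Map [0, 1] predictions back to pixel coordinates."""
    return [(x * width, y * height) for x, y in landmarks]


def sigmoid(z):
    """Logistic sigmoid: squashes any real number into the interval (0, 1).

    Split by sign so math.exp never receives a large positive argument.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

For a 640x480 image, `normalize_landmarks([(320, 240)], 640, 480)` gives `[(0.5, 0.5)]`. At inference time, the sigmoid outputs are mapped back to pixels with `denormalize_landmarks`.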
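For point 3, a sketch of the two common scalings. It assumes 8-bit images whose pixel values lie in [0, 255], stored as a list of rows; the function name and the `mode` strings are hypothetical.

```python
def normalize_pixels(pixels, mode="zero_one"):
    """Scale 8-bit pixel values (assumed to lie in [0, 255]).

    mode="zero_one"      -> [0, 1]  via p / 255
    mode="minus_one_one" -> [-1, 1] via p / 127.5 - 1
    """
    if mode == "zero_one":
        return [[p / 255.0 for p in row] for row in pixels]
    if mode == "minus_one_one":
        return [[p / 127.5 - 1.0 for p in row] for row in pixels]
    raise ValueError(f"unknown mode: {mode!r}")
```

With NumPy arrays the same scalings are usually written as `img / 255.0` or `img / 127.5 - 1.0`.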