Existing Audio-Visual Speech Recognition (AVSR) systems visually focus intensely on a small region of the face, centred on the immediate mouth area. This is poor design for a variety reasons in real w