I would recommend adding an option to insert a snapshot of the image on screen into a video transcription at the point in the transcript where the image changes. For example, when transcribing a webinar replay, having the images already placed at every slide position would save a lot of time. Right now I have to spend a lot of time inserting images of the PowerPoint slides after the transcription is done, because I have to watch the video again and then insert each slide image at the transition points in the webinar.