Abstract: Classical video salient object extraction models do not make full use of temporal saliency cues and are susceptible to background noise, so the extracted salient objects are often incomplete. This paper proposes a video salient object extraction model guided by spatio-temporal contrast. First, RGB color-space contrast and motion contrast are adaptively fused to obtain prior information about the salient object. Then, an energy function composed of a foreground extraction term for the current frame and a position constraint term for adjacent frames is used to guide the fusion of spatio-temporal saliency cues. Finally, the complete video salient object is extracted by superpixel smoothing optimization. The model was tested on the ViSal, SegTrack V2 and DAVIS datasets, where it achieves MAE values of 0.030, 0.024 and 0.032 and F-measure values of 0.772, 0.781 and 0.812, respectively, indicating good accuracy and robustness. The algorithm can effectively detect salient objects in video, providing a theoretical reference and methodological basis for monitoring systems and object tracking.
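For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how MAE and F-measure are commonly computed for a single saliency map against a binary ground-truth mask. It is not taken from the paper: the function names `mae` and `f_measure`, the weighting `beta2 = 0.3`, and the adaptive threshold of twice the mean saliency are assumptions based on common practice in the saliency literature, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def mae(saliency, gt):
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    saliency = np.asarray(saliency, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean(np.abs(saliency - gt)))


def f_measure(saliency, gt, beta2=0.3, threshold=None):
    """Weighted harmonic mean of precision and recall for a binarized saliency map.

    beta2 = 0.3 and the default adaptive threshold are assumptions
    (common conventions), not values stated in the abstract.
    """
    saliency = np.asarray(saliency, dtype=np.float64)
    gt = np.asarray(gt) > 0.5
    if threshold is None:
        # Assumed convention: twice the mean saliency, capped at 1.
        threshold = min(2.0 * saliency.mean(), 1.0)
    pred = saliency >= threshold

    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)  # guard against an empty prediction
    recall = tp / max(gt.sum(), 1)       # guard against an empty ground truth

    denom = beta2 * precision + recall
    if denom == 0:
        return 0.0
    return float((1 + beta2) * precision * recall / denom)
```

A dataset-level score would typically be obtained by averaging these per-frame values over all annotated frames. Lower MAE and higher F-measure indicate better agreement with the ground truth.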