With the continuous advancement of computer vision, scene recognition plays an increasingly important role in fields such as intelligent surveillance, augmented reality, and human-computer interaction. In complex social scenes, however, the occlusion of scene information by people remains a key challenge: when individuals occupy a large portion of the frame or overlap one another, the integrity of the scene information is severely compromised and recognition accuracy drops. To address this issue, this project proposes a scene recognition method that integrates object detection, image inpainting, and deep feature extraction to improve performance in occluded environments. Specifically, a YOLO object detector first identifies and localizes the people in the scene, delineating the occluded regions. Adversarial edge learning is then applied to inpaint these regions, restoring the hidden environmental information. Finally, a residual network (ResNet) extracts both global and local features, enabling robust scene recognition that accurately interprets scene content even under severe occlusion. The main contributions are: 1) integrating object detection with scene recognition to enable intelligent analysis of human occlusion; 2) leveraging adversarial edge learning for image inpainting to compensate for the missing information in occluded areas; and 3) adopting a residual network to strengthen the robustness of scene recognition, maintaining high accuracy in complex social scenes. Experiments on multiple scene datasets show that the method improves recognition accuracy by 12% over traditional approaches, remains stable in highly occluded scenarios, and achieves significant gains in inpainting quality as measured by PSNR and SSIM.
This research provides an effective solution for intelligent scene recognition in occluded environments and lays a foundation for the practical deployment of smart visual systems.
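The detect-inpaint-recognize pipeline hinges on converting the detector's person boxes into an occlusion mask that drives the inpainting stage. The following is a minimal sketch of that masking step only, assuming axis-aligned `(x1, y1, x2, y2)` pixel boxes as a YOLO-style detector would return; the helper names and the `pad` dilation parameter are illustrative, not part of the method described above.

```python
import numpy as np

def boxes_to_mask(height, width, boxes, pad=2):
    """Build a binary occlusion mask from detected person boxes.

    boxes: iterable of (x1, y1, x2, y2) pixel coordinates, as a
    YOLO-style detector might return them. `pad` slightly dilates
    each box so the inpainting stage also covers soft boundaries
    around each person (an assumed, tunable choice).
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        # Clip the (optionally padded) box to the frame bounds.
        x1 = max(0, int(x1) - pad)
        y1 = max(0, int(y1) - pad)
        x2 = min(width, int(x2) + pad)
        y2 = min(height, int(y2) + pad)
        mask[y1:y2, x1:x2] = 1
    return mask

def occlusion_ratio(mask):
    """Fraction of the frame hidden by people (0.0 to 1.0)."""
    return float(mask.mean())
```

In a full system, `mask` would be passed alongside the image to the adversarial-edge inpainting model, and `occlusion_ratio` could gate how much the recognizer trusts the restored regions.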
