Abstract:
Ensuring the safety and well-being of children in kindergartens requires continuous monitoring of their interactions with caregivers and the surrounding environment, as even short periods of inattentiveness can lead to accidents or unnoticed risky behavior. In this work, we present a computer-vision–based monitoring system that uses an improved YOLO-11 object detection model to localize and classify adults and children in surveillance video streams in real time. Based on the detection results, the system infers whether each child is currently supervised or unsupervised and whether a child is near danger zones (such as exits, staircases, or other restricted areas) predefined in the camera's field of view.
To support this task, a custom dataset was created and annotated with bounding boxes for the “child” and “adult” classes, using both publicly available images and video frames collected in kindergarten-like environments and covering different viewpoints, illumination conditions, and crowd levels. The YOLO-11 model was trained and evaluated using standard detection metrics (precision, recall, F1-score, and mAP) on separate training, validation, and test splits. In addition, a simple geometric reasoning module was implemented on top of the detector outputs to derive high-level safety events, such as “unsupervised child in the room” and “child entering a danger zone.”
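The abstract does not specify the rules used by the geometric reasoning module, so the following Python sketch only illustrates one plausible form such a module could take. Everything in it is an assumption made for illustration: the `Detection` container, the use of the bottom-center of a bounding box as a "foot point", a pixel distance threshold (`max_adult_distance`) as the supervision criterion, and danger zones given as image-space polygons. It works on plain box coordinates and does not depend on any detector library.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one detector output (pixel coordinates)."""
    label: str  # "child" or "adult"
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def foot_point(self):
        # Assumption: the bottom-center of the box approximates floor position.
        return ((self.x1 + self.x2) / 2.0, self.y2)


def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        xa, ya = polygon[i]
        xb, yb = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (ya > y) != (yb > y):
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside


def safety_events(detections, danger_zones, max_adult_distance):
    """Return (child_index, event) pairs for a single frame.

    Assumed rules: a child is 'supervised' if any adult foot point lies
    within max_adult_distance pixels, and is 'in a danger zone' if the
    child's foot point falls inside that zone's polygon.
    """
    adults = [d for d in detections if d.label == "adult"]
    children = [d for d in detections if d.label == "child"]
    events = []
    for idx, child in enumerate(children):
        cx, cy = child.foot_point
        supervised = any(
            hypot(cx - a.foot_point[0], cy - a.foot_point[1]) <= max_adult_distance
            for a in adults
        )
        if not supervised:
            events.append((idx, "unsupervised child"))
        for name, polygon in danger_zones.items():
            if point_in_polygon((cx, cy), polygon):
                events.append((idx, f"child in danger zone: {name}"))
    return events


# Illustrative frame: one adult, one nearby child, one distant child at an exit.
frame = [
    Detection("adult", 100, 100, 160, 300),
    Detection("child", 150, 200, 190, 300),
    Detection("child", 500, 250, 540, 340),
]
zones = {"exit": [(480, 300), (640, 300), (640, 360), (480, 360)]}
print(safety_events(frame, zones, max_adult_distance=150))
```

In a real deployment, a pixel distance would likely need to be replaced by a perspective-aware measure (for example, one normalized by box height), and events would be smoothed over several frames to avoid flagging single-frame detector misses.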
A prototype implementation demonstrates that the proposed approach can robustly separate adults and children, operate at real-time frame rates on GPU hardware, and automatically flag frames where a child remains alone or moves toward restricted areas, thus providing timely cues for caregivers. These preliminary results confirm the feasibility of applying modern YOLO-family detectors to real-time kindergarten safety monitoring and provide a practical foundation for further extensions toward action recognition (e.g., falling, aggression, social isolation), spatio-temporal behavior analysis, and affective state estimation in early childhood education settings.
Keywords:
child safety, CNN, object detection, supervision monitoring, video surveillance, YOLO-11

