Real-Time Monitoring of Kindergarten Safety Using YOLO-11-Based Detection of Children and Adults
Ensuring the safety and well-being of children in kindergartens requires continuous monitoring of their interactions with caregivers and the surrounding environment, as even short periods of inattentiveness can lead to accidents or unnoticed risky behavior. In this work, we present a computer-vision-based monitoring system that uses an improved YOLO-11 object detection model to localize and classify adults and children in surveillance video streams in real time. From the detection results, the system infers whether each child is currently supervised or unsupervised, and whether a child is near a danger zone (such as an exit, a staircase, or another restricted area) predefined within the camera's field of view.
To support this task, a custom dataset was assembled from publicly available images and video frames collected in kindergarten-like environments, covering different viewpoints, illumination conditions, and crowd levels, and was annotated with bounding boxes for the “child” and “adult” classes. The YOLO-11 model was trained on the training split, tuned on the validation split, and evaluated on a separate test split using standard detection metrics: precision, recall, F1-score, and mean average precision (mAP). In addition, a simple geometric reasoning module was implemented on top of the detector outputs to derive high-level safety events, such as “unsupervised child in the room” and “child entering a danger zone.”
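As an illustration of how such a geometric reasoning module might operate on per-frame detector outputs, the following minimal Python sketch derives the two example events from bounding boxes alone. It is not the implementation used in this work: the `Detection` class, the `safety_events` function, the pixel-distance threshold `max_adult_dist`, and the use of the bottom-center of each box as a ground-contact point are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Detection:
    """One detector output: class label and box corners in pixels (hypothetical type)."""
    label: str  # "child" or "adult"
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def foot_point(self):
        # Bottom-center of the box, used as a rough ground-contact point (assumption).
        return ((self.x1 + self.x2) / 2.0, self.y2)


def point_in_polygon(point, polygon):
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        xa, ya = polygon[i]
        xb, yb = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (ya > y) != (yb > y):
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside


def safety_events(detections, danger_zones, max_adult_dist=300.0):
    """Return (event_name, child_detection) pairs for a single frame.

    A child counts as supervised if any adult's foot point lies within
    max_adult_dist pixels of the child's foot point (assumed rule).
    danger_zones maps a zone name to a list of (x, y) polygon vertices.
    """
    adults = [d for d in detections if d.label == "adult"]
    events = []
    for child in (d for d in detections if d.label == "child"):
        cx, cy = child.foot_point
        supervised = any(
            hypot(cx - a.foot_point[0], cy - a.foot_point[1]) <= max_adult_dist
            for a in adults
        )
        if not supervised:
            events.append(("unsupervised_child", child))
        for name, polygon in danger_zones.items():
            if point_in_polygon((cx, cy), polygon):
                events.append((f"child_in_danger_zone:{name}", child))
    return events
```

In this sketch the danger zones are polygons drawn once per camera in image coordinates, and the supervision rule is a fixed pixel distance; a deployed system would more plausibly calibrate distances to the floor plane and smooth events over several frames to suppress flicker from missed detections.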
A prototype implementation demonstrates that the proposed approach can distinguish adults from children, operate at real-time frame rates on GPU hardware, and automatically flag frames in which a child remains alone or moves toward a restricted area, thus providing timely cues for caregivers. These preliminary results support the feasibility of applying modern YOLO-family detectors to real-time kindergarten safety monitoring. They also provide a practical foundation for extensions toward action recognition (e.g., falling, aggression, social isolation), spatio-temporal behavior analysis, and affective state estimation in early childhood education settings.
