Explainable Pneumonia Detection in Chest X-Rays: A Comparative Study of CNNs and Vision Transformers

Pneumonia is a leading cause of death worldwide, especially among children and the elderly, and chest radiography (CXR) remains the most widely used modality for its diagnosis. Although deep learning models have been reported to match or exceed radiologist-level performance on this task, they are still largely treated as opaque black boxes, which is a critical barrier to clinical deployment. In this work, we present a comparative and interpretable computer-aided diagnosis (CAD) framework for pneumonia detection that combines Gradient-weighted Class Activation Mapping++ (Grad-CAM++) explanations with three modern image-recognition backbones: a convolutional ResNet-50, a Swin Transformer (Swin-T), and a modernised convolutional network (ConvNeXt-T). The three backbones were fine-tuned on the public Kermany chest X-ray dataset using a class-balanced training subset, a weighted cross-entropy loss, and early stopping, and were then evaluated on the held-out test set of 624 images. The Swin-T backbone achieved the best overall performance, with a test accuracy of 95.51%, an F1-score of 0.95, and only 11 misclassified images among the 234 normal cases, outperforming both ResNet-50 (93.11% accuracy) and ConvNeXt-T (88.94% accuracy). Grad-CAM++ heatmaps computed from the convolutional and transformer feature maps consistently highlighted the affected pulmonary regions, providing radiologically plausible visual evidence for each prediction. Compared with five recent state-of-the-art pneumonia detectors, our Swin-T-based pipeline reaches competitive accuracy while providing layer-faithful visual explanations, supporting its use as a transparent decision-support tool in clinical workflows.
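The reported figures can be cross-checked with simple confusion-matrix arithmetic. The sketch below is illustrative only and is not the authors' evaluation code. It takes the test-set size (624), the number of normal cases (234), and the three accuracies from the abstract. Everything else is an assumption: the helper names `Confusion` and `correct_count` are hypothetical, the 11 reported errors are read as normal images predicted as pneumonia, and the remaining Swin-T errors are therefore assigned to the pneumonia class. The abstract does not say which F1 average was used, so the macro average shown here is one possible reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion counts with pneumonia as the positive class."""
    tp: int  # pneumonia predicted as pneumonia
    fp: int  # normal predicted as pneumonia
    fn: int  # pneumonia predicted as normal
    tn: int  # normal predicted as normal

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def f1(self, positive: bool = True) -> float:
        # F1 = 2*TP / (2*TP + FP + FN). For the other class the roles of
        # TP and TN swap, while the FP + FN total stays the same.
        hits = self.tp if positive else self.tn
        return 2 * hits / (2 * hits + self.fp + self.fn)


TEST_IMAGES = 624                                # from the abstract
NORMAL_IMAGES = 234                              # from the abstract
PNEUMONIA_IMAGES = TEST_IMAGES - NORMAL_IMAGES   # 390 by subtraction


def correct_count(accuracy_percent: float, total: int = TEST_IMAGES) -> int:
    """Number of correct predictions implied by a rounded accuracy."""
    return round(accuracy_percent / 100 * total)


# Correct predictions implied by the three reported accuracies.
implied = {name: correct_count(acc) for name, acc in
           [("Swin-T", 95.51), ("ResNet-50", 93.11), ("ConvNeXt-T", 88.94)]}

# ASSUMPTION: the 11 reported errors are normal images flagged as pneumonia,
# so the rest of the Swin-T errors fall on pneumonia images.
normal_errors = 11
pneumonia_errors = (TEST_IMAGES - implied["Swin-T"]) - normal_errors

swin = Confusion(
    tp=PNEUMONIA_IMAGES - pneumonia_errors,
    fp=normal_errors,
    fn=pneumonia_errors,
    tn=NORMAL_IMAGES - normal_errors,
)
macro_f1 = (swin.f1(positive=True) + swin.f1(positive=False)) / 2

print(implied)
print(f"accuracy={swin.accuracy:.4f}  macro-F1={macro_f1:.4f}")
```

Under these assumptions the three accuracies correspond to whole numbers of correct predictions out of 624, and the macro-averaged F1 for Swin-T rounds to the reported 0.95. A different split of the errors between the two classes would change the per-class F1 values but not the accuracy.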

ABOUT IJCSRR

