Abstract:
The increasing prevalence of Artificial Intelligence (AI) and Machine Learning (ML) models across industries has highlighted the critical need for efficient and scalable deployment strategies. Traditional deployment methods often struggle to adapt to fluctuating demand while remaining cost-effective. Serverless computing has emerged as a promising solution to these challenges. This paper investigates the deployment of AI models within serverless architectures on Amazon Web Services (AWS), focusing on AWS Lambda and Knative. The study analyzes the limitations of conventional deployment approaches and proposes strategies that leverage the capabilities of serverless technologies. Furthermore, it presents a rigorous evaluation of the performance characteristics of these serverless deployment strategies, discusses key security and privacy considerations, incorporates illustrative real-world case studies, and outlines directions for future research.
Keywords:
AI model deployment, AWS Lambda, cost-effectiveness, IEEE, Knative, performance evaluation, privacy, scalability, security, serverless architecture