Abstract:
The increasing prevalence of Artificial Intelligence (AI) and Machine Learning (ML) models across industries has highlighted the critical need for efficient and scalable deployment strategies. Traditional deployment methods often struggle to adapt to fluctuating demand while remaining cost-effective. Serverless computing has emerged as a promising solution to these challenges. This paper investigates the deployment of AI models within serverless architectures on Amazon Web Services (AWS), focusing on AWS Lambda and Knative. The study analyzes the limitations of conventional deployment approaches and proposes strategies that leverage the capabilities of serverless technologies. Furthermore, it presents a rigorous evaluation of the performance characteristics of these serverless deployment strategies, discusses key security and privacy considerations, incorporates illustrative real-world case studies, and outlines directions for future research.
Keywords:
AI model deployment, AWS Lambda, cost-effectiveness, IEEE, Knative, performance evaluation, privacy, scalability, security, serverless architecture