International Journal of Innovative Science and Research Technology

AI-Based Purchase Order Automation Using Confidence-Aware Hybrid Extraction and ERP Integration

Authors : Gowtham S.; Radhika M.; Maduvanthi S.; Thulasi P.; Sanchith Shanmugha Sundaram R.; Muthamizh Kavi E.

Volume/Issue : Volume 11 - 2026, Issue 5 - May

Google Scholar : https://tinyurl.com/yc5dvs8s

Scribd : https://tinyurl.com/2h6xhzt3

DOI : https://doi.org/10.38124/ijisrt/26May1130

Abstract : Some firms still process emailed purchase orders manually, which leads to inefficiency, errors, and delays. These orders arrive not only as plain text in the email body but also as scanned documents or PDF attachments, which complicates data extraction. This research proposes a fully automated process that handles incoming emails and structures the extracted data for use by the ERP system. A confidence-aware hybrid approach combines a language model, named entity recognition, and rule-based methods to extract product, quantity, and shipping details from purchase orders. The confidence-aware approach identifies which extracted data is reliable, while an input quality controller handles text misrecognition and differences in email format. The end-to-end process covers email retrieval, OCR, data validation, and import into the ERP system.

Keywords : Purchase Order Automation, Optical Character Recognition (OCR), Named Entity Recognition (NER), ERP Integration.
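
The abstract does not specify how the three extractors are combined, so the following is only a minimal Python sketch of one way a confidence-aware hybrid extraction step could work. Everything in it is an assumption made for illustration: the names `Candidate`, `rule_extractor`, `stub_model_extractor`, `fuse`, and `extract_order`, the regex patterns, the confidence values, the noisy-OR fusion rule, and the 0.8 review threshold. The stub stands in for the language model and NER components, which are not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Candidate:
    """One proposed field value from one extractor (hypothetical structure)."""
    field: str
    value: str
    confidence: float  # assumed to lie in [0, 1]
    source: str


Extractor = Callable[[str], List[Candidate]]


def rule_extractor(text: str) -> List[Candidate]:
    """Rule-based extractor: illustrative regexes with assumed confidences."""
    found = []
    qty = re.search(r"(?i)\b(?:qty|quantity)\b\s*[:=]?\s*(\d+)", text)
    if qty:
        found.append(Candidate("quantity", qty.group(1), 0.90, "rule"))
    ship = re.search(r"(?im)^\s*ship\s*to\s*:\s*(.+)$", text)
    if ship:
        found.append(Candidate("ship_to", ship.group(1).strip(), 0.85, "rule"))
    return found


def stub_model_extractor(text: str) -> List[Candidate]:
    """Placeholder for a language model or NER component.

    It returns fixed candidates so the sketch runs without any model;
    a real component would derive them from `text`.
    """
    return [
        Candidate("product", "Widget A", 0.60, "model"),
        Candidate("quantity", "12", 0.70, "model"),
    ]


def fuse(candidates: List[Candidate], threshold: float = 0.8) -> Dict[str, dict]:
    """Combine candidates per field with a noisy-OR over agreeing sources.

    Values proposed by several extractors score higher than any single
    proposal; fields scoring below `threshold` are flagged for review.
    """
    by_field: Dict[str, Dict[str, List[Candidate]]] = {}
    for c in candidates:
        key = c.value.strip().lower()  # normalize before comparing values
        by_field.setdefault(c.field, {}).setdefault(key, []).append(c)

    result = {}
    for field, groups in by_field.items():
        best_value, best_score = None, 0.0
        for group in groups.values():
            miss = 1.0
            for c in group:
                miss *= 1.0 - c.confidence
            score = 1.0 - miss
            if score > best_score:
                best_value, best_score = group[0].value.strip(), score
        result[field] = {
            "value": best_value,
            "confidence": round(best_score, 3),
            "needs_review": best_score < threshold,
        }
    return result


def extract_order(text: str, extractors: List[Extractor],
                  threshold: float = 0.8) -> Dict[str, dict]:
    """Run every extractor on the text and fuse their candidates."""
    candidates = [c for extractor in extractors for c in extractor(text)]
    return fuse(candidates, threshold)


if __name__ == "__main__":
    email_body = "Please send Qty: 12 of Widget A\nShip to: 42 Harbor Road, Chennai"
    order = extract_order(email_body, [rule_extractor, stub_model_extractor])
    for field, info in sorted(order.items()):
        print(field, info)
```

In this sketch, a quantity confirmed by both the rule and the stub model receives a higher combined score than either source alone, while a product name proposed by only one low-confidence source is flagged with `needs_review` so that it could be checked before anything is imported into the ERP system.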

