Abstract :
The global oil and gas industry faces complex challenges. Oil prices are dropping because of OPEC production levels, the US oil boom, and other factors. Many experts believe that oil prices will remain low for years, at an equilibrium of around $40-50 (Blumberg, 2018; Walls and Zheng, 2018; Azar, 2019), although the 2019 oil price is expected to average $65, with a further decline to $62 by 2020 (Amadeo, 2019; Kazim, 2019). In addition, newly commercial resources are extremely expensive to develop because they require massive capital investment. This research intends to develop a comprehensive entity resolution framework that can search across multiple databases with disparate forms, process large amounts of data quickly, efficiently resolve multiple entities into one, and find hidden connections without human intervention. Putting in place a system to manage these entities will help not only to assign resources better, but also to do so more quickly. Although the necessary information is mostly already available within oil and gas companies, it is spread across different company areas and applications. Entity resolution will help to aggregate these data, identify and exploit connections between entities, and offer holistic, all-in-one information that can help to identify and deal with potential risks. We therefore evaluate existing implementations on challenging real-world match tasks. We consider approaches both with and without machine learning for finding a suitable parameterization and combination of similarity functions. In addition to approaches from the research community, we also consider a state-of-the-art commercial entity resolution implementation. Our results indicate significant quality and efficiency differences between the approaches.
We also find that some challenging resolution tasks, such as matching product entities from the OPEC database, are not sufficiently solved by conventional approaches based on the similarity of attribute values.
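To illustrate what matching based on the similarity of attribute values can look like, the following is a minimal Python sketch and not the framework evaluated in this work. The record fields (`name`, `country`), the weights, and the threshold are hypothetical values chosen only for the example. A learning-based approach would fit the weights and threshold from labelled match and non-match pairs instead of fixing them by hand.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], using the standard library."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased, whitespace-separated tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical, hand-picked parameters (assumptions for this sketch only).
WEIGHTS = {"name": 0.7, "country": 0.3}
THRESHOLD = 0.8


def record_similarity(r1: dict, r2: dict) -> float:
    """Weighted combination of per-attribute similarity scores."""
    name_sim = max(
        string_similarity(r1["name"], r2["name"]),
        token_jaccard(r1["name"], r2["name"]),
    )
    same_country = r1["country"].strip().lower() == r2["country"].strip().lower()
    country_sim = 1.0 if same_country else 0.0
    return WEIGHTS["name"] * name_sim + WEIGHTS["country"] * country_sim


def is_match(r1: dict, r2: dict, threshold: float = THRESHOLD) -> bool:
    """Declare two records the same entity if the score reaches the threshold."""
    return record_similarity(r1, r2) >= threshold
```

With these parameters, two records whose names and countries differ only in case or surrounding whitespace are treated as the same entity, while records that agree on name but not on country fall below the threshold. Product descriptions that name the same item in very different words are the kind of case where such attribute-level scores tend to fail.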
Keywords :
Big Data, Entity resolution, Framework, Machine learning
References :
- Avigdor, G. 2014. Uncertain entity resolution. Proceedings of the VLDB Endowment 7(13) 1711-1712.
- Ayat, N., Akbarinia, R., Afsarmanesh, H., and Valduriez, P. 2014. Entity resolution for probabilistic data. Information Sciences 277 492-511.
- Azar, S. 2019. Oil prices, US inflation, US money supply and the US dollar. OPEC Energy Review 37(4) 387-415.
- Barkhatov, A., and Baranova, A. 2017. Cost Effectiveness in Oil and Gas Using Data Driven System. Oil and Gas Business (1) 153-177.
- Blumberg, G. 2018. Oil & gas, Kluwer Law International, [Place of publication not identified].
- Chuan, X., Wang, W., Lin, X., Yu, J., and Wang, G. 2018. Efficient similarity joins for near-duplicate detection. ACM Transactions on Database Systems 36(3) 1-41.
- Domingos, S., and Elkan, G. 2016. Query-time Entity Resolution. Journal of Artificial Intelligence Research 30 621-657.
- Dong, Y., Chen, J., and Tang, X. 2014. Unsupervised feature selection method based on latent Dirichlet allocation model and mutual information. Journal of Computer Applications 32(8) 2250-2252.
- Getoor, L., and Ashwin, M. 2014. Entity resolution. Proceedings of the VLDB Endowment 5(12) 2018-2019.
- Gevorkyan, A., and Semmler, W. 2017. Oil Price, Overleveraging and Shakeout in the Shale Energy Sector - Game Changers in the Oil Industry. SSRN Electronic Journal.
- Giang, P. 2014. A machine learning approach to create blocking criteria for record linkage. Health Care Management Science 18(1) 93-105.
- Han, J. 2016. An Approach for Detecting Similar Duplicate Records of Massive Data. Journal of Computer Research and Development 42(12) 2206.
- Issa, H., and Vasarhelyi, M. 2017. Duplicate Records Detection Techniques: Issues and Illustration. SSRN Electronic Journal.
- Izakian, H. 2018. Privacy preserving record linkage meets record linkage using unencrypted data. International Journal of Population Data Science 3(4).
- Kazim, A. 2019. Theoretical limits of OPEC Members’ oil production. OPEC Review 31(4) 235-248.
- Li Juan, Z., and Xiao, Z. 2018. Detection for Approximately Duplicate Records Based on Fuzzy Comprehensive Evaluation. Applied Mechanics and Materials 397-400 2464-2468.
- Martin, D. 2018. Integration of Multiple Oil and Gas Data Sources for Use in Forecasting Future Rates of Discovery of Oil and Gas. AAPG Bulletin 75.
- Nigam, U., and McCallum, K. 2019. Special Issue on Entity Resolution: Overview. Journal of Data and Information Quality 4(2) 1-2.
- Omar, E., and Steven, W. 2017. Conventional Identity Resolution Methods: Issues and Trend. The VLDB Journal 18(6) 1261-1277.
- Rajkumar, N., Kishore Kumar, K., and Vivek, J. 2018. Successive Duplicate Detection in Scalable Datasets in Cloud Database. International Journal of Engineering & Technology 7(2.4) 66.
- Reynolds, D. 2018. The Energy Utilization Chain: Determining Viable Oil Alternative Technology. Energy Sources 22(3) 215-226.
- Ripon, K., Rahman, A., and Rahaman, G. 2017. A Domain-Independent Data Cleaning Algorithm for Detecting Similar-Duplicates. Journal of Computers 5(12).
- Soyemi, J., and Adegboye, J. 2018. Database Record Duplicate Detection System using Simil Algorithm. International Journal on Computer Science and Engineering 9(2) 55-61.
- Sunter, I., and Fellegi, L. 2013. Reference reconciliation in complex information spaces. ACM Transactions on Knowledge Discovery from Data 1(1) 5-es.
- Tian, Z., Lu, H., Ji, W., Zhou, A., and Tian, Z. 2016. An n-gram-based approach for detecting approximately duplicate database records. International Journal on Digital Libraries 3(4) 325-331.
- Walls, W., and Zheng, X. 2018. Shale oil boom and the profitability of US petroleum refiners. OPEC Energy Review 40(4) 337-353.
- Whang, S., Menestrina, D., Koutrika, G., Theobald, M., and Garcia-Molina, H. 2015. Entity resolution with iterative blocking. Proceedings of the 35th SIGMOD international conference on Management of data - SIGMOD ’09.
- Winkler, W. 2019. Matching and record linkage. Wiley Interdisciplinary Reviews: Computational Statistics 6(5) 313-325.
- Xing, Z., Xingchun, D., and Jianjun, C. 2018. A High Accurate Multiple Classifier System for Entity Resolution Using Resampling and Ensemble Selection. Mathematical Problems in Engineering 2015 1-6.
- Ya-Kun, L., and Gao, H. 2018. Efficient Entity Resolution on XML Data Based on Entity-Describe-Attribute. Chinese Journal of Computers 34(11) 2131-2141.
- Zhou, D., and Zhou, L. 2015. Algorithm for detecting approximate duplicate records in massive data. Journal of Computer Applications 33(8) 2208-2211.
- Zycher, B. 2018. Barriers to Alternative Energy Sources. Fuel and Energy 42(2) 129-133.