[1] LI P, RAO X, BLASE J, et al. CleanML: a benchmark for joint data cleaning and machine learning[J]. arXiv:1904.09483, 2019.
[2] FAN W, GEERTS F. Foundations of data quality management[M]. Morgan & Claypool Publishers, 2012.
[3] HASTIE T, TIBSHIRANI R, FRIEDMAN J H, et al. The elements of statistical learning: data mining, inference, and prediction[M]. Berlin, Heidelberg: Springer, 2009.
[4] FENG H, CHEN G, CHENG Y, et al. A SVM regression based approach to filling in missing values[C]//LNCS 3683: Proceedings of the 2005 International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Melbourne, Sep 14-16, 2005. Berlin, Heidelberg: Springer, 2005: 581-587.
[5] XIONG H, PANDEY G, STEINBACH M, et al. Enhancing data analysis with noise removal[J]. IEEE Transactions on Knowledge and Data Engineering, 2006, 18(3): 304-319.
[6] JIA R, DAO D, WANG B, et al. Towards efficient data valuation based on the Shapley value[C]//Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, Naha, Apr 16-18, 2019: 1167-1176.
[7] SHAPLEY L S. A value for n-person games[M]//KUHN H W, TUCKER A W. Contributions to the Theory of Games. Princeton University Press, 1953: 307-317.
[8] DUDANI S A. The distance-weighted k-nearest-neighbor rule[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1976, 6(4): 325-327.
[9] EATWELL J, MILGATE M, NEWMAN P, et al. Game theory[M]. London: Palgrave Macmillan, 1989.
[10] JIA R, DAO D, WANG B, et al. Efficient task-specific data valuation for nearest neighbor algorithms[J]. Proceedings of the VLDB Endowment, 2019, 12(11): 1610-1623.
[11] KARLAŠ B, LI P, WU R, et al. Nearest neighbor classifiers over incomplete information: from certain answers to certain predictions[J]. Proceedings of the VLDB Endowment, 2020, 14(3): 255-267.
[12] MALEKI S. Addressing the computational issues of the Shapley value with applications in the smart grid[D]. Southampton: University of Southampton, 2015.
[13] GHORBANI A, ZOU J. Data Shapley: equitable valuation of data for machine learning[C]//Proceedings of the 36th International Conference on Machine Learning, Long Beach, Jun 9-15, 2019: 2242-2251.
[14] ELISSEEFF A, PONTIL M. Leave-one-out error and stability of learning algorithms with applications[J]. Advances in Learning Theory: Methods, Models and Applications, 2003, 190: 111-130.
[15] YEH I C, YANG K J, TING T M. Knowledge discovery on RFM model using Bernoulli sequence[J]. Expert Systems with Applications, 2009, 36(3): 5866-5871.
[16] National Aeronautics and Space Administration. Airfoil self-noise data set[EB/OL]. (2014-03-04) [2022-03-30]. https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise.
[17] YEH I C, HSU T K. Building real estate valuation models with comparative approach through case-based reasoning[J]. Applied Soft Computing, 2018, 65: 260-271.
[18] KRISHNAN S, FRANKLIN M J, GOLDBERG K, et al. BoostClean: automated error detection and repair for machine learning[J]. arXiv:1711.01299, 2017.
[19] SUN X, LIU Y, LI J, et al. Using cooperative game theory to optimize the feature selection problem[J]. Neurocomputing, 2012, 97: 86-93.
[20] DENG X, PAPADIMITRIOU C H. On the complexity of cooperative solution concepts[J]. Mathematics of Operations Research, 1994, 19(2): 257-266.
[21] BACHRACH Y, MARKAKIS E, PROCACCIA A D, et al. Approximating power indices[C]//Proceedings of the 2008 International Joint Conference on Autonomous Agents and Multiagent Systems, Estoril, May 12-16, 2008. New York: ACM, 2008: 943-950.
[22] CHU X, MORCOS J, ILYAS I F, et al. KATARA: a data cleaning system powered by knowledge bases and crowdsourcing[C]//Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, May 31-Jun 4, 2015. New York: ACM, 2015: 1247-1261.
[23] YAKOUT M, BERTI-ÉQUILLE L, ELMAGARMID A K. Don't be scared: use scalable automatic repairing with maximal likelihood and bounded changes[C]//Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, Jun 22-27, 2013. New York: ACM, 2013: 553-564.
[24] MAYFIELD C, NEVILLE J, PRABHAKAR S. ERACER: a database approach for statistical inference and data cleaning[C]//Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, Jun 6-11, 2010. New York: ACM, 2010: 75-86.
[25] REKATSINAS T, CHU X, ILYAS I F, et al. HoloClean: holistic data repairs with probabilistic inference[J]. arXiv:1702.00820, 2017.
[26] BERGMAN M, MILO T, NOVGORODOV S, et al. Query-oriented data cleaning with oracles[C]//Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, May 31-Jun 4, 2015. New York: ACM, 2015: 1199-1214.
[27] KRISHNAN S, WANG J, FRANKLIN M J, et al. SampleClean: fast and reliable analytics on dirty data[J]. IEEE Data Engineering Bulletin, 2015, 38(3): 59-75.
[28] KRISHNAN S, WANG J, WU E, et al. ActiveClean: interactive data cleaning for statistical modeling[J]. Proceedings of the VLDB Endowment, 2016, 9(12): 948-959.
[29] CHEN Y, HASSANI S H, KARBASI A, et al. Sequential information maximization: when is greedy near-optimal?[C]//Proceedings of the 28th Conference on Learning Theory, Paris, Jul 3-6, 2015: 338-363.