Strategy Selection and Outcome Evaluation of Three-Way Decisions Based on Reinforcement Learning

doi:10.3778/j.issn.1673-9418.2210090

Abstract

The trisecting-acting-outcome (TAO) model of three-way decision (3WD) consists of three steps: trisecting a whole, designing action strategies, and analyzing and measuring outcomes. Existing research on outcome evaluation measures how outcomes change before and after a strategy is applied. It cannot yet predict which strategy will achieve the greatest effect. To close this gap, this paper focuses on the “acting” and “outcome” steps of the TAO model. It proposes a Q-learning-based method for strategy selection and outcome prediction in change-based three-way decision. First, the altered tri-partitions and the action strategies of the change-based 3WD TAO model are treated as the states and actions of reinforcement learning, respectively. Each application of a strategy that produces a new altered tri-partition is regarded as one cycle. The reward of each cycle is calculated using cumulative prospect theory, and the interaction between the agent and the environment is modeled as a Markov decision process. Second, a target reward is set, and the state in which the cumulative reward over the cycles reaches the target is taken as the terminal state of the Markov decision process. Then the Q-learning algorithm is used to learn a sequence of strategies that reaches the target reward in the fewest cycles. This sequence is then used to predict the future utility of the current altered tri-partition. Finally, an example illustrates the applicability and effectiveness of the method.
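The abstract describes the pipeline only at a high level. The following is therefore a minimal Python sketch under our own assumptions, not the authors' implementation.

It uses a toy environment. An altered tri-partition is summarized by the sizes of three regions (positive, boundary, negative) over 10 objects, and three hypothetical strategies (`B->P`, `N->B`, `N->P`) move objects between regions. Each cycle's reward is the cumulative prospect theory (CPT) value of the per-object outcomes. It uses Tversky and Kahneman's (1992) value function and inverse-S weighting function with their median parameter estimates (α = β = 0.88, λ = 2.25, γ = 0.61, δ = 0.69). An episode terminates once the cumulative CPT reward reaches a target (here 1.0) or a cycle cap is hit.

To make "fewest cycles" the quantity Q-learning optimizes, the agent receives a reward of -1 per cycle, while the CPT rewards only drive the termination test. This reward shaping, the helper names `step`, `q_learning` and `greedy_plan`, and all numeric settings are illustrative assumptions.

```python
import random

# CPT parameters: Tversky & Kahneman's (1992) median estimates.
ALPHA = BETA = 0.88                  # curvature of the value function
LAMBDA = 2.25                        # loss-aversion coefficient
GAMMA_GAIN, DELTA_LOSS = 0.61, 0.69  # probability-weighting curvature

# Hypothetical change-based 3WD environment (illustrative, not from the paper).
N_OBJECTS = 10
START = (2, 5, 3)                    # (positive, boundary, negative) region sizes
ACTIONS = ("B->P", "N->B", "N->P")
TARGET, MAX_CYCLES = 1.0, 8


def value(x: float) -> float:
    """CPT value function: concave for gains, convex and steeper for losses."""
    return x ** ALPHA if x >= 0 else -LAMBDA * (-x) ** BETA


def weight(p: float, c: float) -> float:
    """Inverse-S weighting w(p) = p^c / (p^c + (1-p)^c)^(1/c)."""
    p = min(max(p, 0.0), 1.0)  # guard against float drift outside [0, 1]
    return p ** c / (p ** c + (1 - p) ** c) ** (1 / c)


def cpt_value(prospect: list[tuple[float, float]]) -> float:
    """Rank-dependent CPT value of a discrete prospect [(outcome, prob), ...]."""
    gains = sorted((o for o in prospect if o[0] > 0), key=lambda o: -o[0])
    losses = sorted((o for o in prospect if o[0] < 0), key=lambda o: o[0])
    total = 0.0
    for items, c in ((gains, GAMMA_GAIN), (losses, DELTA_LOSS)):
        cum = 0.0
        for x, p in items:  # most extreme outcome first
            total += (weight(cum + p, c) - weight(cum, c)) * value(x)
            cum += p
    return total


def step(tri, action):
    """Apply one strategy for one cycle; return (new tri-partition, CPT reward)."""
    pos, bnd, neg = tri
    moves = []  # (per-object outcome, number of objects affected)
    if action == "B->P":    # promote up to 2 boundary objects
        m = min(2, bnd)
        pos, bnd = pos + m, bnd - m
        moves.append((1.0, m))
    elif action == "N->B":  # lift up to 3 negative objects to the boundary
        m = min(3, neg)
        bnd, neg = bnd + m, neg - m
        moves.append((1.0, m))
    else:                   # "N->P": bold jump that also demotes one positive object
        m, k = min(1, neg), min(1, pos)
        pos, bnd, neg = pos + m - k, bnd + k, neg - m
        moves += [(2.0, m), (-1.0, k)]
    prospect = [(x, n / N_OBJECTS) for x, n in moves if n > 0]
    return (pos, bnd, neg), cpt_value(prospect)


def q_learning(episodes=3000, epsilon=0.1, lr=1.0, discount=1.0, seed=0):
    """Tabular Q-learning with reward -1 per cycle, so the optimum uses the fewest cycles."""
    rng = random.Random(seed)
    q = {}  # missing (state, action) entries default to 0.0, an optimistic start
    for _ in range(episodes):
        tri, cum, t, done = START, 0.0, 0, False
        while not done:
            s = (tri, round(cum, 9), t)  # cumulative reward and cycle are part of the state
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: q.get((s, b), 0.0))
            tri, r = step(tri, a)
            cum, t = cum + r, t + 1
            done = cum >= TARGET or t >= MAX_CYCLES  # target reached or cap hit
            s2 = (tri, round(cum, 9), t)
            best = 0.0 if done else max(q.get((s2, b), 0.0) for b in ACTIONS)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + lr * (-1.0 + discount * best - old)
    return q


def greedy_plan(q):
    """Roll out the greedy policy; return the strategy sequence and predicted utility."""
    tri, cum, t, plan = START, 0.0, 0, []
    while cum < TARGET and t < MAX_CYCLES:
        s = (tri, round(cum, 9), t)
        a = max(ACTIONS, key=lambda b: q.get((s, b), 0.0))
        tri, r = step(tri, a)
        cum, t = cum + r, t + 1
        plan.append(a)
    return plan, cum


if __name__ == "__main__":
    plan, utility = greedy_plan(q_learning())
    print(plan, round(utility, 3))
```

Running the script prints the greedy strategy sequence and its cumulative CPT reward, which stands in for the predicted future utility.

The cumulative reward and the cycle index are included in the state so that the termination rule stays Markovian. Because transitions are deterministic, a learning rate of 1 with zero-initialized (hence optimistic) Q-values should let the greedy rollout recover a shortest sequence in a toy problem this small.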

Key words: three-way decision, change-based three-way decision, reinforcement learning, strategy selection, outcome evaluation

LIU Xiaoxue, JIANG Chunmao. Strategy Selection and Outcome Evaluation of Three-Way Decisions Based on Reinforcement Learning[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(2): 378-386.

刘晓雪, 姜春茂. 融合强化学习的三支治略选择及其有效性分析[J]. 计算机科学与探索, 2024, 18(2): 378-386.
