Using reinforcement algorithms to improve the collaboration efficiency of entrepreneurial teams.
A self-updating point system can steer adult teams toward smoother talk and faster work.
01 Research in Context
What this study did
Wang et al. (2026) tested a smart computer coach for business teams. The coach pairs two math tools: Multi-Agent Reinforcement Learning (MARL) and Proximal Policy Optimization (PPO). It watches how team members share work and gives points when they help each other.
The researchers ran three case studies: software development, manufacturing, and logistics coordination. Each case lasted weeks. They tracked who finished tasks and how much money and time the team saved.
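For readers who want to see what the PPO part does, here is a minimal sketch of the standard PPO "clipped" objective. It is an illustration only, not the authors' code, and the study summary does not say which PPO variant or settings they used. The function name `ppo_clipped_objective`, the argument names, and the clip value of 0.2 are assumptions made for this example. The idea: the coach may raise the odds of an action that earned more points than expected, but only by a limited amount per update, so its rules change in small, safe steps.

```python
import math

def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Average clipped surrogate objective over a batch of actions.

    new_logps / old_logps: log-probability of each action under the
    updated and the previous policy. advantages: how much better (+) or
    worse (-) each action turned out than expected.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio: how much more likely the action is now.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more cautious of the two estimates.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In training, this value is maximized. The clipping is what keeps any single update from swinging the policy too far.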
What they found
Every team got better at talking, finishing jobs, and using tools after the coach started. The coach changed its point rules on the fly to keep teamwork high.
No team lost ground when the coach stepped in. Work moved faster and people asked for help more often.
How this fits with other research
Bennett et al. (1973) gave plastic tokens to adults in a hospital when they talked nicely. Wang et al. use digital points for the same reason: reward the social act you want. The older study shows the idea worked with real people long before computers.
Bonfonte et al. (2020) warn that new tokens can feel weak next to candy. Wang’s coach avoids this by letting the team pick what the points buy, such as extra break time or first pick of tasks. The papers agree: value must feel real.
Allison (1976) tried three classroom token plans and saw equal gains. Wang et al. add a fourth plan: an algorithm that writes new rules each day. The new tool keeps the old lesson: any fair point system beats no system.
Why it matters
You already run token boards for kids. Now picture a token board that rewrites itself for your whole staff meeting. Let the team choose backup reinforcers before the algorithm starts. Track who shares materials, give points, and watch adult cooperation rise just like the kids’ hand raises.
Ask your team to list five backup reinforcers, then award points live during the next planning meeting.
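If you want to try this with a simple script instead of a whiteboard, the steps above (pick backup reinforcers, track helpful acts, give points) can be sketched as a small point ledger. This is a hypothetical illustration, not part of the study: the class name `TokenLedger`, the behaviors, the names, and the point values are all made up for the example. It is a fixed-rule board and does not include the self-updating algorithm from Wang et al.

```python
from collections import defaultdict

class TokenLedger:
    """Hypothetical point ledger for a team meeting (illustration only)."""

    def __init__(self, point_values, backup_reinforcers):
        self.point_values = dict(point_values)              # behavior -> points earned
        self.backup_reinforcers = dict(backup_reinforcers)  # reward -> point cost
        self.balances = defaultdict(int)                    # member -> current points

    def award(self, member, behavior):
        """Add points for an observed behavior and return the new balance."""
        self.balances[member] += self.point_values[behavior]
        return self.balances[member]

    def redeem(self, member, reward):
        """Spend points on a team-chosen reward; return False if short."""
        cost = self.backup_reinforcers[reward]
        if self.balances[member] < cost:
            return False
        self.balances[member] -= cost
        return True

# Example values are assumptions; let your team choose the real ones.
ledger = TokenLedger(
    point_values={"shares materials": 2, "asks for help": 1},
    backup_reinforcers={"extra break time": 3, "first pick of tasks": 5},
)
ledger.award("Ana", "shares materials")
ledger.award("Ana", "asks for help")
print(ledger.redeem("Ana", "extra break time"))  # True: 3 points cover a cost of 3
```

Keeping the rewards in a list the team writes itself mirrors the point from Bonfonte et al. (2020): points only work when what they buy feels real.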
02 At a glance
03 Original abstract
Entrepreneurial Team (ET) plays an essential role in the business process by driving innovation and optimizing ideas via adaptability, collaboration, and resourcefulness. The team performance is continuously affected because of resource imbalance, poor communication and inefficient task allocation. The importance of ET in organization growth is the main reason for this analysis. Therefore, this work uses Multi-Agent Reinforcement Learning (MARL) to handle efficient dynamic decisions and coordination to improve ET efficiency in dynamic and complex environments. The main intention of this work is to improve resource utilization, communication efficiency and optimize task allocation. During the analysis, Proximal Policy Optimization (PPO) is utilized to direct agents toward achieving collaborative goals. In every state, the agent receives rewards and penalties for their actions, which helps meet the organization’s goal with minimum time and improves the overall task completion rate. This process is evaluated using different case studies like software development, optimized manufacturing and logistic coordination, which helps to validate the system’s adaptability in various scenarios. In addition, different hypotheses are validated via case studies and metrics such as defect resolution, collaboration quality, operational efficiency, resource optimization, and task completion rate. Thus, the work highlights the impact of MARL in ET to ensure the highest performance in a dynamic environment.
PLOS One, 2026 · doi:10.1371/journal.pone.0343247