Service Delivery

Using reinforcement algorithms to improve the collaboration efficiency of entrepreneurial teams.

Wang et al. (2026) · PLOS One

★ The Verdict

A self-updating point system can steer adult teams toward smoother talk and faster work.

✓ Read this if you are a BCBA who coaches multidisciplinary teams or leads clinic staff meetings.

✗ Skip if you are a clinician who works only one-to-one with early learners.

01 Research in Context

What this study did

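Wang et al. (2026) tested a smart computer coach for entrepreneurial business teams. The coach is built on multi-agent reinforcement learning (MARL) trained with Proximal Policy Optimization (PPO). It watches how team members share work, gives points when they help each other, and applies penalties for actions that slow the team down.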
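
For readers curious about the math, here is a minimal sketch of the clipped objective that PPO is generally known for. It is not the authors' code. The function names, the 0.2 clip range, and the sample numbers are illustrative assumptions. The idea is that an update which moves too far from the old behavior earns no extra credit, so the point rules change in small, steady steps.

```python
def clipped_surrogate(new_prob, old_prob, advantage, clip_eps=0.2):
    """PPO-style clipped term for one action.

    new_prob / old_prob: chance of the action under the new and old policy.
    advantage: how much better (+) or worse (-) the action was than expected.
    clip_eps: assumed clip range; 0.2 is a common default, not from the paper.
    """
    ratio = new_prob / old_prob
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Take the more pessimistic of the clipped and unclipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def team_objective(samples, clip_eps=0.2):
    """Average the clipped term over (new_prob, old_prob, advantage) samples."""
    terms = [clipped_surrogate(n, o, a, clip_eps) for n, o, a in samples]
    return sum(terms) / len(terms)


# Hypothetical samples, one per team member's action.
example_samples = [
    (0.6, 0.3, 1.0),   # helpful action made much more likely: gain is capped
    (0.3, 0.3, 0.5),   # no change in policy: plain advantage
    (0.2, 0.4, -1.0),  # unhelpful action made less likely: gain is capped
]
print(team_objective(example_samples))
```

In the first sample the new policy doubles the chance of a helpful action, but the clip limits the credit to 1.2 times the advantage instead of 2 times.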

The researchers ran three real case studies: software development, manufacturing, and logistics coordination. Each case lasted weeks. They tracked who finished tasks and how much money and time each team saved.

What they found

Every team got better at talking, finishing jobs, and using tools after the coach started. The coach changed its point rules on the fly to keep teamwork high.

No team lost ground when the coach stepped in. Work moved faster and people asked for help more often.

How this fits with other research

Bennett et al. (1973) gave plastic tokens to adults in a hospital when they talked nicely. Wang uses digital points for the same reason: reward the social act you want. The old study shows the idea worked with real people long before computers.

Bonfonte et al. (2020) warn that new tokens can feel weak next to candy. Wang’s coach avoids this by letting the team pick what the points buy, like extra break time or first pick of tasks. The papers agree: value must feel real.

Allison (1976) tried three classroom token plans and saw equal gains. Wang adds a fourth plan—an algorithm that writes new rules each day. The new tool keeps the old lesson: any fair point system beats no system.

Why it matters

You already run token boards for kids. Now picture a token board that rewrites itself for your whole staff meeting. Let the team choose backup reinforcers before the algorithm starts. Track who shares materials, give points, and watch adult cooperation rise just like the kids’ hand raises.

→ Action — try this Monday

Ask your team to list five backup reinforcers, then award points live during the next planning meeting.
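
If you would rather keep score in a short script than on a whiteboard, the Monday plan can be sketched like this. Everything named here is hypothetical: the behaviors, point values, and reinforcer prices are placeholders for whatever your own team picks. This simple tally is also not the adaptive algorithm from the study; it only counts points and shows what each person can afford.

```python
from collections import Counter

# Hypothetical point values for behaviors you decide to reward.
POINT_VALUES = {"shared_materials": 2, "asked_for_help": 1, "offered_help": 3}

# Hypothetical backup reinforcers chosen by the team, with point prices.
REINFORCER_MENU = {"extra_break": 5, "first_pick_of_tasks": 8}


def award_points(ledger, member, behavior, point_values=POINT_VALUES):
    """Add points for one observed behavior; unlisted behaviors earn zero."""
    ledger[member] += point_values.get(behavior, 0)
    return ledger


def affordable(ledger, member, menu=REINFORCER_MENU):
    """List the reinforcers this member can buy with current points."""
    return sorted(item for item, cost in menu.items() if ledger[member] >= cost)


# Example meeting log: (member, behavior) pairs recorded live.
ledger = Counter()
meeting_log = [
    ("Ana", "shared_materials"),
    ("Ana", "offered_help"),
    ("Ben", "asked_for_help"),
    ("Ana", "offered_help"),
]
for member, behavior in meeting_log:
    award_points(ledger, member, behavior)

print(dict(ledger))
print(affordable(ledger, "Ana"))
```

Keeping the point values and the menu in plain dictionaries makes it easy for the team to change them between meetings.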

02 At a glance

Intervention: not applicable

Design: case study

Population: not specified

Finding: positive

03 Original abstract

Entrepreneurial Team (ET) plays an essential role in the business process by driving innovation and optimizing ideas via adaptability, collaboration, and resourcefulness. The team performance is continuously affected because of resource imbalance, poor communication and inefficient task allocation. The importance of ET in organization growth is the main reason for this analysis. Therefore, this work uses Multi-Agent Reinforcement Learning (MARL) to handle efficient dynamic decisions and coordination to improve ET efficiency in dynamic and complex environments. The main intention of this work is to improve resource utilization, communication efficiency and optimize task allocation. During the analysis, Proximal Policy Optimization (PPO) is utilized to direct agents toward achieving collaborative goals. In every state, the agent receives rewards and penalties for their actions, which helps meet the organization’s goal with minimum time and improves the overall task completion rate. This process is evaluated using different case studies like software development, optimized manufacturing and logistic coordination, which helps to validate the system’s adaptability in various scenarios. In addition, different hypotheses are validated via case studies and metrics such as defect resolution, collaboration quality, operational efficiency, resource optimization, and task completion rate. Thus, the work highlights the impact of MARL in ET to ensure the highest performance in a dynamic environment.

PLOS One, 2026 · doi:10.1371/journal.pone.0343247