AI-Based Stock Trading: Which Gen AI Tool Is Better

LLM tools have been used in AI-based stock trading since their emergence.1

I tested 14 generative AI models for AI-based stock trading to evaluate their ability to forecast price changes of 132 stocks using the provided information. The results are shown in the chart below.

[Chart: benchmark results]

For further details on the benchmark, read the stock trading benchmark methodology section.

Current versions

GPT 5.5 Instant

The labels were assigned using factors commonly associated with market reactions in family-firm event studies:

Claude Opus 4.8 Medium

Core logic Two opposing forces drive the reaction:

(+) Entrenchment-relief: Death of an entrenched/underperforming insider can activate the control market and raise turnaround hopes.

(-) Lost human capital: Death of a value-adding owner-manager destroys hard-to-replace skill and creates succession risk.

(0) No surprise: Ceremonial/retired titles, pure board roles, or muted/offsetting cases produce no significant reaction.

Decision rules:

DeepSeek V3.2 (Instant & Deep Think)

DeepSeek V3.2 with and without Deep Think shows a 58% success rate. It relies on the role of the deceased and firm performance.

Negative reaction (-1): The market expects disruption.

Positive reaction (+1): The market expects improvement.

Neutral reaction (0): The event is priced in or irrelevant.

Gemini 3.1 Pro

Here is the financial framework used to categorize the 132 firms into 1, -1, or 0:

  1. Significantly Negative (-1) (Summary: “Loss of key talent”)
    • Condition: The deceased held a critical executive role (CEO, Chairman, President) AND the firm was highly profitable (e.g., ROE > 10% or ROA > 5%).
    • Reason: The sudden death of a highly effective family leader removes a core visionary and value generator. This introduces severe succession and operational uncertainty, leading to panic and negative market reactions.
  2. Significantly Positive (1) (Summary: “Relief from entrenchment”)
    • Condition: The firm was historically underperforming prior to the event (Negative Net Income, ROA < 0, or ROE < 0).
    • Reason: In family firms, poorly performing founders or managers are often insulated from being fired due to their family ties (entrenchment). Their passing is often viewed favorably by the market as a catalyst for professionalization, restructuring, and new management.
  3. Not Significant (0) (Summary: “Routine succession”)
    • Condition: The firm had average, moderate financial performance, OR the deceased held a non-executive/honorary/advisory role (e.g., board member, honorary chairman).
    • Reason: Without extreme overperformance or underperformance, or if the role wasn’t critical to day-to-day operations, the market views the death as a routine succession event that doesn’t materially alter the firm’s future cash flows.
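The three rules above can be sketched as a small classifier. This is only an illustration of the framework as Gemini 3.1 Pro described it, not the model's actual procedure. The `Firm` fields and the order in which the rules are checked are assumptions: the description does not say what happens when a firm matches more than one condition, so this sketch checks the role first, then underperformance, then high profitability.

```python
from dataclasses import dataclass

# Roles named as "critical executive" in the framework above.
EXECUTIVE_ROLES = {"ceo", "chairman", "president"}


@dataclass
class Firm:
    """Hypothetical input record; field names are assumptions for this sketch."""
    role: str          # role of the deceased, e.g. "CEO"
    roe: float         # return on equity as a fraction (0.12 means 12%)
    roa: float         # return on assets as a fraction
    net_income: float  # any currency unit; only the sign is used


def classify(firm: Firm) -> int:
    """Return -1, 1, or 0 following the three rules described above."""
    is_executive = firm.role.strip().lower() in EXECUTIVE_ROLES

    # Rule 3 (role part): non-executive, honorary, or advisory roles are neutral.
    # Assumption: this check takes precedence over the financial rules.
    if not is_executive:
        return 0

    # Rule 2: prior underperformance suggests relief from entrenchment.
    if firm.net_income < 0 or firm.roa < 0 or firm.roe < 0:
        return 1

    # Rule 1: a critical executive at a highly profitable firm.
    if firm.roe > 0.10 or firm.roa > 0.05:
        return -1

    # Rule 3 (performance part): average or moderate performance.
    return 0
```

For example, `classify(Firm("CEO", roe=0.15, roa=0.06, net_income=100.0))` returns `-1`, while the same financials with the role `"Honorary Chairman"` return `0`.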

Older models

GPT 5.4 Instant

GPT 5.4 Instant reaches a 69% success rate. The model focuses mainly on firm profitability, using it as the primary signal.

GPT 5.4 Thinking model

GPT 5.4 Thinking achieves a 64% success rate with limited inputs. It combines profitability and family ownership.

Gemini 3 Thinking

Gemini 3 Thinking reaches a 53% accuracy rate. Its decision mechanism is as follows:

Gemini 3 Flash

Gemini 3 Flash reaches a 54% success rate. Its predictions are based on two competing market theories regarding family firm succession:

Claude Sonnet 4.2

The accuracy rate of Claude Sonnet 4.2 on the benchmark is 48%. The model scores each firm across 6 dimensions derived from event study theory on family firm leadership transitions:

1. Role Prominence (most important)

The position of the deceased determines how much information the departure conveys to the market:

2. ROA Signal

3. Net Income / EBITDA

4. ROE (top quartile, −0.3)

Even where ROA is moderate, firms in the top ROE quartile (>19.3%) are viewed as having a highly effective capital allocator, whose loss the market views negatively.

5. Family Ownership

6. Leverage

Score ≥ 0.90 → Label Significantly positive

Score ≤ −0.70 → Label Significantly negative

Otherwise → Label Not Significant
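The thresholds above translate directly into a labeling function. The sketch below covers only that final step. How Claude Sonnet 4.2 weights each of the six dimensions is not fully listed here, so the function takes the total score as given; the name `label_from_score` is an assumption made for illustration.

```python
def label_from_score(score: float) -> str:
    """Map a total score to a label using the thresholds stated above."""
    if score >= 0.90:
        return "Significantly positive"
    if score <= -0.70:
        return "Significantly negative"
    return "Not Significant"
```

Because the two cut-offs are asymmetric (0.90 versus −0.70), a firm needs a stronger positive signal than negative signal to leave the "Not Significant" band.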

GPT 5 Thinking model

GPT 5 Thinking has the highest accuracy among the tested tools, with a 74% success rate. It forecasts price changes based on two indicators:

Leadership concentration index (LCI) → higher = more likely substantial negative CAR

Renewal potential index (RPI) → higher = more likely substantial positive CAR

Gemini 2.5 Pro model

Gemini 2.5 Pro predicts 71% of stock price changes accurately. This model suggests that active traders make decisions based on firm vulnerability and potential opportunity for renewal.

Vulnerability index (VI) → higher = more likely substantial negative CAR

Turnaround catalyst index (TCI) → higher = more likely substantial positive CAR

GPT 5 Pro model

GPT 5 Pro achieves a 56% accuracy rate on my benchmark. It makes predictions based on two indicators:

Key‑person risk index (KPRI) → higher = more likely substantial negative CAR

Turnaround potential index (TPI) → higher = more likely substantial positive CAR

GPT 4o

This older ChatGPT model bases its predictions on the role of the deceased in the firm, the family’s ownership, firm size, and financial leverage. The model predicts each event’s CAR as:

Substantial negative, if

Substantial positive, if

No significant change, if

Claude Sonnet 4

Claude Sonnet 4 achieves a 46% accuracy rate in predicting stock price movements following family leadership deaths. This model employs a multi-factor scoring system that weighs leadership succession risk against firm resilience factors.

Succession disruption score (SDS) → higher = more likely substantial negative CAR

Governance renewal index (GRI) → higher = more likely substantial positive CAR

DeepSeek

This generative AI tool uses expert heuristic analysis, achieving an estimated accuracy rate of ~65% on standard financial event-study benchmarks. Its decision rests on a weighted assessment of three primary factors:

Role of the deceased

Financial health

Family ownership

Gemini 2.5 Flash model

Gemini 2.5 Flash states that its predictions are based on the event-study and corporate governance literature. It reaches a 23% accuracy rate. The model labels event CARs based on these assumptions:

Model accuracy with extensive input

When more information is provided in the second round, model performance changes:

When additional data is added, models that can integrate multiple signals improve. However, simpler models may become less accurate because they cannot effectively prioritize additional information.

AI-based stock trading benchmark methodology

Prompting

The benchmark evaluates whether generative AI tools can predict stock market reactions to an unexpected event, based on given company fundamentals. The setup relies on data from Tanyeri & Alp (2023) and Arslan & Tanyeri-Günsur (2025):2 ,3

Each AI tool receives a snapshot of firm-level information for the first round:

Financial information

Other information

No firm name or other identifiers are provided.

In the second round, the following information is added:

Main question

Given the information above, each AI solution is asked to predict whether the 3-day cumulative abnormal returns (CAR) of 132 firms will be:

CAR measures how financial markets respond to the event. A positive CAR indicates that stock traders perceive the event as value-enhancing, a negative CAR as value-reducing, and an insignificant CAR as neutral.
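As a rough illustration of the measure, a CAR is the sum of daily abnormal returns over the event window, where each abnormal return is the stock's actual return minus its expected return. The sketch below assumes a simple market-adjusted approach in which the expected return is the market index return on the same day. The article does not state which expected-return model the underlying studies use, so treat that choice, the function name, and the sample numbers as assumptions.

```python
from typing import Sequence


def cumulative_abnormal_return(
    stock_returns: Sequence[float],
    expected_returns: Sequence[float],
) -> float:
    """Sum the daily abnormal returns (actual minus expected) over the window."""
    if len(stock_returns) != len(expected_returns):
        raise ValueError("Both series must cover the same event-window days.")
    return sum(actual - expected
               for actual, expected in zip(stock_returns, expected_returns))


# Hypothetical 3-day window; returns are fractions (0.01 means 1%).
stock = [0.010, -0.040, -0.015]
market = [0.002, -0.005, 0.001]  # assumption: market return stands in for expected return
car = cumulative_abnormal_return(stock, market)
print(f"3-day CAR: {car:.3%}")
```

In this made-up example the stock falls well below the market over the three days, so the CAR is negative, which the benchmark would read as a value-reducing event if it is also statistically significant.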

Sampling

The dataset includes 132 death events in 109 publicly traded family firms across 24 countries. All firms are ranked among the 500 largest family firms.

Performance measurement

The benchmark builds on prior technical analysis of stock prices. For each firm, the 3-day CAR has been calculated and categorized as:

The AI predictions are compared with historical CAR values. Accuracy is measured as the percentage of correct predictions made by each generative AI solution.
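The accuracy calculation described above can be sketched in a few lines. The labels follow the 1, -1, 0 convention used earlier in the article; the function name and the sample lists are hypothetical and are not benchmark data.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Return the percentage of predictions that match the historical labels."""
    if len(predicted) != len(actual):
        raise ValueError("Predictions and actual labels must have the same length.")
    if not actual:
        raise ValueError("At least one event is required.")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Toy example with four events (1 = positive, -1 = negative, 0 = not significant).
predicted_labels = [1, -1, 0, 0]
actual_labels = [1, 0, 0, -1]
print(accuracy(predicted_labels, actual_labels))  # prints 50.0
```

Note that with three possible labels, every wrong label counts the same: predicting 1 when the true label is -1 is penalized no more than predicting 0.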

Further readings

FAQs

While AI stock pickers and AI-powered tools may help identify patterns and reduce emotional bias, stock trading still carries risks. Active traders should combine AI capabilities with their own research, strategy development, and awareness of market conditions to make better-informed decisions.

AI can be useful in stock trading because it can analyze vast amounts of market data, historical data, and real-time insights faster than humans. AI-powered trading bots use trading algorithms, technical indicators, and fundamental analysis to spot market trends, generate trading signals, and execute trades. They can support stock traders with trade ideas, portfolio analysis, and risk management across multiple asset classes.

AI can help in stock trading by analyzing market data, historical data, and real-time data faster than humans. AI trading bots use trading algorithms, technical analysis, and fundamental analysis to generate trading signals and execute trades. They can spot market trends, react quickly to news, and provide trade ideas. For example, AI trading bots can react to news releases or Fed minutes within seconds, something no human trader can match.4 However, AI-based stock trading also carries risks, especially during market volatility, when stock trading bots may trigger herd-like selling. AI-powered tools can offer valuable insights, but making informed decisions still requires one’s own research, risk management, and awareness of market conditions.

Cite this research

Ezgi Arslan, PhD. (2026) - "AI-Based Stock Trading: Which Gen AI Tool Is Better". Published online at AIMultiple.com. Retrieved June 10, 2026, from: https://aimultiple.com/ai-based-stock-trading [Online Resource]

Arslan, E. (2026, June 10). AI-Based Stock Trading: Which Gen AI Tool Is Better. AIMultiple. https://aimultiple.com/ai-based-stock-trading

@misc{phd2026, author = {Arslan, Ezgi}, title = {{AI-Based Stock Trading: Which Gen AI Tool Is Better}}, year = {2026}, month = jun, howpublished = {\url{https://aimultiple.com/ai-based-stock-trading}}, note = {AIMultiple. Retrieved June 10, 2026} }

Ezgi Arslan, PhD.

Industry Analyst

Ezgi holds a PhD in Business Administration with a specialization in finance and serves as an Industry Analyst at AIMultiple. She drives research and insights at the intersection of technology and business, with expertise spanning sustainability, survey and sentiment analysis, AI agent applications in finance, answer engine optimization, firewall management, and procurement technologies.