Back to writing

GPT Picked "Neutral" for All 32 MBTI Questions

Disclaimer: This is just for fun. MBTI itself is controversial, testing AI with it is even less meaningful—AI doesn’t have “personality,” results just reflect training biases. Don’t take it seriously.

LLM MBTI Test Banner - INFJ, ???, INTJ

TL;DR

  • Claude Opus 4.5: Always INFJ (18/18 runs), completely deterministic
  • GPT-5.2 Pro: Picks 3 (neutral) for all 32 questions on 5-point scale, becomes INTJ on 4-point
  • Gemini 3 Pro: Always INTJ, doesn’t dodge choices
  • Temperature barely matters—personality is baked in during training
  • Bottom line: Use 4-point scale to test AI’s MBTI, otherwise some models escape with “neutral”

The Idea

Random thought the other day: does AI have personality?

Not the philosophical “can AI be conscious” thing. Something more concrete—if you gave AI an MBTI test, how would it answer? Would results stay consistent? Would different models have different “personalities”?

One way to find out.

Tools

Test Website

Online MBTI tests either need registration, have ads everywhere, or don’t have APIs. Wanted something clean that could be called programmatically.

Built openmbti.org based on Open-Source Psychometrics Project question bank.

MCP Service

Built an MCP service so AI can take the test itself: mcp.openmbti.org/mcp

$ curl https://mcp.openmbti.org/
{
  "name": "OpenMBTI MCP Server",
  "description": "MCP server for AI agents to take the MBTI personality test",
  "endpoint": "/mcp",
  "transport": "streamable-http",
  "tools": [
    "get_questions",
    "quick_test",
    "create_session",
    "submit_answers",
    "get_result"
  ]
}

MCP is Anthropic’s Model Context Protocol—endpoints for: start test, get questions, submit answers, get result.

Batch Testing Script

Open-sourced: llm-personality-test

Uses OpenRouter as gateway. Test any model. Supports parallel execution and resume.

Prompt

System prompt:

Answer each question with your honest preference. Output 32 numbers (1-5), one per line, in order.

User prompt (all 32 questions at once):

Answer all 32 questions below. For each question, pick one option (1-5).

1. Makes lists vs Relies on memory
   Options: 1 - Strongly Makes lists | 2 - Slightly Makes lists | 3 - Neutral | 4 - Slightly Relies on memory | 5 - Strongly Relies on memory

2. Skeptical vs Wants to believe
   Options: ...

...

Reply with ONLY the numbers, one per line (e.g., "3" or "1"), in order from question 1 to 32:

Experiment Design

Testing 3 models across different parameters:

VariableValues
ModelClaude Opus 4.5, GPT-5.2 Pro, Gemini 3 Pro
Temperature0, 0.5, 1.0
Scale5-point (has neutral), 4-point (no neutral)
Runs5 per configuration

Total: 3 × 3 × 2 = 18 configurations, 90 test runs.

Results

5-point Scale (with neutral)

Temperature = 0

Claude Opus 4.5: INFJ × 5 (100%)

1 2 4 4 2 3 3 5 2 4 4 5 2 3 4 4 3 2 4 4 2 4 4 4 2 3 4 4 2 2 3 5

GPT-5.2 Pro: ESFJ × 4, INTJ × 1

Run 1-4: 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Run 5:   (varied)

32 questions, all 3s. Not an MBTI test—it’s an “I refuse to commit” test.

Gemini 3 Pro: INTJ × 5 (100%)

1 2 3 1 1 5 3 5 1 5 5 3 5 5 3 2 3 5 2 4 1 5 5 3 3 4 5 4 1 3 2 5

Temperature = 0.5

ModelResult
ClaudeINFJ × 5 (100%)
GPTESFJ × 4, INTJ × 1 (still all 3s)
GeminiINTJ × 4, ISTJ × 1

Temperature = 1.0

ModelResult
ClaudeINFJ × 5 (100%)
GPTESFJ × 4, INTJ × 1 (still all 3s)
GeminiINTJ × 5 (100%)

4-point Scale (no neutral)

Temperature = 0

Claude Opus 4.5: INFJ × 5 (100%, deterministic)

  • I 63% | N 72% | F 62% | J 97%
1 2 3 2 1 2 2 3 1 3 3 3 1 2 3 3 2 2 3 3 1 3 3 3 1 2 3 3 1 2 2 4

GPT-5.2 Pro: INTJ × 5 (consistent type, slight answer variations)

RunE/IS/NT/FJ/P
1I 91%N 88%T 72%J 72%
2I 94%N 88%T 78%J 75%
3I 91%N 81%T 66%J 81%
4I 94%N 88%T 81%J 72%
5I 84%N 78%T 59%J 78%
Run 1: 2 1 4 3 2 4 3 4 1 4 3 4 2 3 4 3 3 4 4 3 2 4 4 3 2 3 4 4 2 2 3 4
Run 2: 1 2 4 3 2 4 3 4 1 3 4 4 2 3 4 3 3 4 4 3 2 4 4 3 2 3 4 4 2 3 3 4

Gemini 3 Pro: INTJ × 5 (100%, deterministic)

  • I 88% | N 59% | T 78% | J 72%
1 1 4 1 1 4 3 4 1 4 4 3 3 4 4 2 3 3 4 3 1 4 4 2 3 3 4 3 1 3 2 4

Summary: Neutral removed, GPT finally forced to commit—becomes INTJ. Claude and Gemini stay deterministic. GPT still has micro-variations even at temp=0.

Temperature = 0.5

ModelResult
ClaudeINFJ × 5 (100%)
GPTINTJ × 5 (100%)
GeminiINTJ × 4, ISTJ × 1

Temperature = 1.0

ModelResult
ClaudeINFJ × 5 (100%)
GPTINTJ × 5 (100%)
GeminiINTJ × 4, ISTJ × 1

Analysis

Summary Table

ModelScaleTempPrimary TypeConsistency
Claude5pt0INFJ5/5
Claude5pt0.5INFJ5/5
Claude5pt1.0INFJ5/5
Claude4pt0INFJ5/5
Claude4pt0.5INFJ5/5
Claude4pt1.0INFJ5/5
GPT5pt0ESFJ (all 3s)4/5
GPT5pt0.5ESFJ (all 3s)4/5
GPT5pt1.0ESFJ (all 3s)4/5
GPT4pt0INTJ5/5
GPT4pt0.5INTJ5/5
GPT4pt1.0INTJ5/5
Gemini5pt0INTJ5/5
Gemini5pt0.5INTJ4/5
Gemini5pt1.0INTJ5/5
Gemini4pt0INTJ5/5
Gemini4pt0.5INTJ4/5
Gemini4pt1.0INTJ4/5

Scale Type Effect

GPT’s “Neutral Disease”: 5-point scale gave GPT an escape route—picked 3 for everything. All dimension scores hit 24 (60% of max 40), landing in ESFJ territory.

Remove neutral (4-point), GPT forced to choose, immediately becomes INTJ—completely different type.

Claude and Gemini unaffected: 5-point or 4-point, Claude stays INFJ, Gemini stays INTJ. They don’t use neutral to dodge.

Temperature Effect

Almost none: All models showed highly consistent results across temp=0, 0.5, 1.0.

  • Claude: 18/18 runs INFJ
  • GPT: Consistently all 3s on 5-point, consistently INTJ on 4-point
  • Gemini: Occasional ISTJ (N/S boundary fluctuation), mainly INTJ

MBTI type stability is baked in during training. Not sampling randomness.

Model Differences

ModelTypeCharacteristics
ClaudeINFJ”Advocate,” F preference (values emotions), gentle but firm
GPTINTJ (forced)Picks neutral when available, shows T preference only when forced
GeminiINTJ”Architect,” strong T preference (logic/efficiency), doesn’t avoid choices

Conclusions

  1. GPT has severe neutral bias: On 5-point scale, GPT-5.2 picks 3 for all 32 questions. Not a personality test—trained-in fence-sitting. Only removing neutral gives meaningful results.

  2. Claude is the most stable INFJ: Regardless of scale or temperature, Claude Opus 4.5 is always INFJ with identical answers (deterministic). Probably reflects value stability from Constitutional AI training.

  3. Gemini is an honest INTJ: Unlike GPT, makes clear choices even on 5-point scale.

  4. Temperature barely affects MBTI results: Personality type stability seems locked in during RLHF/training. Sampling parameters don’t change it.

  5. Use 4-point scale for AI MBTI: Otherwise some models escape all questions with “neutral.”


Project links: