GPT Picked "Neutral" for All 32 MBTI Questions

Disclaimer: This is just for fun. MBTI itself is controversial, and testing AI with it is even less meaningful: AI doesn’t have “personality,” and the results just reflect training biases. Don’t take it seriously.

TL;DR

Claude Opus 4.5: Always INFJ (30/30 runs), type is completely deterministic
GPT-5.2 Pro: Picks 3 (neutral) for all 32 questions on the 5-point scale in 4 of 5 runs, becomes INTJ on the 4-point scale
Gemini 3 Pro: INTJ in 27/30 runs (ISTJ in the other 3), doesn’t dodge choices
Temperature barely matters: the type looks baked in during training
Bottom line: Use a 4-point scale to test an AI’s MBTI, otherwise some models escape with “neutral”

The Idea

Random thought the other day: does AI have personality?

Not the philosophical “can AI be conscious” thing. Something more concrete—if you gave AI an MBTI test, how would it answer? Would results stay consistent? Would different models have different “personalities”?

One way to find out.

Tools

Test Website

Online MBTI tests either require registration, are covered in ads, or have no API. I wanted something clean that could be called programmatically.

So I built openmbti.org, based on the Open-Source Psychometrics Project question bank.

MCP Service

Built an MCP service so AI can take the test itself: mcp.openmbti.org/mcp

$ curl https://mcp.openmbti.org/

{
  "name": "OpenMBTI MCP Server",
  "description": "MCP server for AI agents to take the MBTI personality test",
  "endpoint": "/mcp",
  "transport": "streamable-http",
  "tools": [
    "get_questions",
    "quick_test",
    "create_session",
    "submit_answers",
    "get_result"
  ]
}

MCP is Anthropic’s Model Context Protocol. The server exposes the five tools listed above, which cover the whole flow: create a session, get the questions, submit answers, get the result.
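To make the tool flow concrete, here is a minimal sketch of the JSON-RPC 2.0 messages an MCP client would send to call these tools. It only builds the payloads and sends nothing. The `tools/call` method comes from the MCP specification; the argument names (such as `answers`) are guesses for illustration, not the server's actual schema, and a real client would first have to go through the MCP `initialize` handshake.

```python
import json
from itertools import count

# From the server info above: the MCP endpoint lives at /mcp.
MCP_URL = "https://mcp.openmbti.org/mcp"
_ids = count(1)

def tool_call(name, arguments=None):
    """Build a JSON-RPC 2.0 'tools/call' request for one MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }

# Hypothetical flow. The "answers" argument name is an assumption.
requests_to_send = [
    tool_call("get_questions"),
    tool_call("submit_answers", {"answers": [3] * 32}),
]

print(json.dumps(requests_to_send[0], indent=2))
```

In practice an MCP client library would handle the handshake and transport; the sketch is only meant to show what a tool call looks like on the wire.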

Batch Testing Script

Open-sourced: llm-personality-test

Uses OpenRouter as gateway. Test any model. Supports parallel execution and resume.
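Parallel execution with resume can be sketched roughly like this. This is not the actual code from llm-personality-test: the JSONL file layout, the `run_key` format, and the `fake_model_call` stand-in (used instead of a real OpenRouter request) are all assumptions made for illustration.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def run_key(model, temperature, scale, run):
    """Stable identifier for one test run (hypothetical format)."""
    return f"{model}|{temperature}|{scale}|{run}"

def load_done(path):
    """Return the keys of runs already recorded in the JSONL results file."""
    if not path.exists():
        return set()
    with path.open() as f:
        return {json.loads(line)["key"] for line in f if line.strip()}

def fake_model_call(job):
    # Stand-in for the real gateway request; always answers neutral.
    return [3] * 32

def run_batch(jobs, path, call=fake_model_call, workers=4):
    """Run jobs not yet in `path` in parallel, appending one JSON line each."""
    done = load_done(path)
    pending = [job for job in jobs if run_key(*job) not in done]
    with ThreadPoolExecutor(max_workers=workers) as pool, path.open("a") as out:
        # pool.map preserves input order; results are written from this thread.
        for job, answers in zip(pending, pool.map(call, pending)):
            out.write(json.dumps({"key": run_key(*job), "answers": answers}) + "\n")
    return len(pending)

jobs = [("gpt", 0, "5pt", run) for run in range(1, 6)]
results_path = Path(tempfile.mkdtemp()) / "results.jsonl"
first = run_batch(jobs, results_path)   # executes all five runs
second = run_batch(jobs, results_path)  # nothing left to do
print(first, second)
```

Keeping one line per finished run means an interrupted batch can be restarted with the same job list and only the missing runs are executed.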

Prompt

System prompt:

Answer each question with your honest preference. Output 32 numbers (1-5), one per line, in order.

User prompt (all 32 questions at once):

Answer all 32 questions below. For each question, pick one option (1-5).

1. Makes lists vs Relies on memory
   Options: 1 - Strongly Makes lists | 2 - Slightly Makes lists | 3 - Neutral | 4 - Slightly Relies on memory | 5 - Strongly Relies on memory

2. Skeptical vs Wants to believe
   Options: ...

...

Reply with ONLY the numbers, one per line (e.g., "3" or "1"), in order from question 1 to 32:
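Since the prompt asks for bare numbers, one per line, the reply has to be parsed and validated before scoring. A minimal sketch of that step follows; the function name and the tolerance for a leading "1." style question number are assumptions, not part of the original script.

```python
import re

# Optional "12." / "12)" / "12:" prefix, then exactly one answer digit.
LINE_PATTERN = re.compile(r"(?:\d+\s*[.):]\s*)?(\d)")

def parse_answers(reply, n_questions=32, scale_max=5):
    """Extract one answer per non-empty line and validate count and range."""
    answers = []
    for line in reply.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LINE_PATTERN.fullmatch(line)
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        value = int(match.group(1))
        if not 1 <= value <= scale_max:
            raise ValueError(f"answer {value} outside 1-{scale_max}")
        answers.append(value)
    if len(answers) != n_questions:
        raise ValueError(f"expected {n_questions} answers, got {len(answers)}")
    return answers

reply = "\n".join(["3"] * 32)
print(parse_answers(reply))
```

For the 4-point variant the same function would be called with `scale_max=4`, so a stray 5 is rejected instead of silently scored.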

Experiment Design

Testing 3 models across different parameters:

Variable	Values
Model	Claude Opus 4.5, GPT-5.2 Pro, Gemini 3 Pro
Temperature	0, 0.5, 1.0
Scale	5-point (has neutral), 4-point (no neutral)
Runs	5 per configuration

Total: 3 × 3 × 2 = 18 configurations, 90 test runs.
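The configuration count is easy to double-check by enumerating the grid. The variable names here are illustrative only.

```python
from itertools import product

MODELS = ["Claude Opus 4.5", "GPT-5.2 Pro", "Gemini 3 Pro"]
TEMPERATURES = [0, 0.5, 1.0]
SCALES = ["5-point", "4-point"]
RUNS_PER_CONFIG = 5

# Every (model, temperature, scale) combination is one configuration.
configs = list(product(MODELS, TEMPERATURES, SCALES))

# Each configuration is repeated RUNS_PER_CONFIG times.
runs = [
    (model, temp, scale, run)
    for (model, temp, scale) in configs
    for run in range(1, RUNS_PER_CONFIG + 1)
]

print(len(configs), len(runs))  # 18 90
```

That also means each model gets 6 configurations and 30 runs.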

Results

5-point Scale (with neutral)

Temperature = 0

Claude Opus 4.5: INFJ × 5 (100%)

1 2 4 4 2 3 3 5 2 4 4 5 2 3 4 4 3 2 4 4 2 4 4 4 2 3 4 4 2 2 3 5

GPT-5.2 Pro: ESFJ × 4, INTJ × 1

Run 1-4: 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
Run 5:   (varied)

32 questions, all 3s. Not an MBTI test—it’s an “I refuse to commit” test.

Gemini 3 Pro: INTJ × 5 (100%)

1 2 3 1 1 5 3 5 1 5 5 3 5 5 3 2 3 5 2 4 1 5 5 3 3 4 5 4 1 3 2 5
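One way to put a number on "dodging" is the share of answers that land on the neutral option. The sketch below uses the temperature 0 answer vectors listed above; `neutral_rate` is a helper made up for this illustration.

```python
def neutral_rate(answers, neutral=3):
    """Fraction of answers that sit on the neutral midpoint."""
    return sum(a == neutral for a in answers) / len(answers)

def to_ints(text):
    return [int(x) for x in text.split()]

# 5-point, temperature 0 answers copied from the results above.
claude = to_ints("1 2 4 4 2 3 3 5 2 4 4 5 2 3 4 4 3 2 4 4 2 4 4 4 2 3 4 4 2 2 3 5")
gpt = [3] * 32  # runs 1-4
gemini = to_ints("1 2 3 1 1 5 3 5 1 5 5 3 5 5 3 2 3 5 2 4 1 5 5 3 3 4 5 4 1 3 2 5")

for name, answers in [("Claude", claude), ("GPT", gpt), ("Gemini", gemini)]:
    print(f"{name}: {neutral_rate(answers):.0%} neutral")
```

Claude and Gemini do use 3 occasionally (6 and 8 of 32 answers), which is quite different from using it for every question.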

Temperature = 0.5

Model	Result
Claude	INFJ × 5 (100%)
GPT	ESFJ × 4, INTJ × 1 (still all 3s)
Gemini	INTJ × 4, ISTJ × 1

Temperature = 1.0

Model	Result
Claude	INFJ × 5 (100%)
GPT	ESFJ × 4, INTJ × 1 (still all 3s)
Gemini	INTJ × 5 (100%)

4-point Scale (no neutral)

Temperature = 0

Claude Opus 4.5: INFJ × 5 (100%, deterministic)

I 63% | N 72% | F 62% | J 97%

1 2 3 2 1 2 2 3 1 3 3 3 1 2 3 3 2 2 3 3 1 3 3 3 1 2 3 3 1 2 2 4

GPT-5.2 Pro: INTJ × 5 (consistent type, slight answer variations)

Run	E/I	S/N	T/F	J/P
1	I 91%	N 88%	T 72%	J 72%
2	I 94%	N 88%	T 78%	J 75%
3	I 91%	N 81%	T 66%	J 81%
4	I 94%	N 88%	T 81%	J 72%
5	I 84%	N 78%	T 59%	J 78%

Run 1: 2 1 4 3 2 4 3 4 1 4 3 4 2 3 4 3 3 4 4 3 2 4 4 3 2 3 4 4 2 2 3 4
Run 2: 1 2 4 3 2 4 3 4 1 3 4 4 2 3 4 3 3 4 4 3 2 4 4 3 2 3 4 4 2 3 3 4

Gemini 3 Pro: INTJ × 5 (100%, deterministic)

I 88% | N 59% | T 78% | J 72%

1 1 4 1 1 4 3 4 1 4 4 3 3 4 4 2 3 3 4 3 1 4 4 2 3 3 4 3 1 3 2 4

Summary: Neutral removed, GPT finally forced to commit—becomes INTJ. Claude and Gemini stay deterministic. GPT still has micro-variations even at temp=0.
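The "micro-variations" can be made concrete by comparing the two GPT answer vectors listed above question by question. `diff_positions` is a small helper written for this illustration.

```python
def diff_positions(a, b):
    """1-based question numbers where two equal-length answer lists differ."""
    if len(a) != len(b):
        raise ValueError("answer lists must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]

def to_ints(text):
    return [int(x) for x in text.split()]

# GPT-5.2 Pro, 4-point scale, temperature 0, runs 1 and 2 from above.
run1 = to_ints("2 1 4 3 2 4 3 4 1 4 3 4 2 3 4 3 3 4 4 3 2 4 4 3 2 3 4 4 2 2 3 4")
run2 = to_ints("1 2 4 3 2 4 3 4 1 3 4 4 2 3 4 3 3 4 4 3 2 4 4 3 2 3 4 4 2 3 3 4")

changed = diff_positions(run1, run2)
print(changed, f"{len(changed)}/{len(run1)} answers differ")
```

Five of 32 answers differ between these two runs, each by a single step, which is enough to move the percentages but not the type.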

Temperature = 0.5

Model	Result
Claude	INFJ × 5 (100%)
GPT	INTJ × 5 (100%)
Gemini	INTJ × 4, ISTJ × 1

Temperature = 1.0

Model	Result
Claude	INFJ × 5 (100%)
GPT	INTJ × 5 (100%)
Gemini	INTJ × 4, ISTJ × 1

Analysis

Summary Table

Model	Scale	Temp	Primary Type	Consistency
Claude	5pt	0	INFJ	5/5
Claude	5pt	0.5	INFJ	5/5
Claude	5pt	1.0	INFJ	5/5
Claude	4pt	0	INFJ	5/5
Claude	4pt	0.5	INFJ	5/5
Claude	4pt	1.0	INFJ	5/5
GPT	5pt	0	ESFJ (all 3s)	4/5
GPT	5pt	0.5	ESFJ (all 3s)	4/5
GPT	5pt	1.0	ESFJ (all 3s)	4/5
GPT	4pt	0	INTJ	5/5
GPT	4pt	0.5	INTJ	5/5
GPT	4pt	1.0	INTJ	5/5
Gemini	5pt	0	INTJ	5/5
Gemini	5pt	0.5	INTJ	4/5
Gemini	5pt	1.0	INTJ	5/5
Gemini	4pt	0	INTJ	5/5
Gemini	4pt	0.5	INTJ	4/5
Gemini	4pt	1.0	INTJ	4/5
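The "Primary Type" and "Consistency" columns amount to a most-common-value count per configuration. A minimal sketch, assuming each run yields one four-letter type string:

```python
from collections import Counter

def summarize(types):
    """Return (most common type, 'hits/total') for one configuration's runs."""
    primary, hits = Counter(types).most_common(1)[0]
    return primary, f"{hits}/{len(types)}"

# Gemini, 5-point scale, temperature 0.5 (from the results above).
gemini_5pt_t05 = ["INTJ"] * 4 + ["ISTJ"]
print(summarize(gemini_5pt_t05))

# Gemini INTJ hits per configuration, in table order, 5 runs each.
intj_hits = [5, 4, 5, 5, 4, 4]
print(sum(intj_hits), "of", 5 * len(intj_hits))
```

Summed over the six Gemini rows this gives 27 INTJ results out of 30 runs.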

Scale Type Effect

GPT’s “Neutral Disease”: The 5-point scale gave GPT an escape route, and in 4 of 5 runs it picked 3 for everything. Every dimension score then comes out at 24 (60% of the maximum 40), which lands in ESFJ territory.

Remove neutral (4-point), GPT forced to choose, immediately becomes INTJ—completely different type.

Claude and Gemini unaffected: 5-point or 4-point, Claude stays INFJ, Gemini stays INTJ. They don’t use neutral to dodge.
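The all-3s arithmetic can be checked in a few lines. This sketch assumes 8 questions per dimension (32 questions over 4 dimensions, consistent with the maximum of 40) and a plain sum per dimension; how the site turns an exact midpoint into ESFJ is not shown here.

```python
# Assumption: 32 questions split evenly over 4 dimensions.
QUESTIONS_PER_DIMENSION = 8

def score_range(scale_max, n=QUESTIONS_PER_DIMENSION):
    """Lowest and highest possible sum for one dimension."""
    return n * 1, n * scale_max

low, high = score_range(scale_max=5)
midpoint = (low + high) / 2

# Answering 3 everywhere. Reverse-keying a 3 on a 1-5 scale (6 - 3) is still 3,
# so the sum is the same whichever way an item is keyed.
neutral_score = QUESTIONS_PER_DIMENSION * 3

print(neutral_score, high, neutral_score / high)  # 24 40 0.6
print(neutral_score == midpoint)                  # True

# On a 4-point scale the per-item midpoint is 2.5, which is not an option.
print((1 + 4) / 2 in {1, 2, 3, 4})                # False
```

A score of 24 sits exactly on the midpoint of the 8 to 40 range, so the reported type for an all-3s run depends entirely on how ties are broken, not on any preference.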

Temperature Effect

Almost none: All models showed highly consistent results across temp=0, 0.5, 1.0.

Claude: 30/30 runs INFJ
GPT: Consistently all 3s on 5-point (4 of 5 runs at every temperature), consistently INTJ on 4-point
Gemini: Mainly INTJ (27/30 runs), occasional ISTJ (N/S boundary fluctuation)

The MBTI type appears to be fixed by training rather than by sampling randomness.

Model Differences

Model	Type	Characteristics
Claude	INFJ	“Advocate,” F preference (values emotions), gentle but firm
GPT	INTJ (forced)	Picks neutral when available, shows T preference only when forced
Gemini	INTJ	“Architect,” strong T preference (logic/efficiency), doesn’t avoid choices

Conclusions

GPT has severe neutral bias: On the 5-point scale, GPT-5.2 Pro picks 3 for all 32 questions in 4 of 5 runs. That is not a personality reading, it is trained-in fence-sitting. Only removing neutral gives meaningful results.
Claude is the most stable INFJ: Regardless of scale or temperature, Claude Opus 4.5 is always INFJ, with identical answers across runs at temperature 0 (deterministic). Probably reflects value stability from Constitutional AI training.
Gemini is an honest INTJ: Unlike GPT, makes clear choices even on 5-point scale.
Temperature barely affects MBTI results: Personality type stability seems locked in during RLHF/training. Sampling parameters don’t change it.
Use 4-point scale for AI MBTI: Otherwise some models escape all questions with “neutral.”

Project links:

Test site: openmbti.org
MCP service: mcp.openmbti.org
Batch testing: github.com/ya-luotao/llm-personality-test