The most widely used intelligence test in human history was designed in 1905 by a French psychologist named Alfred Binet. He had a narrow purpose: to identify Parisian schoolchildren who needed extra academic support (Binet & Simon, Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux, L’Année Psychologique, 1905). Binet was explicit about what his test could not do. It could not measure innate intelligence. It could not rank human beings on a fixed scale of cognitive worth. It could not, and should not, be used to label anyone as permanently inferior. He said so in writing. He warned against it. He called the idea of a fixed, hereditary intelligence “brutal pessimism” (Binet, Les idées modernes sur les enfants, 1909).

Within a decade, American psychologists ignored every one of his warnings and turned his diagnostic tool into a weapon.

Lewis Terman at Stanford University took Binet’s test, anglicized it, standardized it on white, English-speaking, middle-class Californians, and renamed it the Stanford-Binet Intelligence Scale in 1916. He then used it to argue that intelligence was hereditary and that certain racial groups were genetically inferior. His words, published in an academic textbook: “Their dullness seems to be racial, or at least inherent in the family stocks from which they come” (Terman, The Measurement of Intelligence, Houghton Mifflin, 1916, p. 91–92). He was talking about Black, Mexican, and Indigenous Americans.

Henry Goddard, the psychologist who introduced Binet’s test to America, used it to classify immigrants arriving at Ellis Island. He administered the test — in English — to people who did not speak English, then declared that 83 percent of Jews, 80 percent of Hungarians, and 79 percent of Italians were “feeble-minded” (Goddard, “Mental Tests and the Immigrant,” Journal of Delinquency, 1917). His data was used to support the Immigration Act of 1924, which restricted entry from Southern and Eastern Europe.

Carl Brigham, a Princeton psychologist, took the Army mental tests administered to 1.75 million soldiers during World War I and published A Study of American Intelligence in 1923. His conclusion: Nordic races were intellectually superior, and the “intellectual deterioration” of America was being caused by racial mixing with inferior groups. Brigham later created the Scholastic Aptitude Test — the SAT — which became the primary gatekeeper for college admissions in the United States (Brigham, A Study of American Intelligence, Princeton University Press, 1923). He eventually recanted his racial conclusions in 1930, calling his own earlier work “without foundation.” The test he built on those foundations continued anyway.

The IQ test was designed for French schoolchildren in 1905. Its creator explicitly warned it could not measure innate intelligence. American psychologists ignored that warning and used the test to justify eugenics, forced sterilization, and racial segregation for the next century.

Binet & Simon, 1905; Terman, 1916; Goddard, 1917; Brigham, 1923

This is not ancient history. The forced sterilization programs justified by IQ test results continued in the United States until the 1970s. More than 60,000 Americans were forcibly sterilized, disproportionately Black, poor, and institutionalized (Lombardo, Three Generations, No Imbeciles: Eugenics, the Supreme Court, and Buck v. Bell, Johns Hopkins University Press, 2008). North Carolina’s eugenics board sterilized approximately 7,600 people between 1929 and 1974, with Black women targeted at five times their share of the population (Schoen, Choice & Coercion: Birth Control, Sterilization, and Abortion in Public Health and Welfare, University of North Carolina Press, 2005). The tool that justified these programs was the IQ test — the same basic architecture of pattern recognition, vocabulary, and abstract reasoning that is still used today.

The Cultural Bias Is Not a Theory. It Is Documented.

In 1972, a Black psychologist named Robert Williams did something that should have ended the debate about IQ test bias permanently. He created the Black Intelligence Test of Cultural Homogeneity — the BITCH-100. It was a 100-question multiple-choice test that measured intelligence using vocabulary, references, and problem-solving scenarios drawn from Black American culture (Williams, “The BITCH-100: A Culture-Specific Test,” presented at the American Psychological Association Annual Convention, 1972).

White test-takers scored dramatically lower than Black test-takers.

The point was not that white people were less intelligent. The point was that every intelligence test measures familiarity with the culture of the test-maker. When the culture shifts, the scores shift. The test does not measure a fixed property of the brain. It measures how well the test-taker’s cultural environment matches the cultural environment that produced the test. Williams proved this with the cleanest possible experiment: reverse the cultural frame, and you reverse the scores.

IQ Score Gaps Narrow When Cultural Bias Is Controlled

Standard IQ test: widest gap
Culture-fair tests: narrower gap
BITCH-100: gap reversed

Williams, 1972; Neisser et al., 1996; Nisbett, 2009

In 1995, Claude Steele and Joshua Aronson at Stanford University documented a phenomenon they called stereotype threat. In controlled experiments, they gave the same difficult verbal test to Black and white Stanford undergraduates. When the test was described as a measure of intellectual ability, Black students scored significantly lower than white students. When the identical test was described as a “laboratory problem-solving task” — with no mention of ability or intelligence — the gap disappeared (Steele & Aronson, “Stereotype Threat and the Intellectual Test Performance of African Americans,” Journal of Personality and Social Psychology, Vol. 69, No. 5, 1995). The same students. The same questions. The only variable was whether the test activated the cultural anxiety of being judged by a measure that has been used for a century to declare Black people inferior.

That is not a measurement of intelligence. That is a measurement of trauma.

The Flynn Effect, documented by James Flynn at the University of Otago, New Zealand, proved something equally damaging to the idea that IQ tests measure fixed intelligence: scores have been rising approximately 3 points per decade in every country that administers standardized IQ tests (Flynn, “Massive IQ Gains in 14 Nations: What IQ Tests Really Measure,” Psychological Bulletin, Vol. 101, No. 2, 1987). If IQ measured innate, hereditary intelligence, this would require that every generation be genetically smarter than the one before it — an absurdity. What the Flynn Effect actually measures is improving nutrition, increasing education, greater familiarity with abstract problem-solving, and broader exposure to the types of thinking that IQ tests reward. Environment changes the score. The test measures environment.
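
The arithmetic here is worth making concrete. Below is a minimal sketch, assuming Flynn's approximate 3-points-per-decade rate and a deliberately simplified renorming model; the function name and structure are illustrative, not drawn from any cited source:

```python
# Illustrative arithmetic only: the ~3-points-per-decade rate is Flynn's
# (1987) estimate; the renorming logic is a simplification for clarity.

FLYNN_RATE_PER_DECADE = 3.0

def score_on_old_norms(current_score: float, years_elapsed: float) -> float:
    """Rescore a present-day result against norms fixed `years_elapsed` ago.

    Tests are renormed so the current population mean stays at 100, which
    hides the gain; it reappears when today's takers meet yesterday's norms.
    """
    return current_score + FLYNN_RATE_PER_DECADE * (years_elapsed / 10)

# An average score of 100 today corresponds to roughly 115 on norms set
# fifty years ago: a full standard deviation, with no genetic change.
print(score_on_old_norms(100, 50))  # 115.0
```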

The American Psychological Association's task force, chaired by Ulric Neisser, concluded in its comprehensive 1996 review of intelligence testing that "there is certainly no support for a genetic interpretation" of the Black-white IQ gap and that "existing genetic hypotheses are not well supported" (Neisser et al., "Intelligence: Knowns and Unknowns," American Psychologist, Vol. 51, No. 2, 1996). Richard Nisbett, a cognitive psychologist at the University of Michigan, reviewed thirty years of evidence and concluded that the IQ gap between Black and white Americans is entirely environmental in origin and has been steadily closing as environmental conditions converge (Nisbett, Intelligence and How to Get It: Why Schools and Cultures Count, W.W. Norton, 2009).

What IQ Tests Actually Miss

The standard IQ test measures a narrow slice of cognitive function: pattern recognition, working memory, processing speed, and verbal comprehension. It does this well. The problem is not that the measurement is bad within its domain. The problem is that the domain is presented as the entirety of human intelligence, when it is a fraction of it.

In 1983, Howard Gardner at the Harvard Graduate School of Education proposed the theory of multiple intelligences, identifying at least seven distinct cognitive domains: linguistic, logical-mathematical, spatial, musical, bodily-kinesthetic, interpersonal, and intrapersonal (Gardner, Frames of Mind: The Theory of Multiple Intelligences, Basic Books, 1983). Standard IQ tests measure, at most, two of these — linguistic and logical-mathematical. The remaining five are invisible to the test. A person with extraordinary spatial intelligence, interpersonal acuity, or kinesthetic mastery registers as cognitively unremarkable on a standard IQ assessment.

Robert Sternberg at Yale University developed the triarchic theory of intelligence, which distinguishes between analytical intelligence (what IQ tests measure), creative intelligence (the ability to generate novel solutions), and practical intelligence (the ability to navigate real-world problems) (Sternberg, Beyond IQ: A Triarchic Theory of Human Intelligence, Cambridge University Press, 1985). Sternberg’s research demonstrated that practical intelligence — the ability to read people, adapt to environments, and solve problems that have no textbook answer — is a better predictor of real-world success than analytical IQ in most occupational settings.

After a threshold of approximately 120, IQ scores show diminishing correlation with real-world achievement. Above that line, success is predicted by practical intelligence, emotional regulation, and domain-specific expertise — none of which standard tests measure.

Sternberg, 1985; Goleman, 1995; Gladwell, 2008

Daniel Goleman’s research on emotional intelligence demonstrated that EQ — the ability to manage one’s own emotions, read the emotions of others, and navigate social complexity — accounts for a larger share of variance in leadership effectiveness, job performance, and life satisfaction than IQ (Goleman, Emotional Intelligence: Why It Can Matter More Than IQ, Bantam Books, 1995). The research is clear: after a threshold IQ of approximately 115–120, additional IQ points add negligible predictive value for career success, income, or life outcomes (Gladwell, Outliers: The Story of Success, Little, Brown, 2008). What matters above that threshold is creativity, perseverance, social intelligence, and the ability to apply knowledge in unstructured situations — precisely the cognitive dimensions that standard IQ tests do not touch.

A test that measures two of seven cognitive domains, that was standardized on one cultural group, that produces scores influenced by stereotype threat, and that loses predictive power above a moderate threshold is not a comprehensive measure of human intelligence. It is a partial measure of academic aptitude wearing the costume of scientific objectivity.

The Economic Weapon

If IQ tests were merely bad science confined to academic journals, the damage would be limited to bruised egos. They are not. They are economic weapons that determine who gets hired, who gets promoted, who gets into college, and who gets tracked into remedial programs that function as holding cells.

For decades, American employers used IQ tests and IQ-adjacent aptitude tests as screening tools for hiring and promotion. The impact on Black workers was devastating and measurable. In 1971, the Supreme Court ruled in Griggs v. Duke Power Co. that employment tests with disparate racial impact were illegal unless the employer could demonstrate that the test measured skills directly related to job performance (Griggs v. Duke Power Co., 401 U.S. 424, 1971). Duke Power had required employees to pass an IQ test to transfer out of the lowest-paying department — the labor department, where nearly all workers were Black. The test had no demonstrated relationship to the ability to perform the higher-paying jobs. It existed to maintain the racial hierarchy under the camouflage of objective measurement.

The Equal Employment Opportunity Commission’s Uniform Guidelines on Employee Selection Procedures, adopted in 1978, established the “four-fifths rule”: if a selection procedure results in a hiring rate for a protected group that is less than four-fifths (80 percent) of the rate for the group with the highest rate, the procedure has adverse impact and must be justified by evidence of job-relatedness (EEOC, Uniform Guidelines on Employee Selection Procedures, 29 C.F.R. Part 1607, 1978). Standard IQ tests and their derivatives routinely fail this test. The adverse impact is structural, reproducible, and well-documented.
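
The four-fifths computation itself is simple enough to show directly. Here is a minimal sketch using hypothetical applicant counts; only the 0.8 threshold comes from the 1978 Guidelines, and everything else is invented for illustration:

```python
# Hypothetical counts; the 0.8 threshold is the four-fifths rule from
# the EEOC Uniform Guidelines (29 C.F.R. Part 1607, 1978).

def selection_rate(hired: int, applicants: int) -> float:
    """Fraction of applicants from a group who were selected."""
    return hired / applicants

def has_adverse_impact(protected_rate: float, highest_rate: float) -> bool:
    """Adverse impact exists when the protected group's selection rate
    falls below four-fifths (80%) of the highest group's rate."""
    return protected_rate < 0.8 * highest_rate

rate_a = selection_rate(hired=60, applicants=100)  # 0.60 (highest group)
rate_b = selection_rate(hired=30, applicants=100)  # 0.30 (protected group)

print(has_adverse_impact(rate_b, rate_a))  # True: 0.30 < 0.8 * 0.60 = 0.48
```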

Where IQ-Style Tests Control Access

College (SAT/ACT): gatekeeping
Military (ASVAB): placement
Employment: screening
K–12 tracking: sorting

EEOC, 1978; Neisser et al., 1996; College Board data

In education, IQ and IQ-proxy tests have been used since the 1920s for “ability grouping” and “tracking” — sorting students into different educational pathways based on test scores. Black students are consistently overrepresented in lower tracks and underrepresented in gifted programs (Ford, Reversing Underachievement Among Gifted Black Students, Prufrock Press, 2011). The tracking decision, often made in elementary school based on a single test score, determines the trajectory of a child’s entire educational career: the rigor of the curriculum, the quality of the instruction, the expectations of the teachers, and ultimately the colleges and careers that are available. A biased test administered to a six-year-old becomes a life sentence.

The Armed Services Vocational Aptitude Battery — the ASVAB — determines military job placement. It is the direct descendant of the Army Alpha and Army Beta tests administered during World War I — the same tests Carl Brigham used to argue for Nordic racial superiority. The racial score gaps on the ASVAB mirror those of civilian IQ tests, and they channel Black service members disproportionately into combat and support roles rather than technical and intelligence specialties that translate to high-paying civilian careers (National Research Council, Fairness in Employment Testing, National Academies Press, 1989).

What a Fair Assessment Would Look Like

The question is not whether cognitive ability matters. It does. The question is whether the instruments measuring it are honest. After a century of evidence, the answer is that they are not. A fair cognitive assessment would need to meet five criteria that standard IQ tests fail.

First: Culture-fair design. Every question, every scenario, every stimulus must be scrubbed of assumptions about the test-taker’s educational background, socioeconomic environment, and cultural frame. A vocabulary question that uses words more common in affluent white households than in working-class Black households is not measuring intelligence. It is measuring proximity to affluence. Raymond Cattell distinguished between “fluid intelligence” — the ability to reason and solve novel problems — and “crystallized intelligence” — accumulated knowledge from education and experience (Cattell, Abilities: Their Structure, Growth, and Action, Houghton Mifflin, 1971). A fair test must emphasize fluid intelligence and minimize the crystallized knowledge that reflects unequal access to educational resources.

Second: Multiple cognitive domains measured independently. A single IQ score is a blunt instrument. It collapses fundamentally different cognitive abilities into one number that obscures more than it reveals. A fair assessment profiles the test-taker across distinct cognitive regions — logical reasoning, spatial processing, memory, pattern recognition, verbal reasoning, and practical problem-solving — and reports each independently. A person who excels at spatial reasoning but scores average on verbal recall deserves to see both scores, not a misleading average that represents neither strength accurately.
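
As a sketch of what independent reporting could look like in code: this is a hypothetical structure, assuming the six domains named above, and it deliberately offers no composite score.

```python
# Illustrative structure only -- not a description of any published
# instrument. The six domain names follow the list in the text above.
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class CognitiveProfile:
    logical_reasoning: int          # each score reported on its own
    spatial_processing: int         # scale, e.g. a within-domain
    memory: int                     # percentile
    pattern_recognition: int
    verbal_reasoning: int
    practical_problem_solving: int

    def report(self) -> dict[str, int]:
        """Return all six domain scores side by side; deliberately,
        there is no method that collapses them into one composite."""
        return asdict(self)

profile = CognitiveProfile(55, 92, 48, 70, 50, 81)
print(profile.report())  # six numbers, no single "IQ"
```

Reporting the full profile rather than a mean keeps both the spatial strength and the average verbal recall visible, which is exactly what a single collapsed number hides.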

Third: Real-world scenarios. Standard IQ tests present abstract problems — complete the pattern, rotate the shape, define the word — that bear little resemblance to the cognitive demands of actual life. A fair assessment uses scenarios drawn from the real world: navigating a budget shortfall, evaluating competing claims, solving a logistical problem with incomplete information. These tasks measure practical intelligence — the cognitive skill that actually predicts success outside the testing room.

Fourth: No time pressure that advantages test-taking familiarity. Speed of response is not intelligence. It is a measure of how many timed tests a person has taken. Test-taking speed rewards those with the most practice in test-taking environments — a variable that correlates with income and educational access, not cognitive ability.

Fifth: Accessible administration. A fair test cannot cost $200 and require a licensed psychologist to administer. If cognitive assessment is valuable — and it is — it must be available to the people most damaged by its historical absence. The person who has never been told what their brain can actually do, because the only test available was designed to tell them what it could not, deserves access to a real answer.

A test designed by white psychologists, normed on white populations, validated in white institutions, and used to restrict Black access to education, employment, and dignity is not an intelligence test. It is an intelligence gatekeeper.

Real World IQ: What We Built and Why

This is why we built Real World IQ.

The assessment was designed by Timothy E. Parker — a Guinness World Record holder, the creator of the world’s most widely syndicated puzzle features, and a cognitive assessment designer with four decades of experience constructing problems that test how people actually think. Parker’s work reaches 200 million readers weekly through syndicated puzzle columns, and that reach taught him something that academic psychologists rarely learn: what cognitively engages a diverse, mass audience has nothing to do with academic vocabulary or abstract pattern matrices. It has to do with how the brain processes real situations.

Real World IQ measures six distinct cognitive regions: logical reasoning, spatial processing, memory, pattern recognition, verbal reasoning, and practical problem-solving.

Each domain is measured independently and reported as a separate score. There is no single number that claims to summarize the totality of a person’s cognitive ability, because no honest instrument would make that claim.

The questions use real-world scenarios, not academic abstractions. They do not assume a particular educational background. They do not reward vocabulary that correlates with household income. They do not penalize the test-taker for processing carefully rather than quickly. They measure how a person’s brain actually functions — its genuine strengths and genuine areas for development — without the cultural camouflage that has contaminated intelligence testing for 120 years.

The Strongest Counterargument — and Why the Data Defeats It

“IQ tests have been reformed. Modern tests like the WAIS-IV and Stanford-Binet 5 have addressed cultural bias. The criticisms are outdated.”

Three facts. First: The Black-white score gap on the WAIS-IV remains approximately one standard deviation — roughly 15 points — the same magnitude documented fifty years ago (Wechsler, WAIS-IV Technical and Interpretive Manual, Pearson, 2008). If the bias were removed, the gap would close. It has not. Second: Stereotype threat, which Steele and Aronson demonstrated reduces Black test scores by the mere framing of the test as an intelligence measure, operates on every test that calls itself an IQ test, regardless of question content (Steele & Aronson, 1995). The brand itself is contaminated. Third: The norming populations, while more diverse than Terman’s 1916 sample, still embed the assumption that the average white score defines the center of the distribution. The architecture has been redecorated. It has not been rebuilt.

What You Deserve to Know About Your Own Brain

Every person who has been tested, tracked, sorted, or excluded by a standard IQ test has been measured by an instrument that was not designed for them, not normed on them, and not honest about what it was actually measuring. That includes every Black American who ever sat for the SAT, the ASVAB, a school placement test, or an employment aptitude screen.

The standard test told you a number. It did not tell you what your brain can do. It did not map your cognitive strengths. It did not reveal that you might have exceptional spatial reasoning paired with average verbal recall, or extraordinary practical intelligence paired with moderate pattern recognition. It gave you a single number on a single scale designed by people who believed your race determined your score before you picked up the pencil.

That number was never your intelligence. It was their assumption about your intelligence, dressed in the language of science.

Five Things That Would Change If Intelligence Were Measured Fairly

1. Educational tracking would be replaced by cognitive profiling. Instead of sorting children into “gifted” and “remedial” based on a single score, schools would identify each child’s specific cognitive strengths and design instruction to leverage them. A child with extraordinary spatial intelligence would not be labeled “below average” because a verbal-heavy test missed the best thing about their mind.

2. Employment screening would match cognitive profiles to job demands. Instead of a single aptitude cutoff that produces adverse impact, employers would match the specific cognitive requirements of a role to the specific cognitive strengths of the applicant. The EEOC's four-fifths rule would be met not by lowering standards but by measuring the right things; a minimal sketch of this matching follows the list.

3. Military placement would unlock talent instead of burying it. The ASVAB would be replaced by an assessment that identifies the cognitive profile best suited to intelligence analysis, technical specialties, leadership, and strategic planning — without the cultural bias that currently channels Black service members away from the roles with the highest civilian earning potential.

4. College admissions would assess potential, not privilege. The SAT measures how much test preparation a family can afford. A culture-fair cognitive profile would measure what the student’s brain can actually do, regardless of how many Kaplan courses their parents purchased.

5. Every person would have a map of their own mind. Not a ranking. Not a number. A map — showing where they are strong, where they can improve, and what kinds of problems their brain is built to solve. That information belongs to the individual, not to the institution.
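
Here is the sketch of profile-to-role matching promised in the second item. All domain names, thresholds, and scores are hypothetical; under Griggs, real cutoffs would still require validation evidence of job-relatedness.

```python
# Hypothetical domains, thresholds, and profiles, for illustration only.
# The idea: a role specifies minimums only for the domains it actually
# uses; domains irrelevant to the job never screen anyone out.

role_requirements = {
    "spatial_processing": 75,
    "practical_problem_solving": 60,
}

applicant = {
    "logical_reasoning": 55,
    "spatial_processing": 92,
    "memory": 48,
    "pattern_recognition": 70,
    "verbal_reasoning": 50,
    "practical_problem_solving": 81,
}

def meets_role(profile: dict[str, int], requirements: dict[str, int]) -> bool:
    """True when the applicant clears every domain the role names."""
    return all(profile.get(domain, 0) >= minimum
               for domain, minimum in requirements.items())

print(meets_role(applicant, role_requirements))  # True
```

This applicant would fail a single verbal-weighted cutoff, yet clears every domain the role actually demands. That is the difference between measuring the right things and lowering standards.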

The Bottom Line

The numbers tell a story that cannot be argued away.

Alfred Binet designed a diagnostic tool to help children. American psychologists turned it into a sorting mechanism to rank races. That sorting mechanism was used to sterilize Black women, exclude Black workers, track Black children into dead-end classrooms, and construct a scientific veneer for the oldest lie in American history: that Black people are less intelligent.

The lie was never in the people being tested. It was in the test.

A fair cognitive assessment — one that measures how the brain actually works across multiple domains, using real-world scenarios, without cultural contamination — does not just produce different scores. It produces different futures. It tells a person what their mind is built for instead of what a century-old prejudice says it is not.

You have spent your entire life being measured by instruments that were not designed for you. You deserve one that was.