At one point, IQ was calculated as (mental age / chronological age) × 100. There are a few issues with this practice, but the one that has stuck with me the most is that the concept of "mental age" is misleading. If an adult is intellectually disabled and has an IQ of 60, he still has very little in common with a 7-year-old. He probably has many weak areas in his reasoning ability and memory, but he'll have a personality and experiences that are very distinct. Thinking of him as a little kid's mind in a man's body doesn't make sense.
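For anyone who likes to see the arithmetic, here is a minimal sketch of that old ratio formula. The function name and the example ages are made up for illustration; they don't come from any real test.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Old-style ratio IQ: (mental age / chronological age) * 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return 100 * mental_age / chronological_age


# Hypothetical examples: a 10-year-old performing like a typical 12-year-old,
# and a 10-year-old performing like a typical 8-year-old.
print(ratio_iq(12, 10))
print(ratio_iq(8, 10))
```

The formula is easy to compute, and that's part of why it stuck around. It says nothing about how unusual a given score actually is among people of the same age.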
Now, IQ is calculated by comparing someone's performance with that of others their own age. It's more meaningful to say "you did better than 90% of your peers" than to say to a 7-year-old "you performed as if you were 15" (trust me, chatting with a smart little kid is totally different from talking to an average teen).
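To make "better than 90% of your peers" concrete, here is a rough sketch of a percentile rank. The list of peer scores is invented for illustration, and real tests use far larger samples; the point is just that the comparison group is people of the same age.

```python
def percentile_rank(score: float, peer_scores: list[float]) -> float:
    """Percentage of same-age peers who scored strictly below `score`."""
    if not peer_scores:
        raise ValueError("peer_scores must not be empty")
    below = sum(1 for p in peer_scores if p < score)
    return 100 * below / len(peer_scores)


# Hypothetical scores from ten same-age peers (made-up data).
peers = [3, 4, 4, 5, 5, 5, 5, 6, 6, 7]
print(percentile_rank(7, peers))  # the child who got 7 outscored 9 of 10 peers
```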
Here's how scores are calculated. I'll use a real test as an example in case you want to look into it a little more. The most popular test for kids is the Wechsler Intelligence Scale for Children, Fifth Edition (WISC-V). When a kid takes one of the subtests, like Block Design (on which they assemble multicolored blocks to match a picture), I record the number of questions they got right. This is called their "raw score." On its own, this doesn't tell us much. A 6-year-old matched the blocks five times... is that good? Also, if he gets five questions right on that test but eight right on a different one, does that mean he's better at the other test, or is that just what you'd expect him to get anyway?
The most important part of test development (and one that you can be sure online "IQ" tests don't do) is to give the test to thousands of people from around the country. This gives us "norms" to compare performance to. Maybe most kids at the age of six get five questions right on Block Design. The variation in their scores is also calculated. If most get five questions right but a lot also get one or two more, then the extra one or two are not considered a big deal and won't pump up their score very much. If it's rare to get an extra one or two right, then the kids who do will see their score go up a lot. The score created this way is called a "Scaled Score," because it is calculated to give an impression of how the child did in comparison to peers. Since Scaled Scores are always expressed as having a mean of 10 and a standard deviation of 3, they allow us to compare different tests. If our hypothetical kid did get five questions right and so did most other kids his age, his Scaled Score would be a 10. If his score is a 13 on a different test, then we can confidently say he did significantly better, and a 7 would show us he did significantly worse.
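Here is a rough sketch of the idea behind a Scaled Score. It assumes a simple linear conversion from the norm group's mean and standard deviation, which is a simplification: real tests publish lookup tables for each age band rather than applying a formula like this. The norm values below are invented, and the 1 to 19 range is the usual range for Wechsler Scaled Scores.

```python
def scaled_score(raw: float, norm_mean: float, norm_sd: float) -> int:
    """Convert a raw score to a Scaled Score (mean 10, SD 3).

    Simplified illustration: a linear z-score conversion, rounded and
    clamped to 1..19. Actual tests use age-based norm tables instead.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd        # how many SDs from the age-group mean
    scaled = round(10 + 3 * z)             # re-express on the mean-10, SD-3 scale
    return max(1, min(19, scaled))


# Hypothetical norms for 6-year-olds: the average raw score is 5 on both subtests.
print(scaled_score(5, norm_mean=5, norm_sd=2))  # exactly average
print(scaled_score(7, norm_mean=5, norm_sd=2))  # two extra items, scores vary a lot
print(scaled_score(7, norm_mean=5, norm_sd=1))  # two extra items, scores vary little
```

The last two lines show the point about variation: the same two extra correct answers move the Scaled Score further when few kids in the norm group manage them.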
The next step is to add certain Scaled Scores together to form a "Composite." Closely correlated subtests are grouped together so we can get an impression of the student's performance in a given area. For example, Block Design is grouped with Visual Puzzles, since both tests require problem solving based on pictures and not words. We'd add the Scaled Scores for both together, and the total is converted again to a "Standard Score." These are like Scaled Scores, but they have a mean of 100 and a standard deviation of 15. They also give a better impression of how a student performed than just averaging scores, since they account for how common or uncommon it would be to get a certain set of Scaled Scores. If a student gets some variation but he's mostly average, then he'll get a Standard Score close to 100. On the other hand, if all of his Scaled Scores are significantly higher than average (maybe all 13s), then his Standard Score will be more than one standard deviation above average (even though each of his Scaled Scores was exactly one standard deviation above the mean). This is a little tricky to explain, so stick with me.
Students don't usually get any 13s (it is a significantly above average performance, after all), and even fewer will get three of them. So instead of a 115, the Standard Score might be a 120 or a 125, since the total is just that far above average. Composites are also more reliable than the scores for any single subtest, so we consider them a bigger deal. The IQ is actually a Composite of all of the subtests on an IQ test. Put aside all of the complicated math, and what an IQ score says is "Compared to lots of other people the same age, how do you do at a wide variety of tasks?"
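If you want to see why several 13s add up to more than a 115, here is a minimal sketch. It assumes every pair of subtests has the same correlation `r`, which is a made-up simplification. Real tests convert the sum of Scaled Scores through published norm tables, not through a formula like this.

```python
import math


def composite_standard_score(scaled_scores: list[float], r: float) -> float:
    """Turn a set of Scaled Scores (mean 10, SD 3) into a Standard Score
    (mean 100, SD 15), assuming every pair of subtests correlates at `r`.

    Illustrative only: `r` is a hypothetical value, not a published figure.
    """
    k = len(scaled_scores)
    if k == 0:
        raise ValueError("need at least one Scaled Score")
    total = sum(scaled_scores)
    mean_total = 10 * k
    # Variance of a sum of k scores with SD 3 and equal pairwise correlation r.
    sd_total = 3 * math.sqrt(k + k * (k - 1) * r)
    z = (total - mean_total) / sd_total
    return 100 + 15 * z


print(composite_standard_score([13], r=0.0))          # one subtest alone: 115
print(composite_standard_score([13, 13, 13], r=0.3))  # three 13s: above 115
print(composite_standard_score([13, 13, 13], r=1.0))  # perfectly correlated: 115 again
```

The less the subtests overlap with each other, the rarer it is to be high on all of them at once, so the Composite lands further above 115. If the subtests were perfectly correlated, three 13s would tell us nothing more than one 13 does.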