Sunday, 24 March 2013

Swimming vs running trends for four levels of sport

NB: This post is a redux of a previous discussion on the same topic. The difference this time is that I give more of the reasoning behind the stats and have asked other people for their opinions. The updated results therefore address suggestions and challenges from a few people, including Alex Hutchinson himself. I feel the questions, if not satisfactorily answered, are at least out in the open. The good news is that my "discovery" appears to have survived the most obvious challenges and the conclusions should now be more robust.

I wanted to find out what sort of trends, if any, appear in the ratio between the top (i.e. fastest) women's and top men's times in two sports: swimming and running. I define the ratio as follows:

Ratio = [Time for a woman to complete distance X]/[Time for a man to complete distance X]

For example, if considering the 10,000 m track event and a woman's time is 32 minutes while a man's is 29 flat, their time ratio is 32:00/29:00 = 1.10. Let's then take another fictitious ratio of 1500m times as 4:20(female)/3:59(male) = 1.09.

Why go to the trouble of taking these ratios? To skip ahead, what I've found is that as distance X increases, this ratio goes up for running and down for swimming. Using the same two examples as above, the fictitious ratio goes up from 1.09 to 1.10 as I increase X from 1.5 km to 10 km. Here my argument holds because I invented the numbers. All numbers from this point onward will be quite real. Below is an illustration of what I will soon show with actual data:
It is simple enough to do these calculations; however, it turned out to be more difficult to argue that there was any inherent meaning in the final results. Considering the above example yet again, what have I actually shown? Another faster male might run the 1500m in 3:50, so the new ratio becomes 1.13 and my argument now falls to pieces. Or the female times are artificially slow due to low participation. Clearly I am going to need a lot of evidence to support such a sweeping generalization as "women improve with respect to men in swimming but worsen for running". From an (admittedly cursory) check of the available literature and a couple of correspondences, I have not seen these two trends discussed together. Even if the swimming/running down/up trend is well-known, it was a good exercise working with different lines of reasoning. And I imagine at least some of the data here could somehow be novel.
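For anyone who wants to reproduce the arithmetic, here is a minimal Python sketch of the ratio calculation. The helper names `to_seconds` and `time_ratio` are my own inventions for illustration, and the inputs are the same made-up times used in the example.

```python
def to_seconds(clock: str) -> float:
    """Convert 'h:mm:ss', 'mm:ss' or 'ss' into seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def time_ratio(womens_time: str, mens_time: str) -> float:
    """Ratio = woman's time / man's time over the same distance."""
    return to_seconds(womens_time) / to_seconds(mens_time)


# The invented times from the example (not real results).
examples = [
    ("10,000 m", "32:00", "29:00"),
    ("1500 m", "4:20", "3:59"),
    ("1500 m, faster man", "4:20", "3:50"),
]
for label, womens, mens in examples:
    print(f"{label}: {time_ratio(womens, mens):.3f}")
```

Swapping in a single faster male time is enough to flip the direction of the toy "trend", which is exactly why one pair of numbers proves nothing.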

I mentioned at the start that Alex made some comments on the preliminary results. Along with some other possible issues and possible biases, I'll try to address the most pressing ones after I show the key findings.

I have analyzed four levels of sport for these up/down trends: top Canadians, top Americans, Olympic-level athletes, and "top 5 world". I define "Top Canadian" as the time required to qualify for the Canadian Interuniversity Sport (CIS) championships, which is based on a rolling average of the best running or swimming times from recent years. Basically it means that if you run that time you are one of the six or so fastest Canadians attending a Canadian university. "Top American" means the same thing except for the more competitive NCAA Division 1 level. "Olympic Class A" here means the time needed for an almost-guaranteed Olympic spot. Finally, "Top 5 World" means the time of the 5th fastest person (not the 5th fastest time, mind you) in the world. I chose this measure instead of the world record time to defuse arguments such as "Florence Griffith Joyner's time doesn't count because she ran on drugs and the wind gauge was broken!" If the fastest person's time is suspicious we are talking cheating; if the 5th fastest person also has an edge we are talking conspiracy a la Tour de France. Furthermore, to question the validity of these ratios because of drug use also requires the additional assumption that only the men or only the women are cheating and not both, or else there would be some degree of cancellation in the ratios.

For the longest distances I had to tweak things slightly: the marathon (10km) swim uses the 5th fastest Olympic time in London and Beijing (the two data points are both 1.07). For the ultra distances (i.e. the English Channel swims and the Two Oceans, Comrades, Leadville 100 and Badwater ultramarathons) I did not have access to top lists; instead I used the course record times. That the trend so often holds across so many terrains is also rather surprising.
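The distinction between the 5th fastest person and the 5th fastest time matters whenever one athlete appears several times on a performance list. Here is a small Python sketch of how the two differ. The function name `nth_fastest_person` and the performance list are made up for illustration; this is not the script I used on the real lists.

```python
def nth_fastest_person(performances, n=5):
    """Return (athlete, best_time) for the n-th fastest person.

    `performances` is an iterable of (athlete, time_in_seconds) pairs and
    may list the same athlete several times; only each athlete's best counts.
    """
    best = {}
    for athlete, seconds in performances:
        if athlete not in best or seconds < best[athlete]:
            best[athlete] = seconds
    ranked = sorted(best.items(), key=lambda item: item[1])
    if len(ranked) < n:
        raise ValueError(f"only {len(ranked)} distinct athletes in the list")
    return ranked[n - 1]


# Made-up performance list: athlete "A" appears three times.
marks = [("A", 100.0), ("A", 100.5), ("B", 101.0), ("A", 101.2),
         ("C", 101.5), ("D", 102.0), ("E", 102.4), ("F", 103.0)]

print(nth_fastest_person(marks))        # ('E', 102.4): 5th fastest person
print(sorted(t for _, t in marks)[4])   # 101.5: 5th fastest time
```

A single dominant (or suspicious) athlete can fill several of the top slots on a times list, but counts only once on a persons list.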

My sources of stats:

Swimming: CIS long course (free), NCAA 'A' standard, Olympic, Top 5 world time, English Channel

Running: CIS, NCAA, Olympic A, Top 5 world time

For all cases for both running and swimming CIS times are slower than NCAA D1 cutoffs, which are in turn slower than Olympic A qualifying times, and it follows "World top 5" is the fastest of all.

CIS time > NCAA time > Olympic time > World time

Keep in mind this (quite expected) trend above does not hint at what I am about to show. And to reiterate, by including four levels of sport it is more likely I can distinguish fluke trends from more fundamental ones. It is my best attempt at increasing the 'sample size' without actual data manipulation.

Without further ado, here are the ratios for running:

And here they are for swimming:


You can access the raw data here. For some of the shorter running "top 5 world" times I used a mix of indoor and outdoor lists. All times are outdoors, of course, for any distance above 5km for running or swimming, but I deliberately used both for shorter runs as these might reveal yet another bias. Overall I did not see many anomalies except for the jump at the 1500m women's swim, the down course of the Comrades and the marathon. The 1500m swim "bump" is unsurprising as women do not typically compete at 1500 specifically (there is no women's 1500m swim at the Olympics). The marathon jump is perhaps the most surprising since it's a deep-seated deviation (i.e. the 5th fastest person-times were used). Perhaps more could be said about the men's marathon under-performing or the women's doing better than these trends expect. The Comrades "down" time is the biggest deviation. I don't know enough about the 90km Comrades course, but the women's record was set in 1989 by Frith van der Merwe, while the other three records were set much more recently. Since Badwater, Leadville and Comrades "up" do show yet more upward movement in the ratio, the net trend looks to be real. For swimming the overall downward trend is quite consistent, and the ratio holds rather constant at 1.06-1.07 for the 5km and 10km swims and the 33,600m English Channel crossing(s).

What I cannot explain is why these ratios behave the way they do. My "thesis" argument is that these graphs are a real gender trend, and actual physiological effects are being demonstrated here. Alex put forth this suggestion:
Body composition seems to be the obvious culprit. Women have less muscle and more fat. So for swimming, it seems logical that at shorter distances, where power is key, men have a bigger advantage; and at longer distances, added buoyancy becomes more significant and women narrow the gap. What’s harder to understand is why the trend goes in the opposite direction in running. 
There are lots of interesting reasons that could explain why these trends occur. The most obvious culprit is that body fat hinders running speed but helps in swimming, but such fat does not completely overcome the difference, as males are still faster overall. And why cannot a man simply add a few fat pounds to his frame to benefit from the same buoyancy? Would a targeted fat injection under the skin make men swim even faster?

My wife read some of the early material and asked me for the "so what?" justification for sharing this trend. As a data geek my reasons are self-fulfilling; however, a possible use might be to consider these ratios when judging athletes. If a swimming coach notices the women on his/her team are almost as fast as the men (for longer distances), then rather than berate the male crew he/she should view this as expected. Recall the last male finisher at the London 10k marathon swim was slower than the last place female. Or another example: notice how the running CIS ratios hover higher above the world-class ratios than they do for CIS swimming. One could argue that Canadians are losing all our best runners to the US, yes. But if the same trend is not seen to such a degree for CIS swimming, what else could be happening? Could it be that Canadian female runners and not female swimmers are being targeted by American colleges? And why are NCAA ratios also a bit higher in running compared with swimming? Perhaps another explanation lies in the training schedule itself of university female runners. I can let readers draw their own conclusions or read further for more considerations.

That's all I will say with respect to physical interpretations. I will first need to defend against the other explanation that competitive levels are completely different within each level of sport, i.e. in taking these gender ratios we are talking apples and oranges. If you are already convinced the trend is real you might stop reading here. If not, I encourage you to consider the other arguments.

Still reading? Ok, so my job will now turn to defending these trends as real (though maybe not well explained), and more than just flukes based on participation rates. Here is what Alex Hutchinson had to say about that second hypothesis:
I’m not sure you can entirely rule out participation rates at different distances as being a factor, since the history of the two sports might be quite different. Women’s swimming was added to the Olympics (1912) well before women’s running (1928), and there are a number of famous examples of female long-distance swimmers from those years. Most famously, Gertrude Ederle broke the men’s record for swimming across the English Channel by more than two hours in 1926 (and was just the sixth person to do it overall), and another woman also broke the men’s record a few weeks later. This was a big, big deal at the time. Are there still lingering effects of differing perceptions about what women are good at? Seems unlikely, but hard to say for sure. Again, the strongest argument against this is the fact that the trends are evident even across the shortest (<1k) distances.
I completely agree that numbers can play a role. I also know that numbers are not necessarily correlated to competitive outcomes, but it's a good starting point. Let us begin with the lowest (though still quite high level) of CIS running and swimming. What if CIS numbers are invalid because the best athletes go to the US? A reader commented the following:
Canadian women get recruited to NCAA universities at a much higher rate than men because of Title IX in the US which guarantees equal scholarships for men and women. At schools that have football teams, that means that 80 - 100 full scholarships go to men for football. That opens up many more scholarships for women. Therefore many more top women than men get scholarships to NCAA schools. Many top Canadian women have done well there in the past 20 years. But there are not the same level of scholarships in T & F for men.
Populations (for reference purposes):

Total Canadian university enrolment: 1,112,370 (in 2008/09), of which 57.6% were women and 42.4% were men. Percentage of Canada's population currently enrolled in university in 2009: 3.3%

Total American university enrolment: 17,487,475 (2012?), of which 57.4% were women, 42.6% were men. Percentage of USA's population currently enrolled in university (in 2012?): 5.6%

Whether we are talking US or Canadian universities, gender enrolment shows proportionately the same ~57% bias for women (and about the same for sports within the university system). (Aside: Something that always surprises me is that despite it being more expensive to attend, more Americans go to university than Canadians, but I've also heard that American statistics define a college/university differently than Canadian statistics do.)

The reason I am showing the above stats is that one might argue many talented Canadians are attending US universities in large numbers. I checked and here I cite this article which says "Nearly 44,000 Canadians [4% or so] undertake full-degree programs in other countries". This means that at least 96% of Canadians choose to stay in Canada for study. This doesn't mean the 44,000 or so are not actively winning races (or science fairs). Canadian Mohamed Ahmed is doing quite well south of the border. Both men and women are heading to American universities if they get better funding.

It is not only possible but certain that the CIS system does not have the best Canadians competing. But consider that the best talent here is still "in" the Canadian athletics system. Both a male and a female 1500m runner, once recruited, are developed by the same coaches and compete against others in a nearly closed system. (In my four years at McGill, I didn't see many girls or guys disappear halfway through their studies once their racing form improved.) This actually helps my case, for if a sub-elite and entirely different group of athletes exists in CIS then we have a properly isolated and separate set of trained talent.

Now onto NCAA swimming and running. Obviously the United States has more of the top universities than anyone else in the world, both for academics and athletics, so the US keeps its best. Though some CIS talent is lost to the Americans, American sports are completely their own entity, except for recruiting talent from abroad. Kenyan runners do attend, yes, but are there any more men than women being picked up?

The next question is how much funding each sport receives. Are swimmers getting more money than runners? More importantly, are female swimmers getting disproportionately more money than female runners? I could not find good numbers for money allocation per sport per se; however, I did find here a scholarship allotment for NCAA sports per university, whereby however lucrative the scholarships you hand out, you can only dole out so many. Comparing cross country/track & field vs swimming/diving (for the Division 1 level), running has 18 women's / 12.6 men's scholarships for a ratio of 1.43. Swimming gets a relative contribution of 14/9.9 = 1.41. These ratios are nearly identical, which means women are not relatively more favoured in swimming compared with running, while swimming gets slightly fewer total scholarships. That disfavour is not surprising as there are more running events than swimming, hence the 'talent' is slightly more divided.
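The scholarship comparison is just two divisions and two sums, sketched here in Python using the Division 1 limits quoted above. The dictionary layout and variable names are mine.

```python
# Division 1 scholarship limits quoted in the text: (women, men).
limits = {
    "running (XC / track & field)": (18.0, 12.6),
    "swimming & diving": (14.0, 9.9),
}

ratios = {}
totals = {}
for sport, (women, men) in limits.items():
    ratios[sport] = women / men    # women's share relative to men's
    totals[sport] = women + men    # total scholarships for the sport
    print(f"{sport}: {women} / {men} = {ratios[sport]:.2f}, "
          f"total = {totals[sport]:.1f}")
```

The two women/men ratios differ by only about 0.01, while the totals show running with the larger overall allotment.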

Right now the United States is a powerhouse in men's swimming (a la Ryan Lochte and Michael Phelps) and I cannot see why the college level would be any different. To argue that men's swimming at the NCAA level is weaker than the women's is a hard sell.

I wanted to see whether more women could be found in college-level swimming than running. Historical NCAA participation numbers are available here and I turned these into percentages for the years 1981, 1991, 2001 and 2011.
The number of women in college is growing and so is their participation in college sports. This trend is clearly true for cross country, track & field, and swimming, as shown above. There are relatively more female swimmers than men compared with either track and field or cross country running but all three are above 50% and rather close together.

Perhaps the larger percentage of female swimmers can explain why the ratios between women's and men's race times are higher in running than swimming: by adding more competitive women swimmers we expect the ratios of the best times to decrease. Lower total female running participation might explain why running ratios are higher than swimming, true, but how can we explain why the trend changes with distance? We would then need more data on the distribution of women within running. Using the bar chart above, there appear to be relatively more long distance runners. I argue this given the higher percentage of women in XC running versus track and field. Cross country running races are longer than track distances. So if the red bar is higher than the green it would be difficult to argue most women "just want to sprint". Anecdotally I can say women tend not to shy away from the cross country season; at McGill (i.e. in the CIS world) there were far more men than women on the XC squad. Or perhaps the female distance athletes are exhausted by the XC season. But then why are men not equally tired during the indoor season?

Moving beyond the NCAA (i.e. to the world level), in terms of total participation, I don't know what the global numbers look like. In the United States, swimming is slightly more popular among women than men, i.e. 45% male and 55% female (2011). Total participation has remained more or less constant for the last ten years. Similarly, more women than men participate in running, i.e. 45% male and 55% female (2011), though historically women ran in much smaller numbers. The history of women's running is shorter; women only began to participate in large numbers after the 1980s, about 10-20 years after the men's boom. Using for reference a case study I did last year, the Tely 10 race (in St. John's, Newfoundland) became predominantly female (>50% participation) in the mid 2000s. For many years women were discouraged from long distances; running anything more than 800 meters was considered "too dangerous". The Boston Marathon banned women until 1972, and the Tely 10 race was effectively all-male until the 1980s. Meanwhile women have been swimming great distances since the 1920s (cf. Alex's earlier comments). I am unable to explain what has been taking place on a meta level, but it seems likely the training information is now equally up to date in both sports (i.e. no back paddling is seen at the highest level). Therefore I do not see how such old legacies have an effect on present training. Can even a half-century lag for women's running be affecting young university athletes competing for lucrative US scholarships?

As for the Olympics, at London 45% of all athletes were female. In swimming and running there is a 1:1 correspondence of gendered events, hence a near ideal 50/50 split in women's attendance. The honour of competing at the Games, whether you are a man or a woman, should mean a gender-independent desire to send athletes from the relevant countries (though a small number of countries hold a different view).

One last piece of evidence I can offer (i.e. that the gender trends are real) comes from using the Riegel formula. If you are not familiar with this equation, it can be used for running, cycling or swimming events to predict times for distances you have not yet done:

Time A/Time B = (Distance A/Distance B)^1.06

Expressed in its original form

T = k × D^m

where k and m are constants (and m = 1.06 in the previous equation while k cancels out).
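As a quick illustration of how the formula is applied, here is a minimal Python sketch. The function name `riegel_predict` and the example runner (40:00 for 10 km) are hypothetical; only the exponent 1.06 comes from the formula itself.

```python
def riegel_predict(known_time, known_distance, target_distance, m=1.06):
    """Predict a time at target_distance from one known performance.

    Uses T_target = T_known * (D_target / D_known) ** m. Times and
    distances can be in any units as long as they are used consistently.
    """
    return known_time * (target_distance / known_distance) ** m


# Hypothetical runner: 40:00 for 10 km, predicting a half marathon (21.0975 km).
predicted = riegel_predict(40 * 60, 10.0, 21.0975)
minutes, seconds = divmod(round(predicted), 60)
print(f"predicted half marathon: {minutes}:{seconds:02d}")
```

Because the prediction depends only on the ratio of distances, the constant k never needs to be known.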

I used the formula to interpolate CIS and Olympic 800m/1500m times for men and women, respectively (since only one is swum by each gender). Notice how these ratios severely break from the downward trend.
Explanation? The Riegel formula is gender neutral and so unable to predict more subtle trends within men and women as a coupled movement. Therefore another take-home message might be to consider adding a sex component to Riegel predictions for any serious endurance sports study.
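The gender-neutrality point can be checked numerically. If both sexes follow T = k × D^m with the same exponent m, the women/men ratio equals k_f/k_m at every distance; the ratio can only drift with distance if the two exponents differ, since the ratio is then (k_f/k_m) × D^(m_f − m_m). The Python sketch below uses invented constants purely for illustration; they are not fitted to any of the data in this post.

```python
def power_law_time(k, m, distance):
    """T = k * D**m."""
    return k * distance ** m


def ratio_curve(k_f, m_f, k_m, m_m, distances):
    """Women/men time ratio at each distance."""
    return [power_law_time(k_f, m_f, d) / power_law_time(k_m, m_m, d)
            for d in distances]


distances_km = [1.5, 5, 10, 42.195]

# Invented constants, for illustration only.
same = ratio_curve(3.3, 1.06, 3.0, 1.06, distances_km)     # shared exponent
rising = ratio_curve(3.3, 1.07, 3.0, 1.06, distances_km)   # m_f > m_m
falling = ratio_curve(3.3, 1.05, 3.0, 1.06, distances_km)  # m_f < m_m

for name, curve in [("same m", same), ("m_f > m_m", rising),
                    ("m_f < m_m", falling)]:
    print(name, [round(r, 3) for r in curve])
```

With a shared exponent the curve is flat, so a single-exponent Riegel prediction cannot reproduce either the rising running trend or the falling swimming trend.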

A side note: If we reject that the gender ratios are real and not just based on participation then we might also downgrade the Riegel formula as a tool to predict times at longer distances, where participation is low. I have been wanting to modify the Riegel formula anyway, as I have noticed the exponent "1.06" is not static and varies from person to person. In fact I have found that the values for k and m vary quite a lot: m is often greater than 1.06 (it varies between 1.05 and 1.15) and the prefactor k is just as revealing. The graph below is based on extrapolated (k,m) values for everyone from amateurs, to in-betweeners like myself, to elites like Bekele and Paula Radcliffe. What seems to happen is that as you improve your k and m values (increase and decrease each, respectively) you tend to move towards the bottom centre of the plot. This is entirely empirical and something I have yet to see discussed in the literature. Notice how people can improve from two directions.
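One common way to get a personal (k, m) pair is to fit a straight line to log(time) versus log(distance), since T = k × D^m becomes log T = log k + m × log D. The Python sketch below does this with ordinary least squares. It illustrates the general technique and is not necessarily the procedure behind the plot. The function name `fit_riegel` and the personal bests are hypothetical, and here k is simply the fitted time at unit distance (1 km), which may not match the convention used for k in the graph.

```python
import math


def fit_riegel(distances, times):
    """Least-squares fit of log T = log k + m * log D; returns (k, m)."""
    if len(distances) != len(times) or len(distances) < 2:
        raise ValueError("need at least two (distance, time) pairs")
    xs = [math.log(d) for d in distances]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx                          # slope = exponent
    k = math.exp(y_mean - m * x_mean)      # intercept = log k
    return k, m


# Hypothetical personal bests: distance in km, time in seconds.
distances_km = [5.0, 10.0, 21.0975, 42.195]
times_s = [20 * 60, 42 * 60, 93 * 60, 200 * 60]

k, m = fit_riegel(distances_km, times_s)
print(f"k = {k:.1f} s at 1 km, m = {m:.3f}")
```

With only a handful of races per person the fit is sensitive to any single off day, so the resulting (k, m) values should be read as rough estimates.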


For reference here is Riegel's original publication. Sorry I cannot share the raw data as it involves people who gave me their running stats in confidence. I did however divide the genders; unfortunately I do not see any revelations here so it is not going to help me solve the general mystery I started with.

Conclusions/final thoughts

This may be a false choice, but assume there are two causal explanations to pick for these trends, explanation A and B.

A) There is an inherent lack of participation of females (and only females) at multiple levels (CIS, NCAA, world) of competitive running. Simultaneously there is an overabundance of female swimmers at every level of the sport, in different countries and at various skill levels.

B) An as-yet unconfirmed single physiological difference between men and women explains the two opposite trends in running and swimming.

Going by Occam's Razor and by my own intuition the second choice seems far more credible. But it still may not be true and participation could still be the dominant reason we are seeing these trends. I would very much like to hear other ideas if they are out there. I could also check out trends for other sports (e.g. cycling), but speaking of endurance events this has been a marathon afternoon of data mining and writing so I'll leave it at the present two for now. Cheers.