HOMOSEXUALITY IS NOT HEREDITARY

Two recently published "scientific" reports claimed that homosexuality is hereditary. The press has taken this conclusion as proof that the homosexual person has no control over his or her sexual preference. This is then touted as a reason homosexuality should be a protected civil right. But a detailed analysis of what was actually done to make these reports shows that this conclusion is not only unwarranted, but is in fact proven false by their own data.

The studies, showing the mistakes made by the researchers

STUDY 1

Note: A link to the website of this study used to be here, but the site no longer exists.

According to the first published report, the scientists tested 15 subjects known to be homosexual for the presence of a certain gene. 4 of the 15 individuals tested possessed the gene. They reported that a test using Student's t distribution showed that 26 percent is a statistically significant value, and concluded that homosexuality is hereditary.

Now, let's look at their methods and mistakes:

No control subjects were included in the experiments. All of the subjects studied had the trait of interest (homosexuality), so there was no independent variable in the experiment -- nothing to compare the result to. You can't even prove a correlation, let alone a causal relationship, without a control group to compare findings with.
The experiment was designed incorrectly. The gene should have been the independent variable, and homosexuality should have been the dependent variable. Both values should have been collected over a large number of randomly selected subjects.
The collected data were categorical data, not numerical data. But they used methods for numerical data.
The sample size was too small. A good study needs at least 30 subjects, either randomly picked, or 15 in the experimental group and 15 in the control group. A larger number of subjects increases the significance of the findings. Using categorical data requires even larger sample sizes. And a larger sample size is needed if the value tested for is relatively rare.
They got the math wrong. 4/15 is .2666 ... which rounds to 27 percent, not 26 percent.
They used Student's t distribution, which is used to compare two experimental values (numeric), or to compare a value to a norm. But they had neither two experimental values, nor did they disclose a norm to compare their one experimental count to. This test is designed to be used when the data are multiple values of some real-number (numeric) variable.
But the data at hand were discrete categorical values (Boolean type yes-no values). The correct method is to create a crosstable, calculate the Chi-squared value, and then perform a Chi-Squared test to determine whether or not there is an association. But they didn't collect the data needed to do that.
They did not reveal whether they tested for the presence of the DNA itself, or for a protein that is produced only if the gene is active.
They released the results as though they had shown a causal relationship, without any proof of a causal connection, or even an association. Since the data were categorical, a correlation is impossible.
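
The crosstable and Chi-squared procedure named in the list above can be sketched in Python. This is only an illustration: the control row is HYPOTHETICAL, because the study collected no control group, and only the 4 of 15 figure comes from the report. The function name chi_squared_2x2 is made up for this example. With counts this small an exact test would normally be preferred, so treat the numbers as a demonstration of the mechanics, not as a result.

```python
import math

def chi_squared_2x2(table):
    """Pearson Chi-squared statistic and p-value for a 2x2 crosstable.

    table = [[a, b], [c, d]]: rows are groups (trait / no trait),
    columns are gene present / gene absent.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if group and gene were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Row 1: the reported 4 of 15 subjects with the gene.
# Row 2: HYPOTHETICAL control group (never collected): also 4 of 15.
crosstab = [[4, 11], [4, 11]]
chi2, p = chi_squared_2x2(crosstab)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

If the control row matched the experimental row, as in this made-up case, the statistic would be zero and there would be no evidence of an association. Only a control row that differed markedly would produce a small p-value.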

STUDY 2

Note: A link to the website of this study used to be here, but the site no longer exists.

According to the second published report, the scientists tested 40 subjects known to be homosexual for the presence of a certain gene. 33 of the 40 individuals tested possessed the gene. They reported that a test for correlation (to what???) showed that 64 percent is a statistically significant value, and concluded that homosexuality is hereditary.

Now, let's look at the methods and mistakes in this second study:

The method used to obtain subjects is highly suspect. They advertised for volunteers in homosexual-interest magazines.
Again, no control subjects were included in the experiments. They don't learn! All of the subjects studied had the trait of interest (homosexuality), so there was no independent variable in the experiment -- again, nothing to compare the result to. You can't even prove a correlation, let alone a causal relationship, without a control group to compare findings with.
The experiment was again designed incorrectly. Again, the gene should have been the independent variable, and homosexuality should have been the dependent variable. Both values should have been collected randomly from the general population.
The collected data were categorical data, not numerical data. But they used methods for numerical data.
The sample size was still too small for any great level of significance.
They also got the math wrong. Where in the world did they get 64 percent? 33/40 is .825, which rounds to 83 percent, not 64 percent. If they took the distance of .825 from .5 and rescaled it (dividing by .5), they would have gotten 65 percent, not 64 percent. Another quandary!
They used a correlation test. But who knows WHAT they correlated the data to? Again, they had neither two experimental numeric values, nor did they disclose a norm to compare their one experimental count to. This test is designed to be used when the data are multiple values of some real-number (numeric) variable.
But the data at hand were discrete categorical values (Boolean type yes-no values). The correct method is to create a crosstable, calculate the Chi-squared value, and then perform a Chi-Squared test to determine whether or not there is an association. But they didn't collect the data needed to do that.
They did reveal that they tested for the presence of the DNA itself. This means that the distribution is the same as that for a dominant gene. But their statistical test didn't seem to take this into account.
They released the results as though they had shown a causal relationship, without any proof of a causal connection, or even an association. Since the data were categorical, a correlation is impossible.
An attempt to duplicate the second study was unable to reproduce the results.
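
The percentages disputed in the two lists of mistakes can be rechecked with a few lines of Python. This is a minimal sketch; the helper name half_up_percent is invented here, and it rounds halves upward on purpose, because Python's built-in round() would turn 82.5 into 82.

```python
def half_up_percent(count, total):
    """Return count/total as a whole percent, rounding halves upward."""
    return int(100 * count / total + 0.5)

# Study 1: 4 of 15 subjects had the gene.
print(half_up_percent(4, 15))            # 27

# Study 2: 33 of 40 subjects had the gene.
print(half_up_percent(33, 40))           # 83

# Study 2 again: distance of 33/40 from 1/2, rescaled by dividing by 1/2.
# (33/40 - 1/2) / (1/2) simplifies to (2*33 - 40) / 40.
print(half_up_percent(2 * 33 - 40, 40))  # 65
```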

Analysis of the genetics behind the studies:

Let us see if we can figure out what norms could have been used in either of these studies:

Now turn it around the way it is supposed to be, and see what happens when the independent variable is the gene, and the dependent variable is the trait:

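A randomly distributed dominant gene (one with two equally common variants, inherited at random) should appear in 75 percent of the population.

Under the same assumption, a randomly distributed recessive gene should appear in 25 percent of the population.

Under the same assumption, a randomly distributed gene should appear in 75 percent of a DNA probe assay of the population, because the probe detects the gene whether one copy or two are present.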

Any other numbers observed indicate that either the gene is NOT randomly distributed in the population, or that the sampling method is flawed.
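
The 75 and 25 percent figures can be reproduced from the standard Hardy-Weinberg proportions. The sketch below ASSUMES an allele frequency of 0.5 (two equally common variants) and random mating; the reports did not state a frequency, and a different one would give different percentages. The function name genotype_shares is invented for this example.

```python
def genotype_shares(allele_freq):
    """Hardy-Weinberg shares of the population carrying 2, 1 or 0 copies."""
    p = allele_freq
    q = 1 - p
    return {
        "two copies": p * p,
        "one copy": 2 * p * q,
        "no copies": q * q,
    }

shares = genotype_shares(0.5)  # ASSUMPTION: both variants equally common

# A dominant gene is expressed with one copy or two.
dominant_expressed = shares["two copies"] + shares["one copy"]
# A recessive gene is expressed only with two copies.
recessive_expressed = shares["two copies"]
# A DNA probe detects the gene with one copy or two.
dna_probe_positive = shares["two copies"] + shares["one copy"]

print(dominant_expressed, recessive_expressed, dna_probe_positive)  # 0.75 0.25 0.75
```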

Note that a control group is needed to be able to tell several of the cases apart.

Note that, without a control group, numbers not near 100% tend to disprove any causal claim. There are cases with the effect, but not the cause.

Note that, with a control group, numbers not near 100% disprove any causal claim. There are cases with the effect, but not the cause.

Applying genetics to the published conclusions:

Experimental group (has trait) %	Control group (no trait) %	Correlation trait to gene	Type of relationship
100	100	0	Ubiquitous uncorrelated gene
100	69	.5	Randomly distributed dominant gene with another random recessive gene
100	60	.5	Randomly distributed dominant gene with another random sex-linked gene
100	43	.5	Randomly distributed dominant gene with another random dominant gene
100	20	.5	Randomly distributed recessive gene with another random recessive gene
100	14	.5	Randomly distributed recessive gene with another random sex-linked gene
100	8	.5	Randomly distributed recessive gene with another random dominant gene
100	0	1	Correlated gene
75	75	0	Dominant uncorrelated randomly distributed gene
75	75	0	DNA probed uncorrelated randomly distributed gene
50	50	0	Sex-linked uncorrelated randomly distributed gene
50	50	0	Environmental factor
25	25	0	Recessive uncorrelated randomly distributed gene
0	100	-1	Correlated preventative gene
0	0	0	Gene not in population
x	x	0	Environmental factor
x	100-x	0	Nonrandom distribution of uncorrelated gene
x	100-x	0	Nonrandom sample collection

Experimental group (has gene) %	Control group (no gene) %	Correlation gene to trait	Type of relationship
100	100	0	Ubiquitous uncorrelated trait
100	0	1	Correlated trait
75	75	0	Uncorrelated trait
75	75	0	Wrong gene (dominant)
75	0	.5	Gene requires another dominant gene to work
50	50	0	Uncorrelated trait
50	50	0	Random environmental factor
50	50	0	Wrong gene (sex linked)
50	0	.5	Gene requires another sex-linked gene to work
25	0	.5	Gene requires another recessive gene to work
25	25	0	Uncorrelated trait
25	25	0	Wrong gene (recessive)
0	100	-1	Negatively correlated trait
0	0	0	Trait not in population
x	x	0	Environmental factor
x	100-x	0	Nonrandom sample collection or gene partially correlated

STUDY 1

Now let's see which norms make sense from the conclusion the first study group obtained:

If they used the actual prevalence of this DNA in the population, they would have provided that figure.
If they had assumed the gene was randomly distributed, their conclusion would have been that the gene does not cause homosexuality.
They might have assumed that the prevalence of the gene equals the prevalence of homosexuality. If so, the study is invalid (begging the question).
They might have assumed 0 for the prevalence of the gene in heterosexuals.
They might have assumed 0 for the prevalence of the trait expression in heterosexuals.
If they had assumed a randomly distributed dominant gene, their conclusion would have been that the gene does not cause homosexuality.
They might have assumed a randomly distributed recessive gene.

Now let's ask: Is the 26 percent a significant difference?

If the norm assumed was zero (the trait is not expressed in heterosexuals), then 26 percent is a value that tends to disprove the assertion, because well over half of the subjects expressed the trait without the gene.
If the norm assumed was the 25 percent recessive expression, then the value obtained was the possible value from a group of 15 that was closest to 25 percent. That means the gene is most likely a recessive gene that is randomly distributed among the homosexual population, and thus has no correlation to homosexuality.
If they did their t test on the difference between .26666... and .25, they were measuring the effects of the small sample size they used, not an actual hereditary effect. No value closer to .25 could possibly be obtained from their small sample.

In fact, when using any of these scenarios, the result disproves the assertion that the gene causes homosexuality, because the value obtained is either closest to random chance, or shows that the gene is more correlated with heterosexuality.
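
The claim that no value closer to .25 can come from 15 subjects is easy to check by listing every possible count. This is a small sketch using exact fractions; the function name closest_count is invented for this example.

```python
from fractions import Fraction

def closest_count(sample_size, target):
    """Return the count k for which k / sample_size is nearest to target."""
    return min(
        range(sample_size + 1),
        key=lambda k: abs(Fraction(k, sample_size) - target),
    )

k = closest_count(15, Fraction(1, 4))
print(k)  # 4, so 4 of 15 is the nearest a 15-subject sample can get to 25 percent

# The two nearest neighbours, for comparison: 3/15 misses by 1/20, 4/15 by 1/60.
print(abs(Fraction(3, 15) - Fraction(1, 4)), abs(Fraction(4, 15) - Fraction(1, 4)))
```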

Note the fact that they tried to convert categorical data into the numeric values used. They used Student's t distribution, which is used to compare two numeric values, or to compare a numeric value to a norm. But they had neither two experimental values, nor did they disclose a norm to compare their one experimental count to. This test is designed to be used when the data are multiple numeric values.

But the data at hand were categorical yes-no values. The correct method is:

Create a crosstable.
Calculate the Chi-squared value.
Perform a Chi-Squared test to determine whether or not there is an association.
Cramér's coefficient, which is computed from the Chi-squared value, can be reported as a measure of how strong the association is, but it does not replace the Chi-squared test itself.

But they didn't collect the data for that.
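
Cramér's coefficient (usually called Cramér's V) is derived from the same crosstable and Chi-squared value. The sketch below shows the calculation on HYPOTHETICAL counts, since neither study collected the control data a crosstable needs. The function name cramers_v is invented for this example, and the code assumes no row or column of the table is entirely zero.

```python
import math

def cramers_v(table):
    """Cramér's V for a crosstable given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Scale by the sample size and the smaller table dimension minus one.
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# HYPOTHETICAL counts: rows are trait / no trait, columns are gene / no gene.
example = [[12, 3], [3, 12]]
print(round(cramers_v(example), 2))  # 0.6
```

A value of 0 means no association and 1 means a perfect one. The coefficient describes strength only, so the Chi-squared test is still needed to judge significance.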

STUDY 2

Now let's see which norms make sense from the conclusion the second study group obtained:

If they used the actual prevalence of this DNA in the population, they would have provided that figure.
If they had assumed the gene probed was randomly distributed, their conclusion would have been that there was not enough evidence that the gene causes homosexuality.
They might have assumed that the prevalence of the gene equals the prevalence of homosexuality. If so, the study is invalid (begging the question).
They might have assumed 0 for the prevalence of the gene in heterosexuals. If so, the conclusion was invalid because they did not test for this case.
They might have assumed 0 for the prevalence of the trait expression in heterosexuals. The conclusion is invalid for the same reason.
If they had assumed a randomly distributed sex-linked gene, their conclusion would have been that the gene causes homosexuality. But they offered no control evidence that the gene was not just as prevalent in heterosexuals. And the gene was not on a sex chromosome.
A randomly distributed recessive gene makes no sense with a DNA probe.

Now let's ask: Is the 82.5 percent a significant difference?

If the norm assumed was zero (the trait is not expressed in heterosexuals), then 82.5 percent is a value that tends to prove the assertion, but ONLY if they had collected a control group that produced a value near zero. Without the control group, the gene could be just an uncorrelated gene.
If the norm assumed was the 75 percent expected probed expression, then the value obtained from a group of 40 was close enough to 75 percent to be observational error. The observed value is a difference of 3 samples from the expected count of 30, which is about 1.1 standard errors (the standard error of a proportion of .75 in a sample of 40 is about .068) away from the expected value (.75) of a randomly distributed gene. That is well short of the roughly 2 standard errors conventionally required, so the result is not significantly different from the norm. This is attributable to the way the sample was collected and the low number of subjects. That means the gene is most likely a gene that is randomly distributed among the homosexual population, and thus has no correlation (or a very weak correlation) to homosexuality. A control group would tell whether the gene was acting with another gene, or was uncorrelated. But they did not collect one.

In fact, using any scenario here with the second study, the result cannot prove the assertion that the gene causes homosexuality without a control group to compare it to.
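
The comparison of 33 of 40 against a 75 percent norm can be worked through with the usual binomial standard error. This is a sketch under the assumption that the norm really was .75, which the report did not state; the function name z_against_norm is invented for this example.

```python
import math

def z_against_norm(successes, n, expected_rate):
    """Return (z, standard_error) for an observed proportion versus a norm."""
    standard_error = math.sqrt(expected_rate * (1 - expected_rate) / n)
    observed_rate = successes / n
    z = (observed_rate - expected_rate) / standard_error
    return z, standard_error

# 33 of 40 subjects had the gene; ASSUMED norm of 75 percent.
z, se = z_against_norm(33, 40, 0.75)
print(f"standard error = {se:.3f}, z = {z:.2f}")  # standard error = 0.068, z = 1.10
```

By this calculation the observed value sits about 1.1 standard errors above the norm, short of the roughly 2 standard errors conventionally taken as significant.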

Note the fact that they tried to convert categorical data into the numeric values used. They used a correlation test, which is used to compare two sets of numeric values. But they did not have sets of experimental values. This test is designed to be used when the data are multiple numeric values.

But the data at hand were categorical yes-no values. The correct method is:

Create a crosstable.
Calculate the Chi-squared value.
Perform a Chi-Squared test to determine whether or not there is an association.
Cramér's coefficient, which is computed from the Chi-squared value, can be reported as a measure of how strong the association is, but it does not replace the Chi-squared test itself.

But they didn't collect the data for that.

Notice also that you can NOT select on the dependent variable (sexual preference) and expect to see the independent variable (the gene) vary in the exact manner of a causal effect. Such an experiment is designed backwards, and cannot possibly be used to prove causality. The independent variable must be actively varied and the dependent variable observed, in order to show any cause-and-effect relationship. But genes cannot be actively varied.

Notice that you can NOT use statistics intended for numeric data (each sample produces a number) when you have categorical data (yes-no or other non-numeric selectors). Statistics intended for categorical data must be used.

MISTAKES

Excuses given for using the wrong procedures in these studies:

Final conclusions:

Most of these studies were obviously designed to reach a predetermined conclusion, regardless of the actual facts. The "scientists" involved set out to prove their political beliefs, rather than to find out the truth. Otherwise, they would have used the proper scientific methods and collected a truly random sample.

As they stand, these studies have absolutely no scientific value. Instead, the people who did these studies probably fit at least one of these cases:

Apparently they had political science books on how to prove social or environmental issues, and tried to use those without any expertise.

"Studies" of this kind are meant to promote a political dogma, not to scientifically prove anything.

Numbers not near 100% tend to disprove any causal claim. There are many individual cases in these studies having the effect, but not the proffered cause. They disproved their own claims with these cases.

Beware of ANY study done by a group that would benefit from one outcome of the study. Science must be done by disinterested scientists (scientists who do not want a particular outcome to be true).

Epilog: The Civil Rights Entanglement of Religion and Homosexuality

Those who perpetrated these instances of bad science have as their goal a law prohibiting discrimination against homosexuality. But such a law will never exist for very long, because any such law is discrimination against religion. Discrimination against religious belief is unconstitutional in the United States of America.

There is only one solution to this dilemma: Since religions can discriminate against each other, the only way for homosexuality to be protected is for it to become a religious belief. The Political Correctness religion would do for this purpose.
