“Tell Me About Statistics!” vol.2 What is a confidence interval (CI)?

on April 20, 2023

The 95% confidence interval (CI) is frequently reported in medical studies. This article explains what a confidence interval is.

Does a sample represent the whole population?

The results of a given test will not necessarily be the same every time.

For example, suppose that 100 adult men who underwent echocardiography at Health Examination Center A were randomly selected, and that their mean left ventricular mass was 130 g. This result alone does not establish that the mean left ventricular mass of Japanese adult males is 130 g. A different sample of 100 people might give 135 g or 128 g.

Therefore, the statistical approach is to attach a range to the estimate. The range is constructed so that, if we repeatedly took random samples from the population and calculated the range each time, a stated proportion of those ranges (for example, 95%) would contain the true population mean.
That range is the “confidence interval (CI),” and half of its width is the margin of error of the estimate.
A confidence interval is expressed as a lower limit and an upper limit.
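
The idea of repeated sampling can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the population is taken to be normally distributed with a true mean of 130 g and a standard deviation of 28 g (hypothetical values chosen for illustration, not measured data), and each interval is computed as the sample mean plus or minus 1.96 standard errors, the usual 95% formula.

```python
import random
import statistics


def coverage_of_95ci(true_mean=130.0, true_sd=28.0, n=100, repeats=1000, seed=0):
    """Fraction of simulated 95% CIs that contain the true population mean.

    true_mean and true_sd describe a hypothetical population (assumption).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(repeats):
        # Draw one random sample of n people from the hypothetical population.
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
        lower, upper = mean - 1.96 * se, mean + 1.96 * se
        if lower <= true_mean <= upper:
            hits += 1
    return hits / repeats


coverage = coverage_of_95ci()
print(f"{coverage:.1%} of the simulated intervals contain the true mean")
```

The printed proportion should be close to 95%. Each individual interval either contains the true mean or does not; the 95% describes how often the procedure succeeds over many samples.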

Confidence intervals support the reliability of the data

In medicine, for example, a confidence interval conveys both the estimated size of a treatment effect and the precision of that estimate. The narrower the confidence interval, the more precise the estimate. Confidence intervals are now often reported together with the p-value.

Confidence intervals are widely used in medical statistics, for instance in studies of the efficacy of therapeutic agents and of treatment outcomes. In basic and clinical research, it is impossible to collect treatment outcome data for every patient. Instead, a smaller group is sampled and analyzed statistically, and the confidence interval indicates how precisely the sample result estimates the value for all patients.

Estimating population means with specific examples

Consider a simple example: we take a sample from a population, estimate the population mean from it, and ask how much confidence we can place in that estimate.

[concrete example]

A prefecture has a population of 250,000 adult males. To estimate the left ventricular mass (g) of adult males in this prefecture, echocardiographic data are collected from n = 400 randomly selected men. The sample mean is 135 g, and the sample standard deviation is 28 g.
Can we conclude that the mean left ventricular mass of adult males in this prefecture is 135 g?

The 400 sampled men may happen to have lower or higher values than the population as a whole. It is risky to treat the mean of a sample as if it were the mean of the whole population.
Therefore, a range is attached to the sample mean. In other words, we estimate the population mean by saying that the mean left ventricular mass of all adult males in this prefecture is “between 132 and 138 g.”

This method of estimating a population value by attaching a range to the sample mean is known as “interval estimation.”

When we say “between 132 and 138 g,” i.e., “M1 to M2,” M1 is the lower limit and M2 is the upper limit. The interval between the two is called the “confidence interval.” Its width is determined mainly by the confidence coefficient (95%, 99%, etc.) and the sample size, along with the variability of the data.
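
To make the effect of sample size concrete, the sketch below computes the half-width of a 95% interval, 1.96 × SD ÷ √n, for several sample sizes. The standard deviation of 28 g and n = 400 come from the example; the other sample sizes (100 and 1600) are hypothetical values added only for comparison.

```python
import math

SD = 28.0  # sample standard deviation from the example (g)


def half_width_95(sd, n):
    """Half-width of a 95% confidence interval: 1.96 * SD / sqrt(n)."""
    return 1.96 * sd / math.sqrt(n)


# n = 400 is the example's sample size; 100 and 1600 are hypothetical.
for n in (100, 400, 1600):
    print(f"n = {n:4d}: mean ± {half_width_95(SD, n):.2f} g")
```

Because the width shrinks with the square root of n, quadrupling the sample size only halves the width of the interval.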

The confidence coefficient corresponds to the significance level in hypothesis testing. A 95% confidence interval corresponds to a significance level of 5% (p<0.05), and a 99% confidence interval corresponds to a significance level of 1% (p<0.01). The closer the coefficient is to 0%, the narrower the confidence interval, and the closer it is to 100%, the wider the confidence interval.
Usually, a 95% confidence interval (95% CI) is used, and the 95% CI is a typical indicator of interval estimation.
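
The link between the confidence coefficient and the width of the interval can be sketched as follows. The multiplier for a two-sided interval is taken from the standard normal distribution (an assumption that is reasonable for a sample as large as n = 400); for 95% it is the familiar 1.96. The 90% level is included only for comparison, and the helper name `z_multiplier` is introduced here for illustration.

```python
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided standard normal multiplier for a given confidence coefficient."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


SD, N = 28.0, 400  # values from the example
for confidence in (0.90, 0.95, 0.99):
    z = z_multiplier(confidence)
    half_width = z * SD / N ** 0.5
    print(f"{confidence:.0%} CI: z = {z:.3f}, mean ± {half_width:.2f} g")
```

A higher confidence coefficient gives a larger multiplier and therefore a wider interval.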

Lower limit = mean - 1.96 × standard error (SE)
Upper limit = mean + 1.96 × standard error (SE)
where SE = standard deviation ÷ √n

In the specific example above, the 95% confidence interval (95% CI) is

Lower limit = 135 - 1.96 × 28 ÷ √400 = 132.256 ≈ 132
Upper limit = 135 + 1.96 × 28 ÷ √400 = 137.744 ≈ 138
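
The same calculation can be written as a short function. This is a minimal sketch; the function name `confidence_interval_95` is hypothetical, and the multiplier 1.96 assumes the normal approximation used in the formula above.

```python
import math


def confidence_interval_95(mean, sd, n):
    """Return (lower, upper) limits of the 95% CI for a population mean."""
    se = sd / math.sqrt(n)  # standard error of the mean
    margin = 1.96 * se      # half-width of the interval
    return mean - margin, mean + margin


# Values from the example: mean 135 g, SD 28 g, n = 400.
lower, upper = confidence_interval_95(135, 28, 400)
print(f"{lower:.2f} to {upper:.2f} g")  # prints 132.26 to 137.74 g
```

Rounded to whole grams, the limits are 132 g and 138 g.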

From the above, we cannot conclude that the mean left ventricular mass of adult males in this prefecture is exactly 135 g. We can, however, estimate with 95% confidence that it lies between 132 and 138 g.
