Document Type : Article
Authors
Center For Healthcare Data Modeling, Departments of Biostatistics and Epidemiology, School Of Public Health, Shahid Sadoughi University Of Medical Sciences, Yazd, Iran.
Introduction
It is costly, time consuming and beyond the ability of a researcher to collect information from the whole population; therefore, the best solution is to follow efficient sampling procedures. Sampling has several advantages and the most important one is saving time and money. It also requires less equipment and facilities. It can be said that the most important problem of sampling is the possibility of bias [1].
To obtain an ideal and suitable sample, the following points need to be considered:
1. A clear definition of the statistical population
2. The format and framework of sampling
3. Characteristics of the statistical population to which the findings will be generalized
4. A logical sample size
5. Adoption of appropriate sampling techniques
6. The rate of nonresponse relative to the sample size [2].
Before introducing the types of probabilistic sampling, statistical population and the sample are elaborated below.
Statistical population is a term used by epidemiologists and statisticians to describe a large set of objects or people who share a common characteristic.
Target population: This informal term refers to the subset of the statistical population that is actually considered for the study; in the literature, however, it is sometimes mistakenly used in place of the term statistical population [3].
A sample is a subset of the statistical population, which is selected to represent the statistical population.
External validity: External validity means the ability to generalize the results of the sample to the statistical population from which the sample was taken. This criterion is highly sensitive to participant selection [3]. In addition to the concept of external validity, there is another concept called internal validity, indicating that the results and relationships discovered are reliable and are not affected by confounding factors.
Researchers choose the type of sampling method based on the purpose of their research and the capabilities of the statistical population. If the purpose of sampling is to estimate population parameters, it is necessary to use probabilistic sampling [3].
In general, sampling techniques are divided into probabilistic and nonprobabilistic methods. Given that sampling is an important issue in medical research, it is elaborated here in plain English. Basic statistical terms are explained first, followed by four types of probabilistic sampling, namely, simple random sampling, stratified random sampling, cluster sampling, and systematic sampling. After that, nonprobabilistic sampling and its types are dealt with. Figure 1 shows the types of sampling methods and Table 1 shows the strengths and weaknesses of each method.
Figure 1. All types of sampling methods
1. Probabilistic Sampling methods
Probabilistic sampling is a reliable technique that gives individuals and objects an equal chance of being selected to represent the whole population [4]. When the base population is homogeneous, the chances of selecting each member in the sample are quite high [1].
1.1 Simple random sampling (SRS)
In this sampling method, each member of the statistical population has an equal chance to enter the sample or be selected in the study. The correct method recommended for selecting a simple random sample includes the use of a random number table or generated random numbers using statistical software [1, 4].
There are two types of simple random sampling [1]:
1. Simple random sampling with replacement (SRS+R)
2. Simple random sampling without replacement (SRS-R)
In SRS with replacement, each member can be selected more than once, while in SRS without replacement, each member can be selected only once.
Example: A researcher intends to study the clinical outcomes of COVID-19 in patients hospitalized in Shiraz hospitals, Iran. The target population is therefore patients hospitalized with COVID-19. The number of these patients is 5,000 and the sample size obtained is 500 patients, so the chance of choosing each patient is one-tenth. If each selected patient is returned to the statistical population, that patient has a chance to be selected again; this is called SRS with replacement (SRS+R). If each selected patient is removed from the statistical population after selection, that patient has no chance to be selected again; this is called simple random sampling without replacement (SRS-R).
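The two variants in the example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the patient identifiers 1 to 5,000 are a hypothetical sampling frame, and the function names are chosen here for readability only.

```python
import random


def srs_without_replacement(population, n, seed=None):
    """Draw n distinct units; a selected unit cannot be drawn again."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def srs_with_replacement(population, n, seed=None):
    """Draw n units; each selected unit is returned and may be drawn again."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)


# Hypothetical sampling frame: patient IDs 1..5000 (illustrative, not real data).
patients = list(range(1, 5001))

sample_without = srs_without_replacement(patients, 500, seed=1)
sample_with = srs_with_replacement(patients, 500, seed=1)

# Sampling fraction n/N from the example: 500 / 5000 = one-tenth.
sampling_fraction = 500 / len(patients)
```

Without replacement, the 500 selected identifiers are always distinct; with replacement, the same identifier may appear more than once.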
Advantages of the SRS method include minimization of selection bias and simplification of the analysis. It is easy to estimate the accuracy obtained from the sample, but sampling error remains a concern, since random selection may by chance yield a sample that does not represent the statistical population. Moreover, this sampling method is not suitable for research aimed at subgroup analysis and may provide biased estimators [4].
1.2 Stratified random sampling (STRS)
Stratified random sampling is a probabilistic sampling method in which the population is first divided into non-overlapping subgroups (strata) whose members share similar characteristics, and then a random sample is selected from each stratum [1, 5].
This method is suitable in studies where the researcher intends to compare different subgroups. The variables used to form the strata (such as age group) should therefore be related to the characteristics and attributes under study to ensure sampling efficiency. In addition, it is more efficient than simple random sampling, since it requires a smaller sample [4].
STRS advantages
1. This method is more efficient than other sampling methods
2. It is easier to use
3. It needs a smaller sample size [1].
Example: Consider the heterogeneous statistical population of students at a university. Sampling can be performed with the faculties as strata. If the number of students sampled from each faculty is proportional to that faculty's share of all university students, this type of stratified sampling is called proportionate stratified sampling (proportional-to-size allocation); otherwise, the sampling is called disproportionate.
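The proportionate case in the example can be sketched as follows. The faculty names and enrolment figures are hypothetical, and the rounding rule is an assumption made for brevity; with other numbers, rounding can make the stratum samples add up to slightly more or less than the target size.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    # Rounding may make the parts sum to slightly more or less than n.
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}


def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample (without replacement) inside every stratum."""
    rng = random.Random(seed)
    sizes = {name: len(units) for name, units in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}


# Hypothetical faculties (strata) with 600, 300 and 100 students.
strata = {
    "medicine": [f"med-{i}" for i in range(600)],
    "health": [f"hlt-{i}" for i in range(300)],
    "nursing": [f"nur-{i}" for i in range(100)],
}

sample = stratified_sample(strata, 100, seed=7)
sample_sizes = {name: len(units) for name, units in sample.items()}
```

Here a total sample of 100 is split 60, 30 and 10, mirroring each faculty's share of the student population.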
1.3 Cluster random sampling (CRS)
Cluster random sampling divides a population into clusters that are similar to one another but heterogeneous within, and then randomly selects several of these clusters or groups. This sampling method is a two-step process [4]. In cluster sampling, the sampling unit is a naturally formed group of individuals (a cluster). This method is used when the population is very large; to save time and money, we can focus on clusters instead of individuals [6].
CRS is divided into one-stage and multistage forms according to the method of implementation. In one-stage cluster sampling, clusters are randomly selected and all elements of the selected clusters are included. Multistage cluster sampling extends single-stage cluster sampling by carrying out the selection in several stages.
Example: Suppose we want to examine the level of education of the employees of a city, a statistical population that is heterogeneous and scattered. First, the employees of each organization are considered a cluster, and several organizations (clusters) are randomly selected from among all organizations. Several units are then drawn from each selected organization, and finally the sample is randomly selected from each of these organizational units.
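A small sketch may help to contrast one-stage and two-stage cluster sampling. It assumes the usual reading in which one-stage sampling keeps every member of each selected cluster, while two-stage sampling draws a simple random sample inside each selected cluster. The organizations and employee identifiers are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Randomly select clusters, then keep all members (one-stage)
    or a simple random sample of members (two-stage)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in chosen:
        members = clusters[name]
        if n_per_cluster is None:
            # One-stage: every member of a selected cluster enters the sample.
            result[name] = list(members)
        else:
            # Two-stage: sub-sample members within each selected cluster.
            result[name] = rng.sample(members, min(n_per_cluster, len(members)))
    return result


# Hypothetical city: 10 organizations (clusters) with 20 to 29 employees each.
organizations = {
    f"org-{j}": [f"org-{j}/emp-{i}" for i in range(20 + j)] for j in range(10)
}

one_stage = cluster_sample(organizations, 3, seed=3)
two_stage = cluster_sample(organizations, 3, n_per_cluster=5, seed=3)
```

Because both calls use the same seed, they select the same three organizations; only the treatment of employees inside those organizations differs.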
1.4 Systematic random sampling (SYRS)
The main idea of SYRS is to select members from a list at regular intervals. Suppose we want to select names from a long list. A simple way is to choose an appropriate distance (the systematic sampling interval, sometimes loosely called the sampling fraction) and select names at equal intervals along the list. Systematic random sampling is often easier to perform than simple random sampling and also reduces the possibility of error [7]. This sampling method is more advanced than the simple sampling method and gives the researcher more detailed information about the statistical population.
The steps of sampling in the SYRS method are as follows: defining the statistical population, determining the sample size, assigning a number to each member of the population, determining the distance between sampled units (sampling interval), choosing a random starting point, and finally selecting units systematically [1].
For example, suppose we want to systematically select a random sample of size 8 from a statistical population of size 24.
First, all the population units are numbered; then the sampling interval (k) is calculated by dividing the size of the population (N) by the preset sample size (n): k = N/n = 24/8 = 3. (The sampling fraction n/N is its reciprocal, 8/24 = 1/3.) In the next step, the number of the first unit of the sample (the random starting point) is randomly selected from 1 to k, here 1 to 3, and the numbers of the subsequent units are obtained by repeatedly adding k. For example, if the number of the first unit of the sample is 2, the numbers of the next units will be 5, 8, 11, 14, 17, 20, and 23.
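The worked example can be reproduced with a short sketch. It assumes the interval is k = N/n = 24/8 = 3 and that the random start lies between 1 and k, so that all eight selected units fall inside the list; the function name and parameters are illustrative.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit from a list numbered 1..N, with k = N // n."""
    k = population_size // sample_size  # sampling interval
    if start is None:
        # Random starting point between 1 and k (inclusive).
        start = random.Random(seed).randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and the interval k")
    return [start + i * k for i in range(sample_size)]


# N = 24, n = 8, so k = 3; a starting point of 2 gives the units from the example.
units = systematic_sample(24, 8, start=2)
print(units)  # [2, 5, 8, 11, 14, 17, 20, 23]

# With no start given, the starting point is drawn at random from 1..k.
random_units = systematic_sample(24, 8, seed=0)
```

Restricting the start to 1..k is a design choice: a start above k would push the last units past the end of a list of 24.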
1.5 Multistage random sampling (MSRS)
Depending on the characteristics of the community and the purpose of the research, the desired sample is sometimes selected in several stages. In this method, a sample of n1 people is selected from a community of N people. Then, a smaller group is selected from the n1; two-stage, three-stage, … and m-stage sampling can be defined accordingly. This random sampling is suitable when the statistical population is very large [1].
Example: It may be necessary to sample the population of a city in a study. By considering multistage sampling, one can first select several regions from among urban areas through cluster sampling. Then, within each region select several neighborhoods by systematic random sampling method. Finally, through simple random sampling, one can select several building blocks and homes.
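The three stages of this example can be chained in one sketch: random selection of regions, systematic selection of neighborhoods within each region, and simple random sampling of homes within each neighborhood. The city structure, the names, and the stage sizes are all hypothetical, and the sketch assumes each region has at least as many neighborhoods as are requested.

```python
import random


def multistage_sample(city, n_regions, n_neighborhoods, n_homes, seed=None):
    """Three-stage selection: regions, then neighborhoods, then homes."""
    rng = random.Random(seed)
    selected = []
    # Stage 1: random selection of regions (clusters).
    for region in rng.sample(sorted(city), n_regions):
        neighborhoods = sorted(city[region])
        # Stage 2: systematic selection of neighborhoods with a random start.
        k = len(neighborhoods) // n_neighborhoods
        start = rng.randrange(k)
        for name in neighborhoods[start::k][:n_neighborhoods]:
            # Stage 3: simple random sample of homes without replacement.
            for home in rng.sample(city[region][name], n_homes):
                selected.append((region, name, home))
    return selected


# Hypothetical city: 5 regions, 6 neighborhoods per region, 30 homes each.
city = {
    f"region-{r}": {
        f"nbhd-{r}-{q}": [f"home-{r}-{q}-{h}" for h in range(30)]
        for q in range(6)
    }
    for r in range(5)
}

homes = multistage_sample(city, n_regions=2, n_neighborhoods=3, n_homes=4, seed=11)
```

With 2 regions, 3 neighborhoods per region and 4 homes per neighborhood, the final sample contains 2 × 3 × 4 = 24 homes.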
2. Nonprobabilistic sampling (NPS) methods
In NPS, the chances of selecting people or objects are not the same. This type of sampling is subject to selection bias because the researcher selects individuals consciously and arbitrarily. Because selection is nonrandom, an accurate estimate of the sampling error cannot be obtained. Moreover, the results obtained from the sample cannot be generalized to the total population, because the sample does not represent the community. NPS is cheaper and easier than probabilistic sampling, but its results are of lower validity [4, 8].
2.1 Convenience sampling (CS) (grab/accidental/opportunity sampling)
Convenience sampling is a type of NPS in which the researcher studies the part of the statistical population that is most accessible. The results of this type of study cannot be generalized to the statistical population, since the sample is not truly representative of it. It is, however, suitable for an initial pilot study [4]. In general, researchers favor this method because it is easier and cheaper than other sampling techniques [5], though it has weaknesses such as selection bias. It is therefore not recommended for descriptive and causal studies [9].
2.2 Quota sampling (QS) (nonprobabilistic version of STRS)
In QS, the population is first divided into separate subgroups, and then individuals are selected from the subgroups in proportion to subgroup size. In this type of sampling, sample selection is nonrandom and often unreliable. Moreover, there is selection bias, since not everyone in the community has the same chance of selection, making it the least popular sampling method. Quota sampling is usually used when time is short, there is no sampling frame, the study budget is limited, and sampling accuracy is not a priority. In QS, the researcher can determine how many people are selected from each subgroup. The researcher places people in demographic groups based on variables such as age and gender. When the quota of a demographic group is filled, sampling for that group is complete [4]. The strength of this method is that the composition of the sample can be controlled with respect to certain characteristics; its main weakness is selection bias [9].
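The quota-filling logic can be sketched as below. The sketch assumes people are accepted in the order they become available, which is exactly the nonrandom step that separates QS from stratified random sampling. The respondents, the age and sex groups, and the quotas are all hypothetical.

```python
def quota_sample(arrivals, quotas, group_of):
    """Accept people in arrival order until every group quota is filled."""
    remaining = dict(quotas)
    selected = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            selected.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are filled, so sampling stops
    return selected


def group_of(person):
    """Demographic group defined by sex and age group."""
    return (person["sex"], person["age_group"])


# Hypothetical stream of available respondents (not real data).
arrivals = [
    {"id": i, "sex": "F" if i % 3 else "M", "age_group": "<40" if i % 2 else "40+"}
    for i in range(1, 41)
]
# Hypothetical quotas per demographic group.
quotas = {("F", "<40"): 3, ("F", "40+"): 3, ("M", "<40"): 2, ("M", "40+"): 2}

selected = quota_sample(arrivals, quotas, group_of)
selected_ids = [p["id"] for p in selected]
```

Respondents who arrive after their group's quota is filled are skipped, no matter how many remain in the population.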
2.3 Snowball sampling (SS) (cold-calling, chain/chain-referral/referral sampling)
In statistical and epidemiological research, SS is a technique in which members of the study group introduce other people, so that the study group grows like a rolling snowball. This sampling technique is often used in hidden populations that are difficult to access, such as drug addicts. This method has many limitations, because it does not have a specific sampling frame and is time-consuming [9, 10].
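The way a snowball sample grows can be sketched as a breadth-first walk over referrals. The referral network and participant labels are hypothetical, and following referrals wave by wave is an assumption made for illustration rather than a fixed rule of the method.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Grow a sample by following referrals from the initial participants."""
    selected = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(selected) < max_size:
        person = queue.popleft()
        selected.append(person)
        # Each participant introduces contacts who have not yet been recruited.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return selected


# Hypothetical referral network: who introduces whom.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "E": ["F"],
    "G": ["H"],
}

wave = snowball_sample(referrals, ["A"], max_size=5)
print(wave)  # ['A', 'B', 'C', 'D', 'E']
```

Participants G and H can never be reached from seed A, which illustrates why such a sample depends on the initial participants and may miss parts of the population.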
2.4 Judgmental sampling (JS) (purposive/authoritative sampling)
JS, also called purposive sampling, selects participants nonrandomly to achieve a specific goal. In this method, the researcher preferably uses people who have the necessary experience and knowledge in the field; that is, the researcher studies the people he or she considers most suitable for the study. This method is used when the number of people with a particular experience is small. It is also used in pilot studies. This type of sampling method has almost the same disadvantages as the convenience sampling method; therefore, the results are not of high validity [1, 4]. In terms of strengths, JS is low-cost, easy, and not time-consuming, and is hence suitable for exploratory research projects. In this method, sampling is usually carried out by experienced researchers, and the results obtained cannot be generalized to the statistical population [1, 9].
Table 1: Pros and cons of sampling methods
Sampling methods | Pros | Cons
Probabilistic Sampling | |
Simple random sampling | |
Stratified random sampling | | Only usable in populations with an existing underlying classification; obtaining a list of population units may be very difficult or impossible; risk of overlap when a member belongs to two strata
Cluster random sampling | Managing variability; feasible, with no need for a sampling frame | Risk of selection bias and sampling error
Systematic random sampling | |
Multistage sampling | Flexible |
Nonprobabilistic Sampling | |
Convenience sampling | |
Quota sampling | |
Snowball sampling | |
Judgment sampling | |
Conclusion
Determining the type of sampling method is one of the important components of medical studies. Researchers are therefore expected to improve their knowledge of sampling methods (random and nonrandom) and how to use them, so that they gain greater mastery in choosing the sampling method for their studies. The sampling method and its logic are directly related to external validity: external validity will decrease if the researcher makes a mistake in choosing the sampling frame or selecting participants. Therefore, researchers should also pay attention to external validity. In a study, both internal and external validity should be high. External validity is highly sensitive to selection bias, so in nonprobabilistic sampling the external validity is much lower than in probabilistic sampling [3]. Furthermore, choosing the right sampling method can reduce time and cost and enhance the quality of research.
Acknowledgements
We would like to thank Yazd University of Medical Sciences, especially the School of Health, which, by holding scientific workshops, encourages us to publish medical research articles.
Authors’ contribution
Conceptualization: Madadizadeh F
Formal analysis: Basti M, Madadizadeh F
Investigation: Basti M, Madadizadeh F
Methodology: Madadizadeh F
Supervision: Madadizadeh F
Writing – original draft: Basti M, Madadizadeh F
Writing – review & editing: Basti M, Madadizadeh F
Funding source
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Conflict of Interest
There was no conflict of interest.