Sample Pilot Study: The Effect of Health on Personal Income

Background

This pilot study was conducted for a PhD dissertation. The aim of the dissertation is to find out whether people’s health affects their income. The study follows a mixed-method research design based on the analysis of primary data. The data is collected using a questionnaire survey that contains both close-ended and open-ended questions. Close-ended questions are used because they allow the responses to be quantified and explored using statistical methods, whereas open-ended questions are used for a qualitative analysis of the role of health in the level of personal income. The original questionnaire applied in the pilot study and the edited version produced after the pilot study are provided in Appendices A and B.

Procedure

The pilot study covered 9 respondents with different sociodemographic characteristics. The respondents were selected using non-probability convenience sampling. All questionnaires were distributed through online channels: 3 were sent by email, 3 by WhatsApp and 3 by Facebook. Respondents were asked in advance whether they could take part in the survey. Each message contained a request to participate and a note that the information obtained would be used for research purposes only.

The processing of the questionnaires and analysis of answers took three days, but a larger-scale survey is expected to take longer. Moreover, in the large-scale survey the response rate is expected to be well below 100%.

Five respondents were male and four were female. Three respondents were aged between 18 and 29, five between 30 and 44, and one between 45 and 65. As for income, two respondents had an annual income below £50,000; four had an income between £50,000 and £99,999; two had an income between £100,000 and £149,999; and one had an income over £200,000.

Three respondents trained in a gym or did other sports, whereas six did not do any sports. Among those who trained in a gym, one usually did so once a week while the other two trained 2-3 times a week. Two respondents did morning exercises at home while the other seven did not. Five respondents disagreed with the statement that their food was healthy, two were neutral, one agreed and one strongly agreed. Five respondents were not ready to spend more than £50 per month on maintaining a healthy lifestyle, three were ready to spend between £50 and £100, and one between £100 and £299. Finally, one respondent rated their health as below average, five gave a neutral rating and three rated their health as good.

The sample of nine respondents was too small to run a full-scale regression analysis and obtain reliable estimates. Therefore, it is difficult to judge how suitable the questions are for regression analysis and for assessing the relationships between health variables and income quantitatively. On the other hand, a set of observations and recommendations can be provided for improving the design of the main study.

Recommendations for Improvement

First, respondents should be selected using probability (random) sampling techniques rather than convenience sampling. For a more comprehensive analysis, multi-stage sampling appears more appropriate. On the one hand, the population for the survey contains all people in the country who have an income, and the required number of respondents could be chosen randomly from this population. On the other hand, the sample should contain people with different incomes and different levels of health. Health strongly depends not only on income but on age and lifestyle as well. Therefore, stratification by these parameters is required. Once the population is stratified, respondents should be drawn at random within each stratum, so that every member of a stratum has an equal chance of being selected.
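The stratified selection described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the sampling frame, the `age_band` field, the target of 150 respondents and the function name `stratified_sample` are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, total_n, seed=0):
    """Draw a simple random sample within each stratum (proportional allocation)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Allocate in proportion to stratum size, with at least one person per stratum.
        # Because of rounding, the overall total may differ slightly from total_n.
        n_k = max(1, round(total_n * len(members) / len(population)))
        n_k = min(n_k, len(members))
        sample.extend(rng.sample(members, n_k))
    return sample

# Hypothetical sampling frame: 1,000 people tagged with an age band.
bands = ("18-29", "30-44", "45-65")
frame = [{"id": i, "age_band": bands[i % 3]} for i in range(1000)]
chosen = stratified_sample(frame, lambda p: p["age_band"], total_n=150)
```

In practice the strata would combine several characteristics (for example age band and lifestyle), and the sampling frame would come from a real register rather than generated records.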

As for the sample size, it should be increased significantly to obtain representative results. It is also important to ensure that the surveyed participants provide internally consistent responses that do not contradict each other. This can be checked by applying Cronbach’s alpha as a measure of internal consistency. This indicator reflects how well various proxies measure the same latent factor. It typically takes values between 0 and 1 (negative values are possible and signal a problem with the items), and the higher the value, the higher the internal consistency of the responses. Ideally, Cronbach’s alpha should be in excess of 0.8. This would indicate that a particular variable is measured reliably based on the participants’ responses.
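As an illustration of the internal consistency check, the sketch below computes Cronbach’s alpha with the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The response matrix is made up for the example and the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per respondent, one column per questionnaire item."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Made-up answers of five respondents to three 5-point items.
scores = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

The check only makes sense for items intended to measure the same construct, so in the main study it would be applied to groups of related health questions rather than to the whole questionnaire.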

Although there is no formal requirement for an optimal number of survey participants, their number must be significantly greater than the number of questions asked. Thus, the final sample is recommended to contain at least 100 to 200 people, and a larger sample is preferable. However, it is also important to consider the expected response rate when targeting a sample of 100-200 people. To obtain this sample, assuming a 50% response rate, some 200-400 people have to be invited.
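The arithmetic behind the 200-400 figure can be reproduced with a short calculation: the number of invitations is the target number of completed questionnaires divided by the expected response rate. The function name `invitations_needed` is hypothetical, and the 50% rate is the assumption stated above rather than a measured value.

```python
import math

def invitations_needed(target_completes, response_rate):
    """Number of people to invite so that the expected completes reach the target."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(target_completes / response_rate)

low = invitations_needed(100, 0.5)   # lower bound of the target sample
high = invitations_needed(200, 0.5)  # upper bound of the target sample
print(low, high)
```

If the response rate in the main study turns out lower than assumed, the same function shows how quickly the number of invitations grows.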

The time required for analysing a full sample can be projected as follows. In the pilot study, it took two days to collect answers from the participants. Collecting answers for the total sample may take more than a month. Data coding and analysis of the pilot study sample also took two days, but the time required to make computations for a large sample will not increase proportionally. It is expected that coding and analysis of 100-200 responses will take around a week.

Taking into account the required number of participants, it would be more convenient to conduct the survey on a dedicated platform such as Google Forms, SurveyLegend or Typeform. These platforms do not have limitations on the number of questions and the number of respondents involved. The free version of the popular platform SurveyMonkey allows a survey of only 10 questions, which is insufficient for the aims of the main study. Another advantage of such platforms is that the results can be exported to Excel or Google Sheets. This makes quantification and statistical analysis of the results much faster and saves the researcher time. Potential participants should therefore be informed as follows. Once the survey is created on a platform, a link to the survey page will be sent to target respondents. The link can also be shared on social networks.

Another observation from the pilot study is that respondents rarely answer open-ended questions. This can be explained by several reasons. First, close-ended questions are much faster to answer: it takes only a click on the relevant option in the multiple-choice box. Open-ended questions, by contrast, require respondents to formulate their opinion or position, which takes more time, effort and creativity. Thus, people may simply consider such questions redundant or try to save time. Second, respondents may not wish to reveal the details of their life that open-ended questions ask about. Third, they may not have a clear opinion on a particular question unless answer options are provided.

The implications for the researcher may be the following. One solution is simply to exclude open-ended questions from the survey. This would save the researcher’s time, since analysing answers to open-ended questions is more time-consuming than assessing answers to close-ended questions. However, it would also remove an opportunity to deepen the understanding of the topic and to support the interpretation of the close-ended answers. Another solution would be to reformulate open-ended questions in close-ended form. Information from one open-ended question can be obtained by asking one or several close-ended questions. This strategy is based on the expectation that respondents answer close-ended questions more willingly. However, it could lead to an unreasonably large number of questions in the survey, which should also be avoided.

The pilot study showed that the questionnaire did not suffer from having too many questions. On the contrary, the questionnaire can be extended with additional questions that reflect other aspects of health. The few answers given to open-ended questions revealed that other factors can also influence health and income. These include mental health, the level and frequency of stress at work, the frequency of vacations and travelling, and adherence to a daily routine. Variables representing these factors should also be included in the model. On the other hand, including too many variables in the model may lead to overfitting or multicollinearity.

An issue of interpreting the results may arise in the main study. The relationships found between income and the variables reflecting different aspects of health should be explained in line with the economic meaning of these variables. Income is represented by an ordinal variable, that is, a variable measured on a scale where only the relative ordering of its values matters. Thus, a simple OLS regression would be inappropriate, and an ordinal regression analysis should be used instead. Endogeneity may also be present, since income can in turn affect health, and this problem needs to be checked. If a linear model is employed, it needs to be tested whether the relationship between the explored variables is monotonic and linear. In the case of a non-linear association, alternative specifications should be employed.
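To make the idea of ordinal regression more concrete, the sketch below evaluates a proportional-odds (ordered logit) model, one common form of ordinal regression, in which P(Y <= j | x) = logistic(theta_j - beta * x). It only shows how predicted probabilities of the five income brackets follow from given parameters; it does not estimate anything. The thresholds and the coefficient are illustrative assumptions, not results of the pilot study, and in the main study they would be estimated with a statistical package.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probabilities(x, beta, thresholds):
    """Probability of each ordered category under a proportional-odds model.

    thresholds must be increasing; with J categories there are J - 1 thresholds.
    """
    # Cumulative probabilities P(Y <= j | x); the last category closes at 1.
    cumulative = [logistic(t - beta * x) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs

# Illustrative values only: four thresholds separate the five income brackets,
# and beta is a made-up effect of self-rated health (1-5) on income.
thresholds = [-1.0, 1.0, 2.5, 4.0]
beta = 0.6
for health in (1, 3, 5):
    probs = category_probabilities(health, beta, thresholds)
    print(health, [round(p, 3) for p in probs])
```

With a positive coefficient, a higher health rating shifts probability towards the higher income brackets, which is the kind of ordered effect that OLS on bracket codes cannot represent properly.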

If the chosen independent variables reveal a high degree of multicollinearity, this problem can be dealt with in several ways. First, one of the correlated variables can be omitted. Second, a principal component analysis (PCA) could be run to replace the collinear factors with a single component. However, one should keep in mind that associations between variables measured by regressions do not imply causal relationships. Causality needs to be argued on the basis of an appropriate theory. Thus, a good regression model is one backed by solid theory or previous empirical evidence.
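A simple first screen for multicollinearity is to compute pairwise Pearson correlations between the independent variables and flag pairs above a cut-off. The sketch below does this with made-up coded answers; the variable names, the data and the 0.8 cut-off are assumptions for illustration. Pairwise correlation does not detect collinearity involving three or more variables, for which variance inflation factors are the more usual diagnostic.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Made-up coded answers of six respondents (not pilot study data).
data = {
    "gym_visits": [0, 1, 2, 3, 0, 2],
    "health_spending": [1, 2, 2, 3, 1, 3],
    "diet_score": [3, 1, 4, 2, 5, 3],
}
flagged = collinear_pairs(data)
print(flagged)
```

A flagged pair would then be handled as described above, either by dropping one of the two variables or by combining them.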

 

Appendices

Appendix A: Original Questionnaire

  1. Gender

1) female       2) male

  2. Age

1) <18       2) 18-29       3) 30-44        4) 45-65       5) 65+

  3. Income per month

1) <£50,000    2) £50,000 - £99,999   3) £100,000 - £149,999    4) £150,000 - £199,999   5) >£200,000

  4. Do you visit a gym or other classes (yoga, stretching, swimming pool)?

1) Yes     2) No

  5. If the answer for question 4 is Yes, how many times a week?

1) Occasionally   2) 1     3) 2-3    4) 4 and more

  6. Do you make morning exercises to be healthy?

1) Yes     2) No

  7. Do you agree that the food you eat is healthy (1 is strongly disagree, 2 is disagree, 3 is neutral, 4 is agree, 5 is strongly agree)?

1) 1        2) 2         3) 3          4) 4         5) 5

  8. How much money per month are you ready to spend for maintaining health? (This implies that you consider these spendings as health-maintaining, such as sports and other classes, massage, special procedures, healthy food)

1) <£50      2) £50 - £100         3) £100 - £299    4) £300 - £500     5) >£500

  9. How would you estimate your health on a 5-point scale where 1 is bad, 2 is weak, 3 is mediocre, 4 is quite good, 5 is very good?

1) 1        2) 2         3) 3          4) 4         5) 5

  10. What is your attitude to healthy lifestyle?

_____________________________________________________________________

  11. Do you perceive health as a factor contributing to high income?

_____________________________________________________________________

  12. What other efforts do you make to stay healthy?

_____________________________________________________________________

 

Appendix B: Improved Questionnaire

  1. Please, choose your gender:
  • Female
  • Male
  2. What is your age?
  • under 18
  • 18-29
  • 30-44
  • 45-65
  • over 65
  3. What is your average annual income?
  • Less than £50,000
  • £50,000 - £99,999
  • £100,000 - £149,999
  • £150,000 - £199,999
  • More than £200,000
  4. Do you visit a gym or other classes (yoga, stretching, swimming pool)?
  • Yes
  • No
  5. How often do you exercise in a gym?
  • Not applicable
  • Less than once a week
  • Once a week
  • Twice a week
  • Three times a week
  • Four times a week
  • Five times a week
  • Six times a week
  • Every day
  6. Do you do morning exercises?
  • Yes
  • No
  7. To what extent do you agree that the food you eat is healthy?
  • Strongly disagree
  • Disagree
  • Neither agree nor disagree
  • Agree
  • Strongly agree
  8. How much money per month do you spend on improving your health?
  • Less than £50
  • £50 - £100
  • £100 - £299
  • £300 - £500
  • More than £500
  9. How would you describe your current health?
  • Very bad
  • Bad
  • It’s ok
  • Good
  • Excellent
  10. What is your attitude to healthy lifestyle?

__________________________________________________________________________________________________________________________

  11. Do you perceive health as a factor contributing to high income?

__________________________________________________________________________________________________________________________

  12. What else do you do to stay healthy?

__________________________________________________________________________________________________________________________