This assignment constitutes 10% of the Unit’s total assessment and consists of 3 questions.
All relevant Excel outputs must be clearly labelled and provided as Appendices to your assignment answers to support your explanations. Please make sure that each graph, table or Excel output for a model appears on a single page and is not split across pages.
You may annotate Excel outputs for clarity by inserting textboxes.
The data file for this assignment, ETB2111-Ass2017_Dat.xls, is available in the Assignment block on the home page of ETW2111 on Moodle.
Submit a hard copy of this assignment with a signed cover sheet; DO NOT SUBMIT DISCS or USBs.
ASSIGNMENTS WITHOUT SIGNED COVER SHEETS WILL NOT BE ACCEPTED.
Arrangements for assignment submissions and collection of marked assignments will be announced by your local lecturer.
Question 1 (2 + 4 + 10 = 16 marks)
The Big Max store manager has collected data on age and amount spent in a single transaction by 1000 of its customers. The manager wishes to determine whether the amount spent in a single transaction by customers is related to their age. He has classified customers into three age categories: ‘Young: ≤ 30 years’, ‘Middle aged: 31 to 55 years’ and ‘Older: > 55 years’. He uses four categories for the amount spent in a single transaction, defined as: ‘1 = Casual purchase: amount spent ≤ $148’, ‘2 = Small purchase: amount spent between $149 and $299’, ‘3 = Medium purchase: amount spent between $300 and $553’ and ‘4 = Large purchase: amount spent ≥ $554’. The data are given in file ETB2111-Ass2017_Dat.xls. Use Excel to analyse the data and answer the following questions.
(a) Report the contingency table of observed frequencies obtained by cross-tabulating ‘Age’ and ‘Amount spent category’.
(b) Find the percentage of Young, Middle aged and Older customers in each amount category. Explain the pattern, if any, shown by these percentages.
(c) Using the critical value approach, can we conclude at the 5% level of significance that the amount spent in a single transaction by customers is related to their age? Your answer must include the following:
    Appropriate null and alternative hypotheses
    Details of the analysis leading up to the value of the test statistic, and the distribution followed by the test statistic
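The arithmetic behind a cross-tabulation analysis of this kind can be sketched outside Excel as follows. This is a minimal Python illustration under stated assumptions, not the required Excel workflow: the counts in `observed` are made up (they sum to 1000 only to mirror the sample size), the function names are hypothetical, and 12.592 is the tabulated chi-square critical value for (3 - 1) x (4 - 1) = 6 degrees of freedom at the 5% level.

```python
# Hypothetical counts for illustration only; NOT the assignment data.
# Rows: Young, Middle aged, Older. Columns: amount categories 1 to 4.
observed = [
    [120, 100, 60, 20],
    [80, 110, 120, 90],
    [50, 90, 100, 60],
]


def column_percentages(table):
    """Percentage of each row group within every column."""
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [100 * cell / total for cell, total in zip(row, col_totals)]
        for row in table
    ]


def chi_square_independence(table):
    """Return (test statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency under independence: row total * column total / n.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Tabulated chi-square critical value, 6 degrees of freedom, 5% level.
CRITICAL_5PCT_DF6 = 12.592

percentages = column_percentages(observed)
statistic, dof, expected = chi_square_independence(observed)
reject_null = statistic > CRITICAL_5PCT_DF6
print(f"chi-square = {statistic:.2f}, df = {dof}, reject H0: {reject_null}")
```

The decision rule in the sketch is the critical value approach: the null hypothesis of independence is rejected when the statistic exceeds the tabulated value.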
Question 2 (2 + 2 + 3 + 6 + 5 = 18 marks)
A sporting goods manufacturing company wanted to compare the performance of four designs of golf balls in terms of the average distance travelled by the balls. The company approached a professional golfer and gave him golf balls of the four different designs for testing. The pro was not told which type of ball was being hit. All golf balls were hit within a short period of time so that the environmental conditions did not affect the performance of the balls. The distances travelled in metres by each ball are given in the table that follows.
These data are also provided in file ETB2111-Ass2017_Dat.xls on Moodle.
(a) Name a statistical procedure that will help the company compare the four designs on the basis of the average distance travelled by the golf balls. Using Excel, run this procedure on the given data.
(b) On the basis of the average distance travelled by golf balls of each design, which design seems to be the best at producing long drives? Explain your reasoning.
(c) Would you consider design 2 to be better than, or at least comparable to, design 1? Explain the reason for your choice of design.
(d) Can we conclude at the 1% level of significance that there is a significant difference in the average distance travelled by balls of the four designs? Your answer must include the following:
    Appropriate null and alternative hypotheses
    The value of the test statistic used and the distribution this test statistic follows
    The decision rule based on the p-value
    Conclusion about the average performance of the four designs of balls in terms of average distance travelled
(e) If the answer in part (d) indicates the existence of significant differences in the average distances travelled by the four types of balls, apply an appropriate test to determine which pairs of ball designs produce statistically different average distances. Use a 5% level of significance and show all details.
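As an illustration of how an F statistic for comparing several group means is built up, the following Python sketch computes the between-group and within-group sums of squares for a one-way analysis of variance. It is offered as one possible procedure under stated assumptions, not as the expected answer: the distances are made-up numbers (not the assignment data), the function name `one_way_anova` is hypothetical, and the p-value is deliberately left to Excel, where the Data Analysis ToolPak's "Anova: Single Factor" output reports it.

```python
# Hypothetical distances in metres for illustration only; NOT the assignment data.
designs = {
    "Design 1": [200, 202, 204],
    "Design 2": [205, 207, 209],
    "Design 3": [210, 212, 214],
    "Design 4": [201, 203, 205],
}


def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)
    )
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "means": means,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_stat": f_stat,
    }


anova = one_way_anova(list(designs.values()))
for name, mean in zip(designs, anova["means"]):
    print(f"{name}: mean distance = {mean:.1f} m")
print(
    f"F = {anova['f_stat']:.2f} on "
    f"({anova['df_between']}, {anova['df_within']}) degrees of freedom"
)
```

Under the null hypothesis of equal means, the statistic follows an F distribution with the two degrees of freedom shown in the output.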
The US Bureau of Labor Statistics provides data on year-to-year percentage changes in the wages and salaries of workers in private industries, along with “White-collar” and “Blue-collar” occupations. Data on percentage changes in wages of workers in private industries and “Blue-collar” occupations for the years 1980 to 2001 are provided in file ETB2111-Ass2017_Dat.xls. It is believed that the changes in wages of “Blue-collar” occupations depend on the changes in wages of workers in private industries.
(a) Obtain a scatter plot of the percentage changes in wages of blue-collar and private industry workers to check whether the two variables are related. Explain the type of dependence relationship.
(b) Fit a linear regression model for predicting the percentage change in blue-collar workers' wages from the percentage change in private industry wages. Report the estimated regression equation linking the two variables.
(c) Interpret the intercept coefficient of the model fitted in part (b). Is the intercept value meaningful in the current situation?
(d) Report three measures of performance for the linear model fitted in part (b) that show the fitted model is satisfactory and that the linear relationship is statistically significant.
(e) Can we conclude at the 5% level of significance that a 1% increase in private industry wages implies a 1% increase in blue-collar occupation wages? Show all relevant details to support your answer.
(f) Predict the average percentage change in wages of blue-collar workers in a year when the percentage change in private industry wages is 3%, and obtain a 98% prediction interval manually, showing all details. Is this prediction valid? Comment.
(g) Now revisit the scatter plot obtained in part (a). Does this scatter plot indicate that the model fitted in part (b) was a good choice? Comment.
(h) Fit an appropriate (improved) model to the given data, record the residuals and report the new model.
(i) Comment on the enhanced performance of this new model. Are there any concerns with this model? Discuss briefly.
(j) Does the dependence relationship between the percentage change in blue-collar workers' wages and the percentage change in private industry wages depicted by the enhanced model agree with the impression you formed on the basis of the scatter plot in part (a)? Explain.
(k) Do the residuals of the model in part (h) seem to satisfy the homoscedasticity assumption of the regression model? Show relevant details to support your answer.
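The manual calculations involved in a simple linear regression (least-squares estimates, a t statistic for testing whether the slope equals 1, and a prediction interval) can be sketched in Python as follows. This is an illustration under stated assumptions only: the five data pairs are made up and are not the wage data, all function names are hypothetical, and the t critical values are tabulated values for n - 2 = 3 degrees of freedom (with the 22 yearly observations from 1980 to 2001, the degrees of freedom would be 20 instead). The interval shown is for an individual new observation; omitting the leading `1 +` under the square root would give the narrower interval for the mean response.

```python
import math

# Hypothetical percentage changes for illustration only; NOT the assignment data.
private = [1.0, 2.0, 3.0, 4.0, 5.0]      # x: private industry wage change (%)
blue_collar = [1.2, 1.9, 3.2, 3.8, 5.1]  # y: blue-collar wage change (%)


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    return {
        "n": n, "x_bar": x_bar, "sxx": sxx, "slope": slope,
        "intercept": intercept, "s": s, "se_slope": s / math.sqrt(sxx),
    }


def slope_t_statistic(fit, hypothesised_slope=1.0):
    """t statistic for H0: slope = hypothesised_slope."""
    return (fit["slope"] - hypothesised_slope) / fit["se_slope"]


def prediction_interval(fit, x_new, t_crit):
    """Point prediction and interval for a single new observation at x_new."""
    y_hat = fit["intercept"] + fit["slope"] * x_new
    half_width = t_crit * fit["s"] * math.sqrt(
        1 + 1 / fit["n"] + (x_new - fit["x_bar"]) ** 2 / fit["sxx"]
    )
    return y_hat - half_width, y_hat, y_hat + half_width


# Tabulated t critical values for 3 degrees of freedom.
T_CRIT_TWO_SIDED_5PCT = 3.182  # upper 2.5% point
T_CRIT_98PCT = 4.541           # upper 1% point, for a 98% interval

fit = fit_line(private, blue_collar)
t_stat = slope_t_statistic(fit, 1.0)
reject_slope_is_one = abs(t_stat) > T_CRIT_TWO_SIDED_5PCT
lower, y_hat, upper = prediction_interval(fit, 3.0, T_CRIT_98PCT)
print(f"y = {fit['intercept']:.3f} + {fit['slope']:.3f} x")
print(f"t = {t_stat:.3f}, reject H0 (slope = 1): {reject_slope_is_one}")
print(f"prediction at x = 3: {y_hat:.3f}, 98% interval ({lower:.3f}, {upper:.3f})")
```

Whether a prediction is valid also depends on the new x value lying within the range of the observed x values, which the sketch does not check.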