How can quality research improve educational standards?

The relationship between research quality and teaching quality, and how quality research can improve educational standards
Dr.CherylStam,New Zealand,Researcher
Published Date:04-07-2017
CPB Discussion Paper 347

Are good researchers also good teachers? The relationship between research quality and teaching quality

Ali Palali (1, *), Roel van Elk (1), Jonneke Bolhaar (1), and Iryna Rud (2)

(1) CPB Netherlands Bureau for Economic Policy Analysis
(2) The Top Institute for Evidence Based Education Research, TIER, Maastricht University

March 15, 2017

Abstract

We investigate the relationship between research quality and teaching quality using data from Maastricht University, the Netherlands, where students are randomly allocated to different teachers within the same course. We measure research quality by the publication records of the teachers and teaching quality by both student evaluations of the teachers and final student grades. We find that being taught by teachers with high quality publications leads to higher grades for master students. This is not fully reflected in the student evaluations of teachers. Master students do not give higher scores to teachers with high quality of publications, bachelor students give lower scores.

Keywords: research and teaching, student grades, teacher evaluations
JEL codes: I23, I28

* Corresponding author. Email: a.palalicpb.nl

1 Introduction

There is a continuous discussion, both academic and public, on how research and education in universities are related (Hattie and Marsh (1996); uz Zaman (2004); Jenkins et al. (2007); Elken and Wollscheid (2016)). It is often questioned whether a ‘good researcher’ implies also a ‘good teacher’, whether teachers who conduct research are more effective than those teachers who do not do research, and, in general, whether research and teaching activities can complement each other.
Even though answers to these questions are important for university stakeholders to find the most efficient way in distributing human resources between research and teaching activities, the current empirical evidence is limited and mixed. Furthermore, evidence on the relationship between research and teaching is important for providing a better insight into the effective production of research output and student learning outcomes.

This study aims to analyse the relationship between research quality and teaching quality. We use individual-level data from the School of Business and Economics at Maastricht University in the Netherlands, where students are randomly allocated to different teachers within the same course. At the end of the course, students take the same exam. This enables us to exploit the exogenous variation in research quality of teachers on student outcomes such as grades and student evaluation scores. We measure research quality by the publication records of the teachers.

Our results show that master students who are taught by teachers with high quality publications score higher grades. We do not find any effect for having any publications or total number of publications. Therefore, quality seems to matter rather than the quantity. The results on student grades are not fully reflected in how students evaluate their teachers. Master students do not give higher scores to teachers with higher number of publications or higher quality of publications. Bachelor students give lower scores.

Empirical evidence on the relationship between research quality and teaching quality obtained from using data on randomized experiments is scarce. Our study contributes to the existing literature by using rich individual-level data on students who are randomly assigned to different teachers within the same course.
The second contribution of this study lies in exploring and comparing two different measures of teaching outcomes: student evaluations of the teachers and final student grades.

This paper is organized as follows: In section 2, we discuss possible mechanisms underlying the relationship between research and education. In section 3, we discuss previous literature on the relationship between research quality and teaching quality. In section 4 we provide an overview of the higher education system in the Netherlands and of the Maastricht University, in particular. Section 5 describes data and presents descriptive statistics. In section 6, we describe our empirical strategy. In section 7, we discuss our estimation results. Finally, section 8 concludes.

2 Mechanisms linking research quality and teaching quality

The link between research quality and teaching quality is complex and multidimensional. Based on the previous literature, we distinguish several main mechanisms that can underlie this relationship. Depending on which of the mechanisms dominates, this relationship can vary from positive to null, and even to a negative one.

The first type of mechanisms suggests a positive relationship between research and teaching via complementarity between skills (uz Zaman (2004)). Conducting research can both enhance teacher’s proficiency in the subject and keep the teacher up-to-date with regards to the newest developments in the discipline. As a result, research activities would have a positive impact on teaching quality. Such skills transfer can operate not only at the level of teacher but also at the teacher-student level. For example, through involvement in teaching activities and interactions with students during classroom discussions, researchers can transfer their critical thinking and research skills to students.

The second set of mechanisms suggests a negative relationship between research and teaching.
Both research and teaching activities require investment of time and effort. Being involved in one activity, for instance, the process of conducting research, usually does not allow for simultaneously spending time and effort on another activity (the process of teaching), unless one activity benefits both research and teaching (e.g. reading a scientific paper can simultaneously contribute to research ideas and to teaching preparation). Time and effort allocated to teaching and research are also influenced by the system of incentives in academia. Research can be rewarded by universities through promotion more generously than teaching. Therefore, there can be a selective inflow into the profession, or people in academia might choose to prioritize research over teaching, and they might be more likely to build a career in academia by doing research, while teaching is often regarded as “punishment” (Walstad and Allgood (2005); Cretchley et al. (2014); De Philippis (2015)).

Furthermore, contrary to the first mechanism, teaching and research might require different sets of skills. If research requires more specific skills (e.g. synthesis, deduction) than teaching (e.g. communication, mentoring), this can lead to disparities between skill transfers. Hence, the relationship between research output and teaching effectiveness might be neutral or even negative. Which one of these mechanisms dominates the others is an empirical question.

3 Previous literature

The literature on the relationship between research quality and teaching quality is rather large. However, most of these studies focus on the correlation between the two without making causal claims. Below we give an extensive review of the literature by first focusing on the research and teaching quality measures and then focusing on the findings between these two measurements. Since the literature is extensive, we present some of the relevant papers in Table 1.[1]
3.1 Measures of research output and teaching quality from the existing literature

It is generally not straightforward how to measure research quality and teaching quality. Even if the research quality can be to some extent observed and summarized based on (produced) research outcomes, teaching quality cannot be directly observed. In our study we measure the research quality by publication records of the teachers. We have information on not only how many publications a teacher had in a certain year, but also whether these publications appeared in A, B or C level journals. Therefore, we can differentiate between quantity of publications and quality of publications. For teaching quality we use two different measures: student evaluation of the teachers and student grades. Whereas student evaluations capture how students perceive teaching quality of their teachers, student grades capture the final learning experience.

Footnote 1: Studies are not included in this table if they (a) were published earlier than 1980; (b) are descriptive (non-empirical); (c) analyze exclusively the link from teaching to research output; (d) are based on analyses of teachers' (students') beliefs (views) on the research-teaching relationship; or (e) use fewer than 20 observations in the analyses.

We are aware of the fact that there are several different measures proposed in the literature. Research quality has been traditionally measured by the number of published and refereed articles (see e.g. Gottlieb and Keith (1997)), by citation scores (see e.g. Rothman and Preshaw (1975)), by impact factor (Saha et al. (2003)) and by the combined measures of quantity and quality of publications (Lanjouw and Schankerman (2004); Hirsch (2005); Bornmann et al. (2008) develop an h-index[2] to characterize the scientific output of researchers, based on both the number of published articles and the impact of these publications).
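As an illustration of how a combined quantity-and-quality measure works, the following minimal sketch computes the h-index under its usual definition: the largest h such that at least h of a researcher's papers have at least h citations each. The function name and the sample citation counts are made up for illustration and do not come from the data used in this study.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one researcher's five papers.
print(h_index([10, 8, 5, 4, 3]))  # prints 4
```

A researcher with many rarely cited papers and a researcher with one heavily cited paper both score low, which is why the index is described as mixing quantity and impact.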
However, all of these measures have been criticized and there is no consensus in the literature on which measure of research quality should be considered as universal, but rather the choice of the measure depends on the particular research goals of the study. In our study, we will use publication records to construct different research quality measures that vary in emphasis on quantity and quality of publications.

Teaching quality is related to both teacher’s performance and student learning outcomes. Student evaluations of teachers are the most frequently used measure to estimate teaching quality in higher education (Becker and Watts (1999); Becker et al. (2012)). This popularity is mainly explained by the availability of the data on student evaluations. This measure, however, has been increasingly challenged in the literature since it is based on the perceptions of respondents and it might not necessarily reflect true teaching effectiveness. In particular, students often evaluate the teachers on the basis of how they enjoyed the course and on the basis of teachers’ personal characteristics, and not necessarily characteristics related to teaching quality (Braga et al. (2014b); McPherson et al. (2009)). Evidence shows that student evaluations are less biased in the populations where high-skill students are over-represented (Braga et al. (2014a)).

Footnote 2: Even though it is much more complex than the other measures, the h-index also has been criticized in later studies for non-completeness. Several researchers, for example, argued that this index does not account for aging of citations (Sidiropoulos et al. (2007); Glanzel (2006); Burrell (2007)). At the same time, alternatives to the h-index, such as a g-index (Egghe (2006)), an a-index (Jin (2006)) and an ar-index (Jin (2007)), hardly overcome all drawbacks of the h-index (Bornmann and Daniel (2007); Bornmann et al. (2008)).
Another concern is that students are usually not obliged by the institution to provide evaluations, and they are not randomly selected, which leads to a biased assessment of teacher quality. In other words, students who eventually fill in evaluation forms represent a selected sample of all students. Salomons and Goos (2014) quantify the direction and size of selection on both observable and unobservable characteristics of students, teachers and courses. They find that the true evaluation score is lower than the average reported, and thus the selection bias is positive. Moreover, they also conclude that taking student evaluation is not advisable when response rates are low or vary considerably across courses. Hoffmann and Oreopoulos (2009) suggest using the mean of the averages for teacher evaluations across classes, as this ensures that teacher quality measures differ only when instructors differ. Finally, Emery et al. (2003) and Becker and Watts (1999) advise not to use student evaluations as the only measure of teaching quality. The alternative to student evaluation of teachers is actual student grades which are directly informative about student learning. However, using student grades in empirical analyses is not without problems either. Some studies point at the fact that teachers can inflate grades for the purpose of elevating student evaluations (see e.g. Krautmann and Sander (1999); Johnson (2003); Carrell and West (2010)). This tendency is related to characteristics of departments and teacher-specific characteristics (Jewell et al. (2013)), whereas teacher-specific characteristics explain relatively much more variation in grade inflation. Jewell et al. (2013) explain this by the universal tendency of the universities to use student evaluation scores as inputs into tenure and promotion decisions, and therefore teachers are likely to inflate grades rationally. This causes many universities to collect student evaluations before final exams. 
Another criticism about using student grades is that student performance can be influenced by different characteristics of students, not related to teaching effectiveness (Berk (1988, 2014)). Keeping these criticisms in mind, we use both student evaluations and student grades as teaching quality measures. By doing so, we also shed more light on the differences between the two measures.

3.2 Existing evidence on the relationship between research quality and teaching quality

The existing literature on the relationship between research and teaching is primarily limited to correlational studies. From an extensive review of empirical literature, uz Zaman (2004) concludes that the correlation between research and teaching varies from -.4 to +.8. This broad range of findings can be explained by different measures of research quality and teaching quality, by differences in applied empirical strategies and by a variety of exogenous and endogenous factors influencing this relationship, such as discipline or the ability level of students. More recent studies extend the previous correlation literature by controlling for different educational settings (e.g. discipline, institution type, student group size, level of studies), characteristics of teachers (e.g. age, academic rank), and characteristics of students (e.g. gender, ability of students) (see e.g. Zamorski (2002); Bettinger and Long (2005); Arnold (2006); Cherastidtham et al. (2013)). Nevertheless, evidence obtained from these studies is mixed.

The relationship between research quality and teaching quality can differ across countries and educational systems. Whereas the vast majority of previous research on the relationship between research quality and teaching quality has been conducted for the United States, there is recently a growing empirical evidence on this relationship from other countries, such as Korea, Italy, the Netherlands, and Australia (Cherastidtham et al. (2013); Braga et al.
(2014a); Arnold (2006); Bak et al. (2015)). For the Netherlands, Arnold (2006) examines the relationship between research quality and teaching quality at the Faculty of Economics, at Erasmus University of Rotterdam. He creates a measure of research quality based on the information whether academic staff meets the criteria for a research fellowship of the graduate school and research institute. He measures teaching quality by student evaluations of teachers. The study finds a negative correlation between research quality and teaching quality for the first and second year bachelor courses, while this relationship is positive for the third year bachelor courses and for the master courses.

Despite controlling for different observable factors influencing research and teaching and accounting for potential non-linearity of the relationship, most studies on the relationship between research and teaching suffer from endogeneity problems, in particular due to selection issues (i.e. self-selection of teachers to research and teaching activities and self-selection of students to different teachers). In recent years, more data on random assignment of students to different teachers in higher education have become available and enabled researchers to analyze different aspects in higher education (see Carrell and West (2010); Braga et al. (2014b,a); Feld and Zulitz (2016)). However, causal research on the relationship between research quality and teaching quality is still scarce. The only exception is a study by Braga et al. (2014a), who use data on students who are randomly assigned to different professors at Bocconi University, Italy, to investigate the relationship between research and teaching. They find that professors who are more productive in research are likely to be less effective as teachers, when output is measured by the h-index. The effect is reversed when yearly citations are used; however, it is insignificant.
4 Higher education in the Netherlands and randomization of students

4.1 Higher education in the Netherlands

The system of higher education in the Netherlands is characterized by self-governance, autonomy of the universities and the unity of research and teaching. In Dutch universities, the share of time spent on research and teaching is usually fixed by the contract.[3] The data we use in this study come from the School of Business and Economics (SBE) of Maastricht University (UM), one of the biggest higher education institutions in the country with over 15 000 students. There are around 4200 students enrolled in one of the programs at SBE with a high percentage of international students (around 40%). The vast majority of the bachelor programs last 3 years in contrast to bachelor programs at, for example, the U.S. universities which last 4 years. Most of the students continue their studies with a master program which lasts only one year.

Footnote 3: Based on self-reported information from academic personnel at Dutch universities (n=4243), it follows that the share of working time spent on conducting research for PhD candidates is above 70 percent, for postdoctoral researchers is above 50 percent, for assistant professors and associate professors is between 20 and 25 percent, and for professors is below 20 percent. The rest of the contract time is usually spent on teaching and organizational tasks (de Goede and Hessels (2014)).

The teaching strategy at the UM provides a unique opportunity to investigate the effect of research quality on teaching. As in most educational institutions, students in both bachelor and master programs attend lectures weekly or every two weeks, generally taught by the senior staff at the departments, who are called course coordinators. Later in that week students participate in tutorials supervised by other teachers.
All tutorials make use of the Problem-Based Learning approach, which is an important component of the teaching philosophy of the UM (Bastiaens and Nijhuis (2012)). This approach emphasizes personal skill development, including problem solving, group work and self-directed learning. Each tutorial can have at most 16 students, which means that each course at SBE can have several tutorials taught by different teachers. At the end of the course, the students who are taking the same course also take the same exam even though they participate in tutorials taught by different teachers. The exam is generally prepared by the course coordinator, and for almost all courses it takes the form of a written exam. In order to ensure objectivity, the grading is done collectively by the tutorial teachers. General practice is that each teacher grades a part of the exams from all students instead of grading only the exams of the students in their tutorials. Before taking the final exam students fill in online evaluation surveys to indicate their opinions about course and tutorial teachers as well as the completed course. The teachers receive the evaluation scores after the final grades are published online. The final grades are given on a scale of 1 to 10. The passing grade is 5.5.

4.2 Randomization of students and teachers

Allocation of students into tutorial groups is done by the Scheduling Department at SBE via a computer program. Before the start of the academic year students register for the courses that they want to follow. In bachelor programs most of the courses are mandatory for students, whereas in master programs students can choose among a large variety of courses. Once the online registration closes, all students taking the same course are randomly assigned to tutorial groups by a computer program. Afterwards, tutorial teachers are randomly assigned to tutorial groups within a course.[4]
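The two-stage allocation described so far can be sketched as follows. This is a minimal illustration and not the Scheduling Department's actual program: the function name, the even group sizes, and the rule of cycling through a shuffled teacher pool are assumptions made here. Only the cap of 16 students per tutorial and the order of the two random draws come from the text.

```python
import random

MAX_GROUP_SIZE = 16  # from the text: a tutorial can have at most 16 students


def assign_tutorials(students, teachers, seed=None):
    """Randomly split one course's students into tutorials, then assign teachers."""
    if not teachers:
        raise ValueError("at least one tutorial teacher is required")
    rng = random.Random(seed)

    # Stage 1: shuffle the registered students and deal them into groups.
    shuffled = list(students)
    rng.shuffle(shuffled)
    n_groups = -(-len(shuffled) // MAX_GROUP_SIZE)  # ceiling division
    groups = [shuffled[k::n_groups] for k in range(n_groups)]

    # Stage 2 (assumed rule): shuffle the course's teacher pool and cycle
    # through it, so one teacher may end up with several tutorial groups.
    pool = list(teachers)
    rng.shuffle(pool)
    return [
        {"teacher": pool[k % len(pool)], "students": group}
        for k, group in enumerate(groups)
    ]


# Hypothetical course with 40 registered students and two tutorial teachers.
tutorials = assign_tutorials([f"s{i}" for i in range(40)], ["T1", "T2"], seed=1)
print([len(t["students"]) for t in tutorials])  # prints [14, 13, 13]
```

Because both draws are made only after registration closes, neither students nor teachers can influence which tutorial they end up in within a course.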
Finally, the list of students in each tutorial group and the corresponding teachers is published by the Scheduling Department. Even though they are assigned to different tutors, all students taking the same course take exactly the same exam at the end of the course. Feld and Zulitz (2016) present more detailed information about the procedure used by SBE and perform several estimations to check the random assignment of students into tutorial groups. The authors show that randomization of students works successfully.

Footnote 4: It is expected that certain teachers are assigned to certain courses based on their expertise. However, they are randomly assigned to tutorial groups within one course, which does not invalidate our randomization.

5 Data

We received a data set for more than 9000 students in BA and MA programs at UM in the years 2011, 2012 and 2013. This data set includes information on student grades, courses, programs in which the students were participating and several background characteristics such as age and nationality. In total, this data set has 80 000 student-course-grade observations. For students who filled in course evaluation forms we also received information on tutorial groups and teachers. However, not every student fills in the evaluation forms, and for those who do not fill in evaluation forms there is no tutorial and teacher information. This means we can only use information about students who fill in evaluation forms, which decreases the number of observations from 80 000 to 28 000. The first panel in Table 2 shows the descriptive statistics for student characteristics for the whole sample and for those who filled in evaluation surveys. The last column presents the p-values for mean differences between the two groups. It shows that the difference between the two samples is significant for many characteristics. Overall, students with higher grades, female students and older students are more likely to fill in the evaluation surveys. The difference between average grade for students who fill in the evaluation forms and those who do not is 0.4. Even though better students are more likely to fill in the evaluation forms, the difference is not very large considering the grades are given on a scale from 1 to 10.[5]

The publication records are also obtained from SBE. For teachers who have worked at UM the entire observed period (2008-2011), we obtain information on publication record. These records show how many publications a teacher had in A, B or C level journals in a certain year.[6] Since our measure of research quality entirely depends on such publication records we make certain choices with regards to the measurement. First, instead of using the publication records at each year separately we calculate the total number of publications in the last 4 years for each year so that our measurements would suffer less from possible outliers. This means, for example, for a teacher who was teaching in 2011 we use information on the publication records from 2008 to 2011. SBE does not keep track of the publication records of teachers who did not work at UM for the entire period or that of PhD students. Therefore, we can use only a subset of the initial student data set. This subset consists of 5934 student-course-grade observations. In total there are 176 different courses, which gives 408 course-year combinations. 69 of these course-year combinations have multiple tutorial groups taught by different teachers. There are 1127 tutorial groups taught by 83 different teachers.

The second panel in Table 2 shows the descriptive statistics for students who filled in evaluation surveys but were excluded due to limited information on publications and for students who are included in the final analytical sample of 5934 observations. Although several characteristics are significantly different, student grades do not differ significantly between both groups. Later in Section 7, we discuss these selections more in detail.[7]
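The four-year window can be made concrete with a short sketch. The record layout (a dictionary from year to counts per journal level), the function names, and the sample counts are hypothetical. The window itself (the teaching year plus the three preceding years) and the derived measures (any publication, total publications, any A or B publication, number of A publications) follow the description in the text.

```python
def four_year_totals(pubs_by_year, teaching_year, window=4):
    """Sum A/B/C publication counts over the `window` years ending in teaching_year."""
    totals = {"A": 0, "B": 0, "C": 0}
    for year in range(teaching_year - window + 1, teaching_year + 1):
        for level, count in pubs_by_year.get(year, {}).items():
            totals[level] += count
    return totals


def research_measures(totals):
    """Derive quantity- and quality-oriented measures from the window totals."""
    total = sum(totals.values())
    return {
        "any_pub": int(total > 0),          # any publication at all
        "total_pubs": total,                # quantity, ignoring journal level
        "any_A": int(totals["A"] > 0),      # at least one A level publication
        "any_B": int(totals["B"] > 0),      # at least one B level publication
        "total_A": totals["A"],             # number of A level publications
    }


# Hypothetical record: the 2007 publications fall outside the 2008-2011 window.
record = {2007: {"A": 3}, 2008: {"B": 1}, 2010: {"A": 1, "C": 2}, 2011: {"C": 1}}
window_totals = four_year_totals(record, teaching_year=2011)
print(window_totals)                     # prints {'A': 1, 'B': 1, 'C': 3}
print(research_measures(window_totals))
```

Summing over a window rather than using single years smooths out a teacher's unusually productive or unproductive year, which is the outlier concern raised above.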
Footnote 5: That being said, our results still need to be interpreted with caution as we obtain results for slightly better and maybe more motivated students in general.

Footnote 6: The list of all journals and corresponding classifications used by SBE are given in Table 16 in Appendix 9. SBE’s main strategy in deciding on journal classification is to use the 5-year impact factor of the (S)SCI listed journals.

Footnote 7: Note that we do not explicitly deal with students who drop out. We cannot directly observe if a student drops the course once s/he learns in which tutorial group s/he sits. In order to have an idea about such cases we assume that a student can be classified as a drop-out if that student registers for a course but does not take the exam at the end. In the initial sample of 80 000 observations, only 7% of the students register for a course but do not take the exam. This number is less than 1% for the analytical sample of 5934 observations.

Table 3 shows the distribution of teachers according to the number of publications. There are 83 teachers in total. 35 of them had at least one A publication, 59 had at least one B publication and 60 had at least one C publication in the last 4 years. 15 teachers had one A publication, 8 teachers had two A publications, and so on. Table 4 shows the interactions between different publications. No teacher had only A publications, 4 teachers had only B publications and 6 teachers had only C publications. Finally, 30 teachers had A, B and C publications in the last 4 years.

In our empirical analyses we make a distinction between bachelor and master students as the research quality of teachers might have heterogeneous effects on students due to the differences in course types (mandatory vs. selective courses; general topics vs. specialized topics), student motivation, etc.
Tables 5 and 6 present the descriptive statistics of student and teacher variables used in the empirical analysis by differentiating first year bachelor students, second and third year bachelor students and master students.[8] Descriptions of these variables are given in Table 7. Student grades are on average 6.6 for the first year bachelors, 7.2 for the second and third year bachelors and 7.3 for master students. There is almost no professor teaching in the first year bachelor courses.[9]

Footnote 8: The relatively high number of observations for the master program is due to the fact that we use information on teachers (non-PhD students) who worked at UM for the entire period of 2008-2013.

Footnote 9: Note that there are two different Professor positions. The difference between the two is that the first group has more management responsibilities, has better publication records and is rewarded with a better salary.

5.1 Tests for sample selection and randomization of students to teachers

As noted earlier we have a significant selection in our data because of selection of students and restrictions due to teacher information. In order to investigate the selection of students we perform two descriptive analyses. In the first analysis we regress the probability of filling in evaluation surveys on student characteristics for all of the students and then separately for bachelor and master students. Table 8 presents the results. In all columns the results show that student characteristics are significant predictors of the probability of filling in evaluation surveys. In the second analysis we regress the probability of being in the analytical sample[10] for those who filled in evaluation forms. The results are presented in Table 9. None of the student characteristics are found to be statistically significant. Therefore, the observations that we lose due to the restrictions on teacher information are not systematically different from those included in our analytical sample.
Since randomization of students into teacher groups is the underlying identifying mechanism, we perform a randomization check. In order to see if the randomization of students to different teachers successfully works, we regress teacher specific publication variables on student characteristics. Table 10 presents the results. In the first column we regress the probability of having any publication in the last 4 years on student characteristics. In the following three columns we regress the probability of having any publications in A, B or C level journals on student characteristics. None of the student characteristics is significant. Therefore we conclude that, in terms of publication performance of teachers, randomization works successfully.

Footnote 10: Note that only a part of the observations is used in the analytical sample due to unavailability of data for certain teachers.

6 Model

We investigate the effect of research quality of teachers on teaching quality measured by student evaluations of the teacher and student grades. For student grades we have individual data, and we use the following regression:

$$ G_{ictg} = \beta_0 + \beta_1 P_{tg} + \beta_2 T_{tg} + \beta_3 S_{it} + \beta_4 C_{ct} + u_{ictg} \tag{1} $$

where $G_{ictg}$ is the grade of student $i$ in course $c$ and year $t$, at the tutorial taught by teacher $g$. $P_{tg}$ is the publication record of teacher $g$ in year $t$. Similarly, $T_{tg}$ is the set of other teacher characteristics. $S_{it}$ is the set of student characteristics in year $t$, which also includes a program fixed effect for the program that students are enrolled in. $C_{ct}$ is the course fixed effect for course $c$ in year $t$.[11] $u_{ictg}$ is the error term. Since students are randomly assigned to different teachers after they make a course choice and take the same exam at the end of the course, we can interpret the coefficients of $P_{tg}$ and $T_{tg}$ as causal, conditional on $C_{ct}$. In all estimations we use robust standard errors clustered on the course-year level because of a possible correlation between the outcomes of students choosing the same course.

Footnote 11: Since the courses are taught by senior members of the department, course coordinators, and tutorials are taught by different teachers, course fixed effects also capture course coordinator effects. This is important because course coordinators can be different when it comes to how rigorous they are about exam questions, course structure or tutorial guidelines. By controlling for such course fixed effects, we achieve identification through within variation: variation due to the different teachers in different tutorials for the same course.

For student evaluations of the teachers, we have data only on the teacher level unlike the data on student grades. In other words we know the average evaluation score that a teacher receives after the course ends. Therefore, we cannot perform the same individual level analysis as in the student grades analysis. In order to investigate the evaluation scores we use averages of all the variables on tutorial (teacher) level. The regression equation is

$$ E_{ctg} = \beta_0 + \beta_1 P_{tg} + \beta_2 T_{tg} + \beta_3 S_{tg} + \beta_4 C_{ct} + v_{ctg} \tag{2} $$

where $E_{ctg}$ is the average teacher evaluation score in course $c$, year $t$ and at the tutorial taught by teacher $g$. $P_{tg}$ is the publication record of teacher $g$, and $T_{tg}$ is the set of other teacher characteristics in year $t$. $S_{tg}$ is the average of student characteristics in tutorial $g$ in year $t$. $C_{ct}$ is again the set of fixed effects for course $c$ in year $t$. $v_{ctg}$ is the error term.

The course structure in the bachelor and master programs is different. Courses in bachelor programs tend to be general introduction courses on various topics. Courses in master programs, on the other hand, are mostly specialized courses. When a teacher gives a course on the specific topic that s/he specializes in, we expect the expertise and motivation to be different. Therefore, in our estimations we run the above-mentioned model first for all students, and then for bachelor and master students separately.

7 Results

First, we present the results of individual level student grade estimations. The results are displayed in Table 11. The first column displays the results for all students, the rest of the columns present the results for first year bachelor students, second and third year bachelor students and finally for master students, respectively. In these analyses, research quality is a dummy variable which is 1 if the teacher had any publications in the last 4 years. Therefore, we measure the effect of having any publication activity regardless of the quality of publication. For all specifications, there is a small positive but insignificant effect on student grades.

Table 12 presents the results of other analyses using different publication variables to measure research quality. In row 1, the coefficient estimate for total number of publications in the last 4 years shows that the number of publications has no effect on student performance. In this estimation we measure the effect of total publication activity ignoring the quality of publications. In row 2, the research quality variable is a dummy variable which is 1 if the teacher had any A level publications in the last 4 years. Hence, in row 2 we measure the effect of having a teacher who conducts high quality research. The coefficient estimate for master students shows that there is a significant positive effect on student grades. Having a teacher with at least one A level publication in the last four years is associated with a 0.4 higher student grade. This suggests that in master programs students taught by teachers with high quality of publications perform better whereas students of teachers with more publications do not.
Thus, quality seems to be more important than quantity. In row 3 the research quality variable is a dummy variable which is 1 if the teacher had any B level publications in the last 4 years. We find smaller, insignificant positive effects. Comparing these results with the ones in Table 11 shows that as the research quality of the teacher increases, student performance increases, but only for master students. In row 4 we estimate the effect of the total number of A publications on student grades. Again, for master students we find a positive significant effect. Having one more A level publication increases student grades by 0.2 on average. Therefore, the quantity of publications matters only for A publications.

Our baseline results show that mechanisms suggesting a positive relationship between research and teaching, such as skill transfers and teacher-student interactions, dominate the ones suggesting a negative relationship, such as time and effort allocation. We believe that the discrepancy between the results for bachelor and master students further strengthens this interpretation. Finding a stronger effect for master students is not particular to our analysis (see Arnold (2006)) and can be explained by the course characteristics in the bachelor and master programs. Most of the courses in the bachelor programs are mandatory courses at the introductory level. Master courses, however, can be elective, are more specialized on certain topics, and are followed by students who are more interested and motivated. It is also generally the case that teachers give special topic courses which primarily focus on their field of interest. This can increase the effects of skill transfers and of interactions between teachers and students.

In Table 13 we present the results of some sensitivity analyses where we introduce more covariates using student level and tutor level information. For these sensitivity analyses [12] we use the specification in the second row of Table 12.
In panel 1, we introduce tutor-student gender combinations. The coefficient estimates show that, in the master program, male students taught by female teachers perform worse than male students taught by male teachers. In panel 2, we add peer variables by calculating the average age and the percentage of females among each student's classroom peers. We perform this analysis because peer effects can be important in the classroom. The coefficient estimate of the publication variable remains the same. Finally, in panel 3 we add variables to capture the academic position of the teachers. The reference group is lecturers. The correlation between publications and positions is very high. This is of course expected, as the decision to promote an assistant professor to an associate professor position, for example, mainly depends on the publication record of the academic. Once we control for the position variable, the variation in the publication variable becomes very small. This is reflected in the higher standard error for the publication variable. [13] [14]

[12] Although we perform the sensitivity analysis for all of the other specifications, we present only the results of the third specification due to the highly significant effect of the research quality measure. All other results remain the same once we introduce more variables, and they are available upon request.
[13] We choose to include the variables on academic position only in a sensitivity analysis because of the correlation between publication records and academic positions. This correlation is not surprising, as decisions on promotions and tenure largely depend on publishing performance.
[14] As for whether we can control for the experience of the tutor, we can do so partly by including the age of the tutor. In all our estimations we control for age. Therefore, we believe that we partly control for work experience.

In Table 14 we present the results of teacher evaluations. The coefficient estimate for having any publication shows that teachers with publications receive lower evaluation scores [15] on average, although the coefficients are estimated imprecisely. Table 15 presents the results for other publication measures. The results for master students show that coefficient estimates are positive but small, indicating that teachers with high quality publications do not receive higher evaluation scores. The teachers with publications, on the other hand, receive lower scores in the second and third year of the bachelor programs.

[15] The high positive coefficient estimate for the first year bachelor students is most probably due to the low number of observations.

The difference between the results concerning student grades and student evaluations is important. As mentioned earlier, when students evaluate the teachers, they do not necessarily evaluate the teaching effectiveness. Evaluation scores might reflect the personality of the teacher or, in general, personal experience in the classroom. A rigorous, demanding teacher, for example, might end up with a lower score than a fun teacher who is not much better. This can explain the smaller or even negative results in the evaluation score estimations. This is in line with some of the previous findings in the literature (see Emery et al. (2003)). Student grades, on the other hand, might reflect true learning experience and can be more informative in measuring teaching effectiveness.

8 Conclusion

There is a continuous debate about the relationship between the research quality of academicians and their teaching performance. Are good researchers also good teachers? Answering this question is important not only for scientific merit but also for policy-making, especially for higher education stakeholders, as the answer can help them distribute human resources more efficiently between research and teaching.

In this paper we investigate the relationship between research quality and teaching quality. We use data from Maastricht University, the Netherlands, where students are randomly allocated to different teachers even though they all take the same exam. Research quality is measured by the publication records of the teachers. Teaching quality is measured by both student grades and student evaluations of the teachers. Exploiting the random allocation of students to different teachers, and the fact that students with different teachers take the same exam, we find that master students who are taught by teachers with high quality publications score higher grades. However, we do not find any effect of having any publications or of the total number of publications. This shows that quality matters when it comes to student performance, and that quantity matters only if the quality is good: only for A publications does the number of publications have a significant positive effect on student grades. Moreover, we believe that the stronger results for master students strengthen our interpretation of the findings. The vast majority of the courses in the bachelor programs are mandatory courses at the introductory level. Master courses, however, can be elective, are much more specialized on certain topics (generally in the interest areas of teachers), and are followed by students who are more interested and motivated. This can increase the effects of the aforementioned skill transfers and the interactions between teachers and students in the classroom.

The results based on course evaluations show that the findings from student grade estimations are not fully reflected in how students evaluate their teachers. Master students do not give higher scores to teachers with a higher number of publications or a higher quality of publications. Moreover, bachelor students give lower scores to teachers with publications.
The difference between the results of the student grade and student evaluation score estimations indicates that the two measures capture different things. Evaluation scores might reflect the personality of the teacher or, in general, personal experience in the classroom rather than learning. Hence, we conclude that it is useful to use both measures in analyzing teaching quality.

When it comes to the policy implications of our findings, one should interpret the results with caution. Our findings cannot be interpreted as evidence supporting or dismissing the argument that research and teaching at universities should be separated. Our results also do not answer how much time teachers should spend on teaching or research. We conclude that excellent research performance contributes to higher teaching quality in the master programs if the quality of teaching is measured by student grades. This might suggest that if good researchers indeed have time for teaching, they would best be allocated to courses in the master programs.
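As an illustration of the estimation approach described in Section 6, the following is a minimal sketch of a grade regression in the spirit of equation (1): course-year fixed effects are absorbed by demeaning within course-year, and standard errors are clustered at the course-year level. It is not the authors' code. The data are simulated, and every name (`demean_by_group`, `fe_ols_clustered`, the sample sizes, and the simulated coefficients such as 0.5 for the publication dummy) is a hypothetical choice made only for this example. The cluster-robust variance omits small-sample corrections.

```python
import numpy as np


def demean_by_group(values, groups):
    """Within transformation: subtract group means to absorb group fixed effects."""
    values = np.asarray(values, dtype=float)
    out = values.copy()
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= values[mask].mean(axis=0)
    return out


def fe_ols_clustered(y, X, groups):
    """OLS with group fixed effects and standard errors clustered on the same groups."""
    groups = np.asarray(groups)
    y_w = demean_by_group(y, groups)
    X_w = demean_by_group(X, groups)
    xtx_inv = np.linalg.inv(X_w.T @ X_w)
    beta = xtx_inv @ X_w.T @ y_w
    resid = y_w - X_w @ beta
    # Sandwich "meat": sum over clusters of the outer product of X_c' u_c.
    k = X_w.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        mask = groups == g
        score = X_w[mask].T @ resid[mask]
        meat += np.outer(score, score)
    cov = xtx_inv @ meat @ xtx_inv  # no small-sample correction (assumption)
    return beta, np.sqrt(np.diag(cov))


# Simulated data (hypothetical): 40 course-years, 4 tutorials each, 12 students each.
rng = np.random.default_rng(0)
n_courses, n_tutorials, n_students = 40, 4, 12
course = np.repeat(np.arange(n_courses), n_tutorials * n_students)
tutorial = np.repeat(np.arange(n_courses * n_tutorials), n_students)

# Teacher-level variables vary by tutorial; students are assigned independently of them.
pub_by_tutorial = rng.integers(0, 2, size=n_courses * n_tutorials)
age_by_tutorial = rng.normal(40, 8, size=n_courses * n_tutorials)
P = pub_by_tutorial[tutorial].astype(float)   # publication dummy
T = age_by_tutorial[tutorial]                 # other teacher characteristic (age)
S = rng.normal(0, 1, size=course.size)        # student characteristic
course_effect = rng.normal(0, 0.5, size=n_courses)[course]

# Simulated grades; the coefficients here are arbitrary simulation parameters.
G = 6.5 + 0.5 * P + 0.01 * T + 0.3 * S + course_effect + rng.normal(0, 1, size=course.size)

X = np.column_stack([P, T, S])
beta, se = fe_ols_clustered(G, X, course)
for name, b, s in zip(["P (publication)", "T (age)", "S (student)"], beta, se):
    print(f"{name}: coef={b:.3f}, clustered se={s:.3f}")
```

Because the publication dummy varies only across tutorials within a course-year, the coefficient on `P` is identified from differences between tutorials of the same course, which mirrors the within variation discussed in the model section. For the evaluation regression in equation (2), the same function could be applied after averaging all variables to the tutorial level.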