CPB Discussion Paper 347
Are good researchers also good teachers?
Roel van Elk
Iryna Rud

1 Introduction
There is a continuous discussion, both academic and public, on how research and education in universities are related (Hattie and Marsh (1996); uz Zaman (2004); Jenkins et al. (2007); Elken and Wollscheid (2016)). It is often questioned whether a 'good researcher' is also a 'good teacher', whether teachers who conduct research are more effective than those who do not, and, more generally, whether research and teaching activities can complement each other. Even though answers to these questions are important for university stakeholders seeking the most efficient way to distribute human resources between research and teaching activities, the current empirical evidence is limited and mixed. Furthermore, evidence on the relationship between research and teaching is important for providing better insight into the effective production of research output and student learning outcomes.
This study aims to analyse the relationship between research quality and teaching
quality. We use individual-level data from the School of Business and Economics at
Maastricht University in the Netherlands, where students are randomly allocated to dif-
ferent teachers within the same course. At the end of the course, students take the same
exam. This enables us to exploit the exogenous variation in research quality of teachers
on student outcomes such as grades and student evaluation scores. We measure research
quality by the publication records of the teachers. Our results show that master students who are taught by teachers with high quality publications obtain higher grades. We do not find any effect of having any publications or of the total number of publications. Therefore, quality, rather than quantity, seems to matter. The results on student grades are not fully reflected in how students evaluate their teachers: master students do not give higher scores to teachers with a higher number or quality of publications, and bachelor students give lower scores to such teachers.
Empirical evidence on the relationship between research quality and teaching quality
obtained from using data on randomized experiments is scarce. Our study contributes
to the existing literature by using rich individual-level data on students who are ran-
domly assigned to diﬀerent teachers within the same course. The second contribution of
this study lies in exploring and comparing two different measures of teaching outcomes:
student evaluations of the teachers and ﬁnal student grades.
This paper is organized as follows: In section 2, we discuss possible mechanisms under-
lying the relationship between research and education. In section 3, we discuss previous
literature on the relationship between research quality and teaching quality. In section
4 we provide an overview of the higher education system in the Netherlands and of Maastricht University in particular. Section 5 describes the data and presents descriptive statistics. In section 6, we describe our empirical strategy. In section 7, we discuss our estimation results. Finally, section 8 concludes.
2 Mechanisms linking research quality and teaching quality
The link between research quality and teaching quality is complex and multidimensional.
Based on the previous literature, we distinguish several main mechanisms that can underlie
this relationship. Depending on which of the mechanisms dominates, this relationship can
vary from positive to null, and even to a negative one.
The first type of mechanism suggests a positive relationship between research and teaching via complementarity between skills (uz Zaman (2004)). Conducting research can both enhance a teacher's proficiency in the subject and keep the teacher up to date with the newest developments in the discipline. As a result, research activities would have a positive impact on teaching quality. Such skill transfer can operate not only at the level of the teacher but also at the teacher-student level. For example, through involvement in teaching activities and interactions with students during classroom discussions, researchers can transfer their critical thinking and research skills to students.
The second set of mechanisms suggests a negative relationship between research and teaching. Both research and teaching activities require an investment of time and effort. Being involved in one activity, for instance conducting research, usually does not allow for simultaneously spending time and effort on the other activity (teaching), unless one activity benefits both research and teaching (e.g. reading a scientific paper can simultaneously contribute to research ideas and to teaching preparation). The time and effort allocated to teaching and research are also influenced by the system of incentives in academia. Research can be rewarded by universities, through promotion, more generously than teaching. Therefore, there can be a selective inflow into the profession: people in academia might choose to prioritize research over teaching and build their careers by doing research, while teaching is often regarded as "punishment" (Walstad and Allgood (2005); Cretchley et al. (2014); De Philippis (2015)). Furthermore, contrary to the first mechanism, teaching and research might require different sets of skills. If research requires more specific skills (e.g. synthesis, deduction) than teaching (e.g. communication, mentoring), this can limit skill transfer between the two. Hence, the relationship between research output and teaching effectiveness might be neutral or even negative. Which of these mechanisms dominates the others is an empirical question.
3 Previous literature
The literature on the relationship between research quality and teaching quality is rather
large. However, most of these studies focus on the correlation between the two without
making causal claims. Below we review this literature, first focusing on the measures of research and teaching quality used, and then on the findings relating the two. Since the literature is extensive, we present a selection of the relevant papers in Table 1.
3.1 Measures of research output and teaching quality
It is generally not straightforward how to measure research quality and teaching quality. Even if research quality can, to some extent, be observed and summarized based on (produced) research outcomes, teaching quality cannot be directly observed. (Studies are excluded from Table 1 if they are (a) published before 1980; (b) purely descriptive (non-empirical); (c) analyze exclusively the link from teaching to research output; (d) based on analyses of teachers' or students' beliefs about the research-teaching relationship; or (e) based on fewer than 20 observations.) In our
study, we measure research quality by the publication records of the teachers. We have information not only on how many publications a teacher had in a certain year, but also on whether these publications appeared in A, B or C level journals. Therefore, we can differentiate between the quantity and the quality of publications. For teaching quality we use two different measures: student evaluations of the teachers and student grades. Whereas student evaluations capture how students perceive the teaching quality of their teachers, student grades capture the final learning outcome.
We are aware that several different measures have been proposed in the literature. Research quality has traditionally been measured by the number of published and refereed articles (see e.g. Gottlieb and Keith (1997)), by citation scores (see e.g. Rothman and Preshaw (1975)), by the impact factor (Saha et al. (2003)) and by combined measures of the quantity and quality of publications (Lanjouw and Schankerman (2004)). Hirsch (2005), for example, develops the h-index to characterize the scientific output of researchers, based on both the number of published articles and the impact of these publications (see also Bornmann et al. (2008)). However, all of these measures have been criticized, and there is no consensus in the literature on a universal measure of research quality; rather, the choice of measure depends on the particular research goals of the study. In our study, we use publication records to construct different research quality measures that vary in their emphasis on the quantity and quality of publications.
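As a concrete illustration of one such measure, the h-index of Hirsch (2005) can be computed from a researcher's per-paper citation counts as follows (a minimal sketch; the function name and citation counts are purely illustrative):

```python
def h_index(citations):
    """Largest h such that the researcher has h papers with
    at least h citations each (Hirsch, 2005)."""
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i  # the i most-cited papers all have at least i citations
        else:
            break
    return h

# A researcher with papers cited 10, 8, 5, 4 and 3 times has h = 4:
# four papers with at least four citations each.
print(h_index([10, 8, 5, 4, 3]))  # 4
```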
Teaching quality is related to both teacher’s performance and student learning out-
comes. Student evaluations of teachers are the most frequently used measure to estimate
teaching quality in higher education (Becker and Watts (1999); Becker et al. (2012)).
This popularity is mainly explained by the availability of the data on student evalua-
tions. This measure, however, has been increasingly challenged in the literature since it
is based on the perceptions of respondents and might not necessarily reflect true teaching effectiveness. (Even though it is more complex than the other measures, the h-index has also been criticized in later studies for incompleteness; several researchers argue, for example, that it does not account for the aging of citations (Sidiropoulos et al. (2007); Glanzel (2006); Burrell (2007)). At the same time, alternatives to the h-index, such as the g-index (Egghe (2006)), the a-index (Jin (2006)) and the ar-index (Jin (2007)), hardly overcome all its drawbacks (Bornmann and Daniel (2007); Bornmann et al. (2008)).) In particular, students often evaluate teachers on the basis of how
they enjoyed the course and on the basis of teachers’ personal characteristics, and not
necessarily characteristics related to teaching quality (Braga et al. (2014b); McPherson
et al. (2009)). Evidence shows that student evaluations are less biased in the populations
where high-skill students are over-represented (Braga et al. (2014a)). Another concern
is that students are usually not obliged by the institution to provide evaluations, and
they are not randomly selected, which leads to a biased assessment of teacher quality. In
other words, students who eventually ﬁll in evaluation forms represent a selected sample
of all students. Salomons and Goos (2014) quantify the direction and size of selection
on both observable and unobservable characteristics of students, teachers and courses.
They ﬁnd that the true evaluation score is lower than the average reported, and thus the
selection bias is positive. Moreover, they also conclude that using student evaluations is not advisable when response rates are low or vary considerably across courses. Hoffmann
and Oreopoulos (2009) suggest using the mean of the averages for teacher evaluations
across classes, as this ensures that teacher quality measures diﬀer only when instructors
diﬀer. Finally, Emery et al. (2003) and Becker and Watts (1999) advise not to use student
evaluations as the only measure of teaching quality.
An alternative to student evaluations of teachers is actual student grades, which are directly informative about student learning. However, using student grades in empirical
analyses is not without problems either. Some studies point at the fact that teachers can
inﬂate grades for the purpose of elevating student evaluations (see e.g. Krautmann and
Sander (1999); Johnson (2003); Carrell and West (2010)). This tendency is related to
characteristics of departments and to teacher-specific characteristics (Jewell et al. (2013)), with teacher-specific characteristics explaining relatively more of the variation in grade inflation. Jewell et al. (2013) attribute this to the widespread tendency of universities to use student evaluation scores as inputs into tenure and promotion decisions, which makes it rational for teachers to inflate grades. For this reason, many universities collect student evaluations before the final exams. Another criticism of using student grades is that student performance can be influenced by various characteristics of students that are unrelated to teaching effectiveness (Berk (1988, 2014)). Keeping these criticisms in mind, we use both student evaluations and student grades as teaching quality measures. By doing so, we also shed more light on the differences between the two measures.
3.2 Findings on the relationship between research quality and teaching quality
The existing literature on the relationship between research and teaching is primarily lim-
ited to correlational studies. From an extensive review of empirical literature, uz Zaman
(2004) concludes that the correlation between research and teaching varies from -.4 to +.8.
This broad range of ﬁndings can be explained by diﬀerent measures of research quality
and teaching quality, by diﬀerences in applied empirical strategies and by a variety of
exogenous and endogenous factors inﬂuencing this relationship, such as discipline or the
ability level of students. More recent studies extend the previous correlation literature
by controlling for diﬀerent educational settings (e.g. discipline, institution type, student
group size, level of studies), characteristics of teachers (e.g. age, academic rank), and
characteristics of students (e.g. gender, ability of students) (see e.g. Zamorski (2002);
Bettinger and Long (2005); Arnold (2006); Cherastidtham et al. (2013)). Nevertheless,
evidence obtained from these studies is mixed.
The relationship between research quality and teaching quality can diﬀer across coun-
tries and educational systems. Whereas the vast majority of previous research on the
relationship between research quality and teaching quality has been conducted for the
United States, a growing body of empirical evidence on this relationship has recently emerged from other countries, such as Korea, Italy, the Netherlands, and Australia (Cherastidtham et al. (2013); Braga et al. (2014a); Arnold (2006); Bak et al. (2015)). For the Netherlands, Arnold (2006) examines the relationship between research quality and teaching quality at the Faculty of Economics of Erasmus University Rotterdam. He creates a measure of research quality based on whether academic staff meet the criteria for a research fellowship of the graduate school and research institute, and measures teaching quality by student evaluations of teachers. The study finds a negative correlation between research quality and teaching quality for the first- and second-year bachelor courses, while the relationship is positive for the third-year bachelor courses and
for the master courses.
Despite controlling for different observable factors influencing research and teaching and accounting for potential non-linearity of the relationship, most studies on the relationship between research and teaching suffer from endogeneity problems, in particular due to selection issues (i.e. self-selection of teachers into research and teaching activities and self-selection of students to different teachers). In recent years, more data on the random assignment of students to different teachers in higher education have become available, enabling researchers to analyze different aspects of higher education (see Carrell and West (2010); Braga et al. (2014b,a); Feld and Zölitz (2016)). However, causal research on the relationship between research quality and teaching quality is still scarce. The only exception is a study by Braga et al. (2014a), who use data on students who are randomly assigned to different professors at Bocconi University in Italy to investigate the relationship between research and teaching. They find that professors who are more productive in research are likely to be less effective as teachers when output is measured by the h-index. The effect is reversed when using yearly citations, but it is insignificant.
4 Higher education in the Netherlands and the randomization of students
4.1 Higher education in the Netherlands
The system of higher education in the Netherlands is characterized by self-governance,
autonomy of the universities and the unity of research and teaching. In Dutch universities,
the share of time spent on research and teaching is usually fixed by the contract. (Based on self-reported information from academic personnel at Dutch universities (n=4243), the share of working time spent on conducting research is above 70 percent for PhD candidates, above 50 percent for postdoctoral researchers, between 20 and 25 percent for assistant and associate professors, and below 20 percent for professors; the rest of the contract time is usually spent on teaching and organizational tasks (de Goede and Hessels (2014)).)

The data we use in this study come from the School of Business and Economics (SBE) of Maastricht University (UM), one of the biggest higher education institutions in the country, with over 15 000 students. There are around 4200 students enrolled in one of
the programs at SBE with a high percentage of international students (around 40%).
The vast majority of bachelor programs last three years, in contrast to, for example, four-year bachelor programs at U.S. universities. Most students continue their studies with a master program, which lasts only one year. The teaching strategy at UM provides a unique opportunity to investigate the effect of research quality on teaching. In both bachelor and master programs, students attend weekly or biweekly lectures, as in most educational institutions; these are generally taught by senior staff at the departments, who are called course coordinators. Later in the same week, students participate in tutorials supervised by other teachers. All tutorials use the Problem-Based Learning approach, an important component of the teaching philosophy of UM (Bastiaens and Nijhuis (2012)). This approach emphasizes personal skill development, including problem solving, group work and self-directed learning. Each
tutorial can have at most 16 students, which means that each course at SBE can have
several tutorials taught by diﬀerent teachers.
At the end of the course, the students who are taking the same course, also take the
same exam even though they participate in tutorials taught by diﬀerent teachers. The
exam is generally prepared by the course coordinator, and it is for almost all courses in
the form of a written exam. In order to ensure objectivity, the grading is done collectively by the tutorial teachers: the general practice is that each teacher grades a part of the exams of all students instead of grading only the exams of the students in their own tutorials. Before taking the final exam, students fill in online evaluation surveys to indicate their opinions about the completed course and the tutorial teachers. The teachers receive the evaluation scores after the final grades are published online. The final grades are given on a scale of 1 to 10; the passing grade is 5.5.
4.2 Randomization of students and teachers
Allocation of students into tutorial groups is done by the Scheduling Department at
SBE via a computer program. Before the start of the academic year students register
for the courses that they want to follow. In bachelor programs most courses are mandatory, whereas in master programs students can choose among a large variety of courses. Once the online registration closes, all students taking the same course
are randomly assigned to tutorial groups by a computer program. Afterwards, tutorial
teachers are randomly assigned to tutorial groups within a course. Finally, the list
of students in each tutorial group and the corresponding teachers is published by the
Scheduling Department. Even though they are assigned to diﬀerent tutors, all students
taking the same course take exactly the same exam at the end of the course. Feld and Zölitz (2016) present more detailed information about the procedure used by SBE and perform several estimations to check the random assignment of students into tutorial groups. The authors show that the randomization of students works successfully.
5 Data

We received a data set for more than 9000 students in BA and MA programs at UM in the years 2011, 2012 and 2013. This data set includes information on student grades, courses, the programs in which the students were participating, and several background characteristics such as age and nationality. In total, this data set has 80 000 student-course-grade observations.
For students who ﬁlled in course evaluation forms we also received information on
tutorial groups and teachers. However, not every student ﬁlls in the evaluation forms, and
for those who do not ﬁll in evaluation forms there is no tutorial and teacher information.
This means we can only use information about students who ﬁll in evaluation forms,
which decreases the number of observations from 80 000 to 28 000. The ﬁrst panel in
Table 2 shows the descriptive statistics for student characteristics for the whole sample
and for those who ﬁlled in evaluation surveys. The last column presents the p-values
for mean diﬀerences between the two groups. It shows that the diﬀerence between the
two samples is signiﬁcant for many characteristics. Overall students with higher grades,
female students and older students are more likely to fill in the evaluation surveys. (It is expected that certain teachers are assigned to certain courses based on their expertise. However, they are randomly assigned to tutorial groups within one course, which does not invalidate our identification strategy.) The difference between the average grade of students who fill in the evaluation forms and those who do not is 0.4. Even though better students are more likely to fill in the evaluation forms, the difference is not very large considering that the grades are given on a scale from 1 to 10.
The publication records are also obtained from SBE. For teachers who worked at UM for the entire observed period (2008-2011), we obtain information on their publication records. These records show how many publications a teacher had in A, B or C level journals in a given year. Since our measure of research quality depends entirely on these publication records, we make certain choices with regard to the measurement. First, instead of using the publication records of each year separately, we calculate, for each year, the total number of publications over the last four years, so that our measurements suffer less from possible outliers. This means, for example, that for a teacher who was teaching in 2011 we use information on the publication records from 2008 to 2011. SBE does not keep track of the publication records of teachers who did not work at UM for the entire period, or of PhD students. Therefore, we can use only a subset of the initial student data set. This subset consists of 5934 student-course-grade observations.
In total there are 176 diﬀerent courses, which gives 408 course-year combinations. 69 of
these course-year combinations have multiple tutorial groups taught by diﬀerent teachers.
There are 1127 tutorial groups taught by 83 diﬀerent teachers. The second panel in Table
2 shows the descriptive statistics for students who filled in evaluation surveys but were excluded due to limited information on publications, and for students who are included in the final analytical sample of 5934 observations. Although several characteristics differ significantly, student grades do not differ significantly between the two groups. Later, in Section 7, we discuss these selection issues in more detail.
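The four-year window described above can be sketched as follows (a minimal illustration using pandas; the column names and publication counts are made up, not the actual SBE records):

```python
import pandas as pd

# Hypothetical publication counts per teacher and year (illustrative only)
pubs = pd.DataFrame({
    "teacher": ["t1"] * 6,
    "year":    [2008, 2009, 2010, 2011, 2012, 2013],
    "n_pubs":  [1, 0, 2, 1, 0, 3],
})

# For each year, total publications over that year and the three preceding
# years, e.g. the 2011 measure aggregates the records from 2008 to 2011.
pubs["pubs_last4"] = (
    pubs.sort_values("year")
        .groupby("teacher")["n_pubs"]
        .transform(lambda s: s.rolling(window=4, min_periods=1).sum())
        .astype(int)
)
print(pubs["pubs_last4"].tolist())  # [1, 1, 3, 4, 3, 6]
```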
Table 3 shows the distribution of teachers according to the number of publications.
(That being said, our results still need to be interpreted with caution, as we obtain results for slightly better and perhaps more motivated students. The list of all journals and the corresponding classifications used by SBE is given in Table 16 in Appendix 9; SBE's main strategy in deciding on journal classification is to use the 5-year impact factor of the (S)SCI. Note, finally, that we do not explicitly deal with students who drop out. We cannot directly observe whether a student drops a course once s/he learns in which tutorial group s/he sits. To get an idea of such cases, we classify a student as a dropout if that student registers for a course but does not take the exam at the end. In the initial sample of 80 000 observations, only 7% of the students register for a course but do not take the exam; this number is less than 1% for the analytical sample of 5934 observations.)

There are 83 teachers in total. 35 of them had at least one A publication, 59 had at least one B publication and 60 had at least one C publication in the last four years. 15 teachers had one A publication, 8 teachers had two A publications, and so on. Table 4 shows the interactions between different publication types. No teacher had only A publications, 4 teachers had only B publications and 6 teachers had only C publications. Finally, 30 teachers had A, B and C publications in the last four years.
In our empirical analyses we make a distinction between bachelor and master students, as the research quality of teachers might have heterogeneous effects on students due to differences in course types (mandatory vs. elective courses; general vs. specialized topics), student motivation, etc. Tables 5 and 6 present the descriptive statistics of the student and teacher variables used in the empirical analysis, differentiating between first-year bachelor students, second- and third-year bachelor students and master students. Descriptions of these variables are given in Table 7. Student grades are on average 6.6 for first-year bachelors, 7.2 for second- and third-year bachelors and 7.3 for master students. There are almost no professors teaching in the first-year bachelor courses.
5.1 Tests for sample selection and randomization of students to teachers
As noted earlier, there is significant selection in our data because of the self-selection of students and the restrictions due to teacher information. In order to investigate the selection of students, we perform two descriptive analyses. In the first analysis we regress the probability of filling in the evaluation surveys on student characteristics, for all students and then separately for bachelor and master students. Table 8 presents the results. In all columns, the results show that student characteristics are significant predictors of the probability of filling in the evaluation surveys. In the second analysis we regress the probability of being in the analytical sample, for those who filled in evaluation forms, on student characteristics. (The relatively high number of observations for the master programs is due to the fact that we use information on teachers (non-PhD students) who worked at UM for the entire period of 2008-2013. Note also that there are two different professor positions; the first group has more management responsibilities, has better publication records and is rewarded better.) The results are presented in Table 9. None of the student characteristics are found to be statistically significant. Therefore, the observations that we lose due to the restrictions on teacher information are not systematically different from those included in our analytical sample.
Since the randomization of students into teacher groups is the underlying identifying mechanism, we perform a randomization check. In order to see whether the randomization of students to different teachers works successfully, we regress teacher-specific publication variables
on student characteristics. Table 10 presents the results. In the ﬁrst column we regress
the probability of having any publication in the last 4 years on student characteristics.
In the following three columns we regress the probability of having any publications in
A, B or C level journals on student characteristics. None of the student characteristics is
signiﬁcant. Therefore we conclude that, in terms of publication performance of teachers,
randomization works successfully.
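The logic of such a balance check can be sketched in a few lines (simulated data, not the SBE sample; under random assignment, the coefficients of the student characteristics should be close to zero):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated students: traits drawn independently of the (random) teacher
# assignment, mimicking successful randomization.
n = 5000
female = rng.integers(0, 2, n).astype(float)
age = rng.normal(21.0, 2.0, n)
any_pub = rng.integers(0, 2, n).astype(float)  # assigned teacher has a publication

# Linear probability model: regress teacher publication status on student traits
X = np.column_stack([np.ones(n), female, age])
beta, *_ = np.linalg.lstsq(X, any_pub, rcond=None)

# With random assignment the slope coefficients hover around zero
print(np.abs(beta[1:]).max() < 0.1)  # True
```

In the paper the analogous check is a regression with significance tests per characteristic (Table 10); this sketch only shows why the slopes should be negligible under randomization.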
6 Empirical strategy

We investigate the effect of the research quality of teachers on teaching quality, measured by student evaluations of the teachers and by student grades. For student grades we have individual-level data, and we use the following regression:
G_ictg = β0 + β1 P_tg + β2 T_tg + β3 S_it + β4 C_ct + u_ictg    (1)

where G_ictg is the grade of student i in course c and year t, at the tutorial taught by teacher g. P_tg is the publication record of teacher g in year t. Similarly, T_tg is the set of other teacher characteristics. S_it is the set of student characteristics in year t, which also includes a fixed effect for the program that the student is enrolled in. C_ct is the course fixed effect for course c in year t, and u_ictg is the error term. (Note that only a part of the observations is used in the analytical sample due to the unavailability of data for certain teachers. Also, since courses are taught by senior members of the department, the course coordinators, while the tutorials are taught by different teachers, the course fixed effects also capture course coordinator effects. This is important because course coordinators can differ in how rigorous they are about exam questions, course structure or tutorial guidelines. By controlling for such course fixed effects, we achieve identification through within-course variation: variation due to the different teachers in different tutorials for the same course.) Since students are randomly assigned to different teachers after they make a course choice, and take the same exam at the end of the course, we can interpret the coefficients of P_tg and T_tg as causal, conditional on C_ct. In all estimations we use robust standard errors clustered at the course-year level because of possible correlation between the outcomes of students choosing the same course.
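A stylized version of estimating equation (1) can be sketched as follows. Course fixed effects are absorbed by demeaning within courses, so that β1 is identified from the variation across tutorial teachers within the same course. All data here are simulated, and the 0.4 "true effect" is an arbitrary value chosen only to echo the setting; this is not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated setting: 40 courses, 3 tutorial groups per course, 10 students each.
# The teacher-level regressor (any publication) varies across tutorials
# within a course, which is the variation that identifies beta_1.
n_courses, n_tut, per_tut = 40, 3, 10
course = np.repeat(np.arange(n_courses), n_tut * per_tut)
tutorial = np.repeat(np.arange(n_courses * n_tut), per_tut)
any_pub = rng.integers(0, 2, n_courses * n_tut)[tutorial].astype(float)
grade = 6.5 + 0.4 * any_pub + rng.normal(0.0, 1.0, course.size)  # true beta_1 = 0.4

def demean_by(x, groups):
    """Subtract group means: absorbs the course fixed effects C_ct."""
    means = np.bincount(groups, weights=x) / np.bincount(groups)
    return x - means[groups]

# Within (fixed-effects) estimator of beta_1, recovered close to 0.4
y, x = demean_by(grade, course), demean_by(any_pub, course)
beta1 = (x @ y) / (x @ x)
```

In practice one would also include the teacher and student controls T and S and cluster standard errors at the course-year level, as the paper does, e.g. with a fixed-effects regression package rather than this hand-rolled estimator.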
For the student evaluations of the teachers, we only have data at the teacher level, unlike the data on student grades. In other words, we know the average evaluation score that a teacher receives after the course ends. Therefore, we cannot perform the same individual-level analysis as for student grades. In order to investigate the evaluation scores, we use averages of all variables at the tutorial (teacher) level. The regression becomes:

E_ctg = β0 + β1 P_tg + β2 T_tg + β3 S_tg + β4 C_ct + v_ctg    (2)

where E_ctg is the average teacher evaluation score in course c and year t, at the tutorial taught by teacher g. P_tg is the publication record of teacher g, and T_tg is the set of other teacher characteristics in year t. S_tg is the average of the student characteristics in tutorial g in year t. C_ct is again the set of fixed effects for course c in year t, and v_ctg is the error term.
The course structures in the bachelor and master programs are different. Courses in bachelor programs tend to be general introductory courses on various topics, whereas courses in master programs are mostly specialized. When a teacher gives a course on the specific topic that s/he specializes in, we expect expertise and motivation to differ. Therefore, in our estimations we run the above-mentioned model first for all students, and then for bachelor and master students separately.
7 Estimation results

First, we present the results of the individual-level student grade estimations. The results are displayed in Table 11. The first column displays the results for all students; the rest of the columns present the results for first-year bachelor students, second- and third-year
bachelor students and ﬁnally for master students, respectively. In these analyses, research
quality is a dummy variable which is 1 if the teacher had any publications in the last 4
years. Therefore, we measure the eﬀect of having any publication activity regardless of
the quality of publication. For all speciﬁcations, there is a small positive but insigniﬁcant
eﬀect on student grades.
Table 12 presents the results of other analyses using different publication variables
to measure research quality. In row one, the coeﬃcient estimate for total number of
publications in the last 4 years shows that the number of publications has no eﬀect on
student performance. In this estimation we measure the eﬀect of total publication activity
ignoring the quality of publications. In row 2, the research quality variable is a dummy
variable which is 1 if the teacher had any A level publications in the last 4 years. Hence, in
row 2 we measure the eﬀect of having a teacher who conducts high quality research. The
coeﬃcient estimate for master students shows that there is a signiﬁcant positive eﬀect
on student grades. Having a teacher with at least one A level publication in the last
four years is associated with a 0.4 point higher student grade. This suggests that in master
programs students taught by teachers with high quality of publications perform better
whereas students of teachers with more publications do not. Thus, quality seems to be
more important than quantity. In row 3 the research quality variable is a dummy variable
which is 1 if the teacher had any B level publications in the last 4 years. We ﬁnd smaller
insignificant positive effects. Comparing these results with those in Table 11 shows that,
as the research quality of the teacher increases, student performance increases, but only for
master students. In row 4 we estimate the effect of the total number of A publications on
student grades. Again, for master students we ﬁnd a positive signiﬁcant eﬀect. Having
one more A level publication increases the student grade by 0.2 points on average. Therefore,
the quantity of publications matters only for A level publications.
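The publication measures used in Tables 11 and 12 can be sketched as simple transformations of a teacher's publication record. The records and function name below are hypothetical illustrations, not the paper's data.

```python
# Hypothetical publication records over the last 4 years, with journals
# classified into A and B levels as in the paper.
pubs = {
    "teacher_1": ["A", "B", "B"],
    "teacher_2": ["B"],
    "teacher_3": [],
}

def research_measures(records):
    """Publication measures analogous to those in Tables 11-12 (names are ours)."""
    return {
        "any_pub": int(len(records) > 0),            # Table 11: any publication
        "n_pubs": len(records),                      # Table 12, row 1: total count
        "any_A": int("A" in records),                # row 2: any A level publication
        "any_B": int("B" in records),                # row 3: any B level publication
        "n_A": sum(1 for r in records if r == "A"),  # row 4: number of A publications
    }

for teacher, records in pubs.items():
    print(teacher, research_measures(records))
```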
Our baseline results show that mechanisms suggesting a positive relationship between
research and teaching, such as skill transfers and teacher-student interactions, dominate
the ones suggesting a negative relationship such as time and eﬀort allocation. We be-
lieve that the discrepancy between the results for bachelor and master students further
strengthens this interpretation. Finding a stronger effect for master students is not
particular to our analysis (see Arnold (2006)) and can be explained by the course characteristics
in the bachelor and master programs. Most of the courses in the bachelor programs are
mandatory courses on introductory level. However, master courses can be elective ones,
more specialized on certain topics and followed by students who are more interested and
motivated. It is also generally the case that teachers give special topic courses which
primarily focus on their ﬁeld of interest. This can increase the eﬀects of skill transfers
and the eﬀects of interactions between teachers and students.
In Table 13 we present the results of some sensitivity analyses where we introduce more
covariates using student-level and tutor-level information. For these sensitivity analyses
we use the speciﬁcation in the second row of Table 12. In panel 1, we introduce tutor-
student gender combinations. The coeﬃcient estimates show that male students taught
by female teachers perform worse in comparison to male students who are taught by male
teachers in the master program. In panel 2, we add peer variables by calculating the
average age and percentage of females in the classroom for peers of students. We perform
this analysis because peer eﬀects can be important in the classroom. The coeﬃcient
estimate of the publication variable remains the same. Finally, in panel 3 we add variables
to capture the academic position of the teachers. The reference group is lecturers. The
correlation between publications and positions is very high. This is of course expected as
the decision to promote an assistant professor to associate professor position, for example,
mainly depends on the publication records of the academic. Once we control for the
position variable, the variation in the publication variable becomes very small. This is
reﬂected in the higher standard error for the publication variable.
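The inflated standard error caused by the correlation between publications and academic positions can be illustrated with a variance inflation factor computed on simulated data; the agreement rate and variable names below are our own assumptions, not estimates from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical data: senior academic positions strongly predict publishing,
# so the two regressors agree about 90% of the time.
position = rng.binomial(1, 0.5, n)                                # 1 = senior position
any_pub = np.where(rng.random(n) < 0.9, position, 1 - position)   # noisy copy

# Variance inflation factor of the publication dummy given position:
# VIF = 1 / (1 - R^2), where R^2 comes from regressing one dummy on the other.
r2 = np.corrcoef(any_pub, position)[0, 1] ** 2
vif = 1.0 / (1.0 - r2)
print(round(vif, 1))  # well above 1, so the standard error is inflated
```

A VIF well above 1 means the sampling variance of the publication coefficient is multiplied by roughly that factor once position is included, consistent with the larger standard error in panel 3.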
In Table 14 we present the results for teacher evaluations. The coefficient estimate
for having any publication shows that teachers with publications receive lower evaluation
scores on average, although the coefficients are estimated imprecisely.
(Footnotes: Although we perform the sensitivity analyses for all of the other specifications,
we present only the results of the third specification because of the highly significant effect
of the research quality measure; all other results remain the same once we introduce more
variables, and they are available upon request. We choose to include the variables on academic
position only in a sensitivity analysis because of the correlation between publication records
and academic positions; this correlation is not surprising, as decisions on promotions and
tenure largely depend on publishing performance. As for the experience of the tutor, we can
control for it partly by including the age of the tutor, which we do in all our estimations.)
Table 15 presents
the results for other publication measures. The results for master students show that
coefficient estimates are positive but small, indicating that teachers with high quality
publications do not receive higher evaluation scores. Teachers with publications, on the
other hand, receive lower scores in the second and third years of the bachelor programs.
The difference between the results concerning student grades and student evaluations
is important. As mentioned earlier, when students evaluate the teachers, they do not
necessarily evaluate teaching effectiveness. Evaluation scores might reflect the personality
of the teacher or, more generally, their personal experience in the classroom. A rigorous,
demanding teacher, for example, might end up with a lower score than a fun but not much
better teacher. This can explain the smaller or even the negative results for evaluation
scores estimations. This is in line with some of the previous ﬁndings in the literature
(see Emery et al. (2003)). Student grades, on the other hand, might reﬂect true learning
experience and can be more informative in measuring teaching eﬀectiveness.
There is a continuous debate about the relationship between the research quality of
academics and their teaching performance. Are good researchers also good teachers? Answer-
ing this question is important not only for scientiﬁc merit but also for policy-making,
especially for higher education stakeholders as the answer can help them in distributing
human resources more eﬃciently between research and teaching.
In this paper we investigate the relationship between research quality and teaching
quality. We use data from Maastricht University in the Netherlands, where students are
randomly allocated to different teachers within the same course and all take the same exam. The
research quality is measured by the publication records of the teachers. The teaching
quality is measured by both student grades and student evaluations of the teachers. Ex-
ploiting the random allocation of students to diﬀerent teachers, and the fact that students
with different teachers take the same exam, we find that master students who are taught
by teachers with high quality publications score higher grades. (The high positive coefficient
estimate for the first-year bachelor students is most probably due to the low number of
observations.) However, we do not find
any eﬀect for having any publications or total number of publications. This shows that
quality matters when it comes to student performance, and quantity matters only if the
quality is good: only for A publications does the number of publications have a significant
positive effect on student grades. Moreover, we believe that the stronger
results for master students strengthen our interpretation of the findings. The vast
majority of the courses in the bachelor programs are mandatory courses at the introductory
level. Master courses, however, can be elective; they are much more specialized in certain
topics, generally in the interest areas of the teachers, and followed by students who are more
interested and motivated. This can increase the eﬀects of aforementioned skill transfers
and the interactions between teachers and students in the classrooms.
The results based on course evaluations show that the ﬁndings from student grades
estimations are not fully reflected in how students evaluate their teachers. Master
students do not give higher scores to teachers with a higher number of publications or a
higher quality of publications. Moreover, bachelor students give lower scores to teachers with
publications. The diﬀerence between the results of student grades and student evaluation
scores estimations indicates that the two measures capture diﬀerent things. Evaluation
scores might reﬂect the personality of the teacher or in general personal experience in the
classroom rather than learning. Hence, we conclude that it is useful to use both measures
in analyzing teaching quality.
When it comes to the policy implications of our ﬁndings, one should interpret the
results with caution. Our ﬁndings cannot be interpreted as evidence supporting or dis-
missing the argument that research and teaching at the universities should be separated.
Our results also do not answer how much time the teachers should spend on teaching
or research. We conclude that excellent research performance contributes to a higher
teaching quality in the master programs if the quality of teaching is measured by student
grades. This might suggest that if good researchers do indeed have time for teaching,
they should preferably be allocated to courses in the master programs.
Arnold, I. (2006). Het beste onderwijs komt uit de ivoren toren. Economisch Statistische
Bak, H.-J. et al. (2015). Too much emphasis on research? an empirical examination of the
relationship between research and teaching in multitasking environments. Research in Higher
Education 56(8), 843–860.
Bastiaens, E. and J. Nijhuis (2012). From problem-based learning to undergraduate research:
The experience of maastricht university in the netherlands. Council on Undergraduate Re-
search Quarterly 32, 43.
Becker, W. E., W. Bosshardt, and M. Watts (2012). Revisiting how departments of economics
evaluate teaching. Citeseer.
Becker, W. E. and M. Watts (1999). How departments of economics evaluate teaching. American
Economic Review, 344–349.
Berk, R. A. (1988). Fifty reasons why student achievement gain does not mean teacher eﬀec-
tiveness. Journal of Personnel Evaluation in Education 1 (4), 345–363.
Berk, R. A. (2014). Should student outcomes be used to evaluate teaching? The Journal of
Faculty Development 28 (2), 87.
Bettinger, E. and B. T. Long (2005). Help or hinder? adjunct professors and student outcomes.
Technical report, What’s happening to public higher education.
Bornmann, L. and H.-D. Daniel (2007). What do we know about the h index? Journal of the
American Society for Information Science and technology 58 (9), 1381–1385.
Bornmann, L., R. Mutz, and H.-D. Daniel (2008). Are there better indices for evaluation pur-
poses than the h index? a comparison of nine diﬀerent variants of the h index using data from
biomedicine. Journal of the American Society for Information Science and Technology 59 (5),
Braga, M., M. Paccagnella, and M. Pellizzari (2014a). The academic and labor market returns
of university professors. IZA Discussion Paper 7902, Institute for Labour Market Policy
Braga, M., M. Paccagnella, and M. Pellizzari (2014b). Evaluating students’ evaluations of
professors. Economics of Education Review 41, 71–88.
Brickley, J. and J. Zimmerman (2001). Changing incentives in a multitask environment: evidence
from a top-tier business school. Journal of Corporate Finance 7, 367–396.
Burrell, Q. (2007). Hirsch index or hirsch rate? some thoughts arising from liang’s data.
Scientometrics 73 (1), 19–28.
Cadez, S., V. Dimovski, and M. Zaman Groﬀ (2015). Research, teaching and performance
evaluation in academia: the salience of quality. Studies in Higher Education, 1–19.
Carrell, S. E. and J. E. West (2010). Does professor quality matter? evidence from random
assignment of students to professors. Journal of Political Economy 118 (3), 409–432.
Centra, J. A. (1983). Research productivity and teaching eﬀectiveness. Research in Higher
education 18(4), 379–389.
Cherastidtham, I., J. Sonnemann, and A. Norton (2013). The teaching-research nexus in higher
education. (2013- 12).
Cretchley, P., S. Edwards, P. O’Shea, J. Sheard, J. Hurst, and W. Brookes (2014). Research
and/or learning and teaching: a study of australian professors’ priorities, beliefs and be-
haviours. Higher Education Research and Development 33 (4), 649–669.
de Goede, M. and L. Hessels (2014). Drijfveren van onderzoekers. Technical report, Rathenau
De Philippis, M. (2015). Multitask Agents and Incentives: The Case of Teaching and Research
for University Professors. CEP working paper 1386.
De Witte, K., N. Rogge, L. Cherchye, and T. Van Puyenbroeck (2013). Economies of scope in
research and teaching: A non-parametric investigation. Omega 41, 305–314.
Egghe, L. (2006). How to improve the h-index. The Scientist (3), 14.
Elken, M. and S. Wollscheid (2016). The relationship between research and education: typologies
and indicators. A literature review. , Nordic Institute for Studies in Innovation, Research
Emery, C. R., T. R. Kramer, and R. G. Tian (2003). Return to academic standards: A critique
of student evaluations of teaching eﬀectiveness. Quality Assurance in Education 11 (1), 37–46.
Euwals, R. and M. E. Ward (2005). What matters most: teaching or research? empirical
evidence on the remuneration of british academics. Applied Economics 37 (14), 1655–1672.
Feld, J. and U. Zölitz (2016). Understanding peer effects: On the nature, estimation and
channels of peer effects. Journal of Labor Economics, forthcoming.
Galbraith, C. S. and G. B. Merrill (2012). Faculty research productivity and standardized
student learning outcomes in a university teaching environment: A bayesian analysis of rela-
tionships. Studies in Higher Education 37 (4), 469–480.
Glänzel, W. (2006). On the opportunities and limitations of the h-index. Science Focus.
Gottlieb, E. E. and B. Keith (1997). The academic research-teaching nexus in eight advanced-
industrialized countries. Higher Education 34 (3), 397–419.
Hattie, J. and H. W. Marsh (1996). The relationship between research and teaching: A meta-
analysis. Review of educational research 66 (4), 507–542.
Hattie, J. and H. W. Marsh (2004). One journey to unravel the relationship between research
and teaching. In Research and teaching: Closing the divide? An International Colloquium,