Oxford Workshop on the Use of Metrics in Research Assessment

A workshop convened by Sir Gareth Roberts, President of Wolfson College, Oxford and sponsored by the Higher Education Funding Council for England (HEFCE) and the Office of Science and Technology (OST).

Session 1: Research Assessment in Europe

The workshop opened with a series of presentations examining the role of research metrics in developing comparative national rankings of higher education research facilities in Germany, the Netherlands and the UK. Throughout the workshop, presenters referred to data produced commercially by Thomson ISI (Institute for Scientific Information), which quantifies international publications and citations in a finite list of high-profile journals in a range of fields. Recognised as the most comprehensive source of international bibliometrics available, ISI data are nevertheless limited by their bias towards US-based work in the physical sciences. Work currently in train by the publishing house Elsevier to produce a new citations database with a slightly more diverse focus is likely to provide a welcome broadening of coverage.

Focussing on the advisory rankings produced by the Deutsche Forschungsgemeinschaft, or German Research Foundation (DFG), Juergen Guedler explored the use of these funding-based metrics as a proxy for research volume and quality. A comparison with available bibliometric data indicated broad correlation between funding and productivity at institutional level, although in Germany there is no formal link between DFG rankings and national research funding policy or allocations. Guedler demonstrated how DFG data could also be used to analyse the nature of national and international research networks and demonstrate the relative strengths of individual institutions. This work is to be taken forward in a newly created DFG institute, which Guedler will lead, potentially building into a European centre of excellence in research metrics analysis.

Germany's DFG data are complemented by rankings produced by the Centrum für Hochschulentwicklung, or Centre for Higher Education Development (CHE). The CHE's Stefan Hornbostel described how a survey of senior academic staff is used by the Centre to establish a ranking of institutions in order of their reputation within Germany's higher education research sector. These rankings are used to supplement additional proxy indicators including publication counts (per researcher), research income, international bibliometrics and patent applications. Postgraduate research student numbers are also taken into consideration on the basis that they provide a reasonably accurate proxy for research volume in a healthy research environment. Again, it was shown that while there is broad correlation between these indicators in some cases, their relevance is limited in fields such as engineering, law and economics, and even more so in the applied disciplines. Throughout the workshop this element of the German experience was repeatedly echoed by speakers from all the nations represented.

Speaking on behalf of the Association of Universities in the Netherlands (VSNU), Frans van Steijn emphasised the role of evaluation in the improvement of national research performance, the advancement of institutional research policy and the accountability of public resources. The Dutch rankings, based on a combination of proxy indicators and informed peer review, are not used as a national basis for funding decisions or research policy. The process is operated by the academic community and incorporates both a three-year self-assessment cycle and a rigorous six-yearly assessment by international peers. The exercise is informed by the EFQM (European Foundation for Quality Management) Excellence Model and aims to produce not comparisons, but comparable outcomes. These comprise both verbal and numerical components, and despite advice to the contrary, it is recognised that the numerical component, or institutional score, receives more attention than the qualitative or verbal evidence which accompanies it. Again, this experience ties in with that of other nations, in which all kinds of numerical rankings receive very high levels of interest from stakeholders within higher education and beyond.

Evidence UK's Jonathan Adams explored how international bibliometrics may be used in some cases to confirm or challenge the findings of qualitative assessment processes. Again, it was found that while correlation of the two evidence sets was not consistent across all disciplines, comparison was beneficial in a number of fields including, most notably, the hard sciences. Picking up an earlier point raised by Stefan Hornbostel, Adams demonstrated how breaking down submissions into smaller components enabled Evidence to map departmental research structures (or RAE units of assessment) onto the international fields of research used by ISI. Interventions from German and Australian delegates attested to the significance of this form of mapping for the future development of proxy indicators for use at disaggregated levels of assessment. While data at the level of the individual researcher can be derived from the information currently available, the complex limitations of the existing system mean they can only be used with extreme caution on the part of the analyst. In response to the suggestion that RAE-style outcomes could only be produced by an RAE-style process, it was agreed that the use of indicators alone could never reproduce the effect of expert review.
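
By way of illustration only, the sketch below shows one simple way such a mapping might be expressed. The unit names, ISI field names, overlap weights and output counts are all hypothetical assumptions rather than Evidence UK's data or method; the point is merely that a department's outputs can be redistributed across bibliometric fields once overlap weights are agreed.

```python
# Illustrative sketch only: hypothetical units of assessment, ISI fields and
# overlap weights, not Evidence UK's actual data or methodology.

# Assumed overlap weights between each unit of assessment and ISI fields
# (weights for each unit sum to 1).
UOA_TO_ISI = {
    "Metallurgy and Materials": {"Materials Science": 0.60, "Physics": 0.25, "Chemistry": 0.15},
    "Applied Mathematics": {"Mathematics": 0.70, "Physics": 0.30},
}

def map_outputs_to_fields(outputs_by_uoa):
    """Redistribute output counts per unit of assessment across ISI fields."""
    by_field = {}
    for uoa, count in outputs_by_uoa.items():
        for field, weight in UOA_TO_ISI.get(uoa, {}).items():
            by_field[field] = by_field.get(field, 0.0) + count * weight
    return by_field

# Assumed example counts, for illustration only.
print(map_outputs_to_fields({"Metallurgy and Materials": 120, "Applied Mathematics": 80}))
```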

Session 2: The Assessment of Science

Taking the UK science base as a case study, Aldo Geuna's presentation focused on the impact of input variables on research output or productivity, finding that statistical data both confirmed and elaborated on anecdotal evidence. Exploring trends in research productivity in four disciplines in the period 1984/85 to 2001/02, Geuna demonstrated a visible correlation between productivity and factors such as research funding, institutional category (e.g. age, reputation, location) and undergraduate teaching loads. Clear results emerged from the comparison of time series data from a variety of sources including HESA, ISI and HEFCE, showing a noticeable difference in citation time-lag between disciplines such as the Natural Sciences, Medicine, Engineering and the Social Sciences. Again it was suggested that, rather than for resource allocation, the data produced by these analyses were most useful as a tool to examine the effect of multiple factors on scientific productivity and to inform international benchmarking exercises.

The international element was taken up by Ulrich Schmoch in his consideration of technological competitiveness in Germany. Once again this presentation tended to highlight the limitations of available indicators and the varying usefulness of certain indicators between scientific disciplines. While the lists of top ten indicators suggested by researchers in each field tended to include the same sorts of metrics, the relevance of each varied considerably, and it was agreed that most indicators would incur statistical noise which would limit their application. Schmoch paid particular attention to the notion that scientific performance and technological progress should not be conflated, arguing that particular indicators (such as patent applications and bibliometrics) considered in isolation could appear contradictory, as they did not capture all aspects of performance within and between disciplines.

All of the limitations touched upon by the day's speakers were summed up by Anthony van Raan, who emphasised the difficulty of applying quantitative measures to the essentially abstract concept of science. While productivity can be measured quantitatively through the analysis of publication and citation metrics, quality must be judged by others, he argued, through a process of peer review. A sophisticated analysis followed, demonstrating how citation life-cycles, lag effects and research trends may be explored using word frequency counts, a process which also enables the visual representation of research landscapes incorporating a range of inter-related variables. Regardless of the agreed limitations of metrics in the assessment of research quality, van Raan's emphasis on the mechanics of bibliometrics, or how to use them productively in spite of their limitations, echoed the feeling of the workshop as a whole, and the international community's commitment to maximising the usefulness of research metrics in as many scientific fields and contexts as possible.
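
As a minimal illustration only, a word-frequency count of the kind that underpins such landscape maps might look like the sketch below. The publication titles and stop-word list are assumptions made for the example, and this is not van Raan's actual methodology; full co-word mapping additionally requires co-occurrence counting and visualisation.

```python
# Illustrative sketch only, not van Raan's methodology: count content words
# across (hypothetical) publication titles, the raw material for co-word maps.
from collections import Counter
import re

STOPWORDS = {"of", "the", "and", "in", "a", "for", "on", "with"}

def term_frequencies(titles):
    """Count non-stop-word terms across a list of publication titles."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

# Assumed example titles, for illustration only.
titles = [
    "Citation analysis of nanotechnology research",
    "Mapping the landscape of nanotechnology patents",
    "Peer review and citation impact in physics",
]
print(term_frequencies(titles).most_common(5))
```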

After-dinner address by Lord May

A light-hearted address by Lord May following the workshop dinner at Brasenose College focused in turn on the dangers of underestimating the impact of British research internationally and the role of quantitative indicators in highlighting this country's relative performance on the world stage. Despite a national tendency to self-deprecation, Lord May argued that metrics were able to prove irrefutably a level of competitiveness unsuspected by practising British researchers. In conclusion, delegates were reminded that metrics alone could not always capture the essence of research and that they should never disregard the objectives of assessment and the ultimate impact of ranking performance.

Session 3: The Assessment of Non-Science Disciplines

Challenging a number of assumptions raised in the first day of the workshop, including the reliability of peer review judgements, Australian bibliometrics specialist Linda Butler developed a case for the impartial nature of quantitative data. Tackling the limitations of ISI data head-on, Butler explored the use of non-source data in relation to Australian higher education research output, demonstrating that incorporating particular forms of invisible publications into bibliometric analyses could, in some cases, provide more credible rankings than those produced from ISI data alone. Significantly, the research was held to show that it is technically feasible to develop indicators for non-source publications, and to provide new metrics for quantifying productivity in subjects other than the hard sciences.

Geoff Crossick of the Arts and Humanities Research Board (AHRB) and Ian Diamond of the Economic and Social Research Council (ESRC) expressed a shared commitment to finding new ways of measuring the volume, impact and quality of research in the non-science disciplines. Factors such as access to infrastructure and indicators of esteem are particularly relevant in this context. While some research outcomes can be measured directly, Crossick argued, others can only be inferred from proxies and used to inform and underpin qualitative judgements. However, such analyses must take account of the specificities of a very wide and diverse subject community, one with a new culture of funded team projects but within which the individual researcher remains very important.

The role of bibliometrics in assessing research in the arts and humanities is complicated by the nature of research approaches based on critical discourse. Whereas citations in the physical and natural sciences are held to reflect the quality and impact of the source text, this correlation does not extend to the arts and humanities. Furthermore, as Diamond affirmed, regionally based research in fields such as education and social science is unlikely to be cited internationally, regardless of its application or impact locally. Citation life-cycles and time-lag effects were again discussed, and trends were shown to vary considerably between the disciplines in question and the hard sciences. An analysis of RAE 2001 submissions revealed that whereas some ninety per cent of research outputs listed by British researchers in the fields of Physics and Chemistry were mapped by ISI data, in Law the figure was below ten per cent. It was also noted that, worldwide, some eighty per cent of publications receive as few as five citations in total, and a considerable proportion of publications are never cited at all. Despite these limitations and the acknowledged complexity of bibliometric analyses, both presenters agreed that there was scope to develop more relevant indicators within the disciplines in question.

In discussion, delegates grappled with the emerging conflict between the assessment of quality at institutional level and the related issue of international benchmarking. It was felt that the 2001 UK RAE had been partly limited by a lack of clarity and guidance in relation to its objectives, compounded by insufficient consistency between panels and limited transparency of process. Modifications to be introduced in the 2008 exercise are designed to correct these faults. Peer review judgements and quantitative indicators were each held to be vital in exposing and explaining discrepancies produced by simplistic rankings. No single measure taken in isolation could possibly assess productivity, quality and impact together. In response to strong arguments relating to discipline specificity, it was suggested that fundamental questions of identity and progress were common to all fields of research. Nevertheless, delegates concluded, while the question of what constitutes scientific knowledge is central to research in the arts and humanities, it does not necessarily lead to the same sorts of investigations, or research questions, as it does elsewhere.

Session 4: Plenary Discussion

The final workshop session was devoted to a wide-ranging discussion of all aspects of the development and application of research metrics. The key issue to emerge at this stage was the relationship between metrics and expert review. Notwithstanding the undisputed usefulness of quantitative data, particularly in the context of international benchmarking, delegates agreed that metrics could not entirely displace peer review, nor should they be used in isolation to drive funding allocations within the UK. Bruno Zimmermann, as chairman of the session, commented firstly on the broad catalogue of uses identified for research metrics, ranging from quantifying productivity and establishing national competitiveness in the hard sciences, to mapping research landscapes and networks of collaboration across all fields of investigation. Gareth Roberts, whilst acknowledging the use of bibliometrics for these purposes and others, such as providing appropriate time-activity profiles and indicating the breaking waves of new fields of research, commented on their weakness in measuring applied and practice-led research. In the UK, consideration is being given to alternative metrics for evaluating these knowledge-transfer activities. Another note of caution was introduced with consideration of the unintended consequences of formally recognising particular indicators. Evidence of radical changes in the publication behaviour of Australian researchers following the introduction of a metrics-based funding system was cited, as was an increase in the production of journal articles, at the expense of conference proceedings, by British engineering researchers in the lead-up to the 1996 RAE. Delegates agreed that the impact of change on publication behaviours suited to particular forms of research should be taken into account prior to the introduction of new forms of assessment.

Throughout the session, discussion repeatedly focused on the objectives of research assessment, asking what is actually being measured in each case, and why. While productivity in some fields is recognised as a proxy for research quality or excellence, it was noted that excellence may itself be seen as a proxy for impact, understood here as the contribution of research towards social, economic and scientific progress nationally and globally. One delegate therefore argued that while excellence may be recognised as a justifiable goal in itself, it is not a justification for major public funding. It was then argued that the UK RAE, through its expert review approach, was able to assess the black box of research: those non-quantifiable activities occurring between research input (funding) and output (productivity and impact). That is to say, rather than considering a set of proxy indicators, the RAE considers the broad basket of activities which, together, comprise research itself.

In the course of the workshop, a clear consensus emerged in favour of expert review as the most comprehensive method of assessing research quality. Despite this, delegates agreed that it is not infallible; nor was it seen as a clearly defined or consistently conducted activity. Unlike bibliometrics, expert review is not distance-invariant: outcomes may be affected by factors such as reviewers' expertise and subjectivity, for example personal tastes and opinions, length of time in the job and pressures within the panel. At the same time, it was acknowledged that citation counts are themselves subject to fashion. Again, neither method is an exact science, but this does not mean that either should be disregarded. Despite an incontestable degree of inconsistency, even the very worst assessment tools could be shown to produce very similar results, for example in identifying the top ten performers in a particular field. Significantly, however, the fine-grained assessment of performance at the top level was agreed to be the ultimate goal of many assessment exercises and procedures. In this context, the limited usefulness of averages was discussed, with delegates agreeing that it is only the very best research whose impact requires detailed investigation and analysis.

The notion of informed peer review was invoked on several occasions, but this line of argument was ultimately rejected as too limited. Rather than informing peer review, delegates argued that metrics were most valuable in increasing the objectivity and transparency of an expert review process such as the RAE. While British scientists were considered unlikely ever to accept an entirely mechanistic approach to research funding, it was agreed that bibliometrics did provide additional relevant information not otherwise available to peer reviewers. Furthermore, as one delegate observed, bibliometrics already contain an element of peer review, as peer review is central to the way research outputs are traditionally selected for publication. To date, therefore, any consideration of publication or citation figures is necessarily informed by both elements of the golden combination of qualitative and quantitative indicators.

Delegates went on to consider the impact of open access publishing on the metrics landscape. It was agreed that electronic publication would inevitably create opportunities for developing new indicators bringing with them a range of considerations such as accessibility, an acceleration of citation lifecycles, pre-print availability, and comparison with existing indexes such as ISI. It was observed that the immediacy of open access could lead to science becoming more hectic, and a number of projects were noted as beginning to monitor the impact of these developments.

Conclusions

As the Workshop drew to a close a number of broad conclusions and possible next steps were articulated, with a view to informing non-specialist decision makers and funders on future applications of research metrics. The real world of international research is a diverse and complex landscape of research groups, funding sources, activities and outputs. There was an acknowledgement that metrics could neither replace expert peer review, nor could they be ignored. As far as the UK RAE was concerned it was suggested that the notion of incorporating metrics should be crystallised into a framework that provided consistent, clear and transparent guidance to panels. More generally, participants signed up to the following summary statement drafted by Chris Henshall:

Publication and citation metrics play a key role in research assessment, in particular by supporting, challenging and supplementing more direct forms of peer review. The contribution they can make varies across disciplines; they have a role to play in all areas (including the social sciences, arts and humanities) but more direct forms of peer review are also needed in all areas to ensure valid assessments of quality. The possible effects on behaviour of using particular metrics as indicators of quality or productivity need to be considered carefully in advance.

It was agreed that more broadly focussed activities to develop international benchmarking tools should be taken forward collaboratively, possibly in association with the new DFG institute. Bruno Zimmermann, on behalf of the German Research Foundation, invited the agencies present at the Oxford Workshop to meet again in Berlin in two years' time to consider progress in the field. The suggestion was warmly welcomed by delegates and recognised as confirmation of the all-round success of all sessions of this timely and informative exploration of research metrics.

Annex 1

List of Presenters at Oxford Workshop

NAME - AFFILIATION
Dr Jonathan Adams - Evidence UK Ltd., Leeds
Prof Linda Butler - Australian National University, Canberra
Prof Geoffrey Crossick - UK Arts and Humanities Research Board, Bristol
Prof Ian Diamond - UK Economic and Social Research Council, Swindon
Dr Aldo Geuna - SPRU, University of Sussex
Prof Juergen Guedler - Deutsche Forschungsgemeinschaft (German Research Foundation), Bonn
Prof Stefan Hornbostel - University of Dortmund
Prof Anthony van Raan - University of Leiden
Prof Frans van Steijn - Vereniging van Universiteiten in Nederland, Utrecht
Dr Ulrich Schmoch - Fraunhofer Institute, Karlsruhe

List of Chairmen at Oxford Workshop

NAME - AFFILIATION
Dr Chris Henshall - UK Office of Science and Technology, London
Prof Sir Gareth Roberts - Wolfson College, Oxford
Dr Bruno Zimmermann - German Research Foundation (DFG), Bonn