Abstract
Describing the distribution of disease between different populations and over time has been a highly successful way of devising hypotheses about causation and of quantifying the potential for preventive activities.1 Statistical data are also essential components of disease surveillance programs. These play a critical role in the development and implementation of health policy, through identification of health problems, decisions on priorities for preventive and curative programs and evaluation of outcomes of programs of prevention, early detection/screening and treatment in relation to resource inputs.
Over the last 12 years, a series of estimates of the global burden of cancer has been published in the International Journal of Cancer.2-6 The methods have evolved and been refined, but basically they rely upon the best available data on cancer incidence and/or mortality at country level to build up the global picture. The results are more or less accurate for different countries, depending on the extent and accuracy of locally available data. This "data-based" approach is rather different from the modeling method used in other estimates.7-10 Essentially, those estimates use sets of regression models, which predict cause-specific mortality rates of different populations from the corresponding all-cause mortality.11 The constants of the regression equations derive from datasets with different overall mortality rates (often including historic data from western countries). Cancer deaths are then subdivided into the different cancer types, according to the best available information on relative frequencies.
GLOBOCAN 2000 updates the previously published data-based global estimates of incidence, mortality and prevalence to the year 2000.12 The data sources that have been used to build up the global estimates are as follows.
Incidence, the number of new cases occurring, can be expressed as the annual number of cases (the volume of new patients presenting for treatment) or as a rate per 100,000 persons per year. Incidence data are produced by population-based cancer registries.13 Registries may cover national populations or, more often, certain regions. In developing countries in particular, coverage is often confined to the capital city and its environs. It was estimated that, in 1990, about 18% of the world population was covered by registries (64% of the population of developed countries and 5% of that of developing countries), although the situation is improving each year. The most recent volume of "Cancer Incidence in Five Continents" (CI5) contains comparable incidence information from 150 registries in 50 countries, primarily over the period 1988–1992.14
Survival statistics are also produced by cancer registries by the follow-up of registered cancer cases. Population-based figures are published by registries in many developed countries, for example, the SEER program covering 10% of the U.S. population15 and the EUROCARE II project, including 17 countries of Europe.16 Survival data from populations of China, the Philippines, Thailand, India and Cuba have been published by Sankaranarayanan et al.17
Mortality is the number of deaths occurring and the mortality rate the number of deaths per 100,000 persons per year. It is the product of incidence and fatality (the complement of survival) of a given cancer. Mortality rates measure the average risk to the population of dying from a specific cancer, while fatality (1-survival) represents the probability that an individual with cancer will die from it. Mortality data are derived from vital registration systems, where the fact and "underlying" cause of death are certified, usually by a medical practitioner. Their great advantage is comprehensive coverage and availability.
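As a rough illustration of the definitions above, the following Python sketch computes a rate per 100,000 persons per year and approximates a mortality rate as incidence multiplied by fatality (1-survival). The function names and all figures are hypothetical and chosen only for illustration; they are not taken from GLOBOCAN 2000, and the product is only a simplified reading of the relationship stated in the text.

```python
def rate_per_100k(events, person_years):
    """Annual rate per 100,000 persons from an event count and person-years at risk."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return events * 100_000 / person_years


def approx_mortality_rate(incidence_rate, survival):
    """Approximate mortality rate as incidence rate times fatality (1 - survival)."""
    if not 0.0 <= survival <= 1.0:
        raise ValueError("survival must lie between 0 and 1")
    fatality = 1.0 - survival
    return incidence_rate * fatality


# Hypothetical population: 500 new cases in 1,000,000 person-years,
# with 40% of patients surviving the disease.
incidence = rate_per_100k(events=500, person_years=1_000_000)
mortality = approx_mortality_rate(incidence, survival=0.4)
print(f"incidence {incidence:.1f}, mortality {mortality:.1f} per 100,000 per year")
```

With these invented inputs the incidence rate is 50 per 100,000 and the approximate mortality rate is 30 per 100,000, which shows why a cancer with poor survival has a mortality rate close to its incidence rate.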
By 1990, about 42% of the world population was covered by vital registration systems producing mortality statistics on cancer. The quality of these statistics, however, varies between countries. National-level statistics are collated and made available by the World Health Organization (http://www-dep.iarc.fr/dataava/globocan/who.htm), although for some countries coverage of the population is manifestly incomplete (so that the so-called mortality rates produced are implausibly low) and in others, quality of cause of death information is poor.
Frequency data, e.g., case series from hospitals and pathology laboratories, provide an indication of the relative importance of different cancers in a country or region in the absence of a population-based registry and mortality statistics. There are problems in extrapolating the results to the general population, since such series are subject to various forms of selection bias. Such data are generally published locally or in journal articles, although a few compendia are available.18, 19
Prevalence is the proportion of a population that has the disease at a given point in time.20 For many diseases (e.g., hypertension, diabetes), prevalence usefully describes the number of individuals requiring care. For cancer, however, many persons diagnosed in the past have been "cured", that is, they no longer have an excess risk of death (although some residual disability may be present, for example, following a resective operation). A straightforward comparison of need for cancer services can be made using partial prevalence, that is, cases diagnosed within 1, 3 and 5 years, to indicate the numbers of persons undergoing initial treatment (cases within 1 year of diagnosis), clinical follow-up (within 3 years) or not considered "cured" (before 5 years). Patients alive 5 years after diagnosis are usually considered cured since, for most cancers, the death rates of such patients are similar to those in the general population.
The methods used to produce the estimates are summarised in several recent articles.5, 6, 21, 22 The "Help" option of GLOBOCAN 2000 lists the sources of data and methods used for each country.
National incidence data from good-quality cancer registries.
National mortality data, with estimation of incidence using sets of regression models specific for site, sex and age, derived from local cancer registry data (incidence plus mortality).
Local (regional) incidence data from 1 or more regional cancer registries within a country. When there are several cancer registries in the country, their incidence rates must be combined into a common set of values by some weighted average.
Local mortality data from some sort of sample survey of deaths, converted to incidence using specific models.
Frequency data. For several developing countries, only data on the relative frequency of different cancers (by age and sex) are available. These are applied to an estimated "all sites" incidence rate, derived from existing cancer registry results, in 7 world regions (Eastern Africa, Middle Africa, Northern Africa, Southern Africa, Western Africa, Middle East and Other Oceania).
No data. The country-specific rates are those of the corresponding world area (calculated from the other countries for which estimates could be made). There are few large countries that fall into this category. Those with a population greater than 10 million were Morocco, Afghanistan, Nepal, Sri Lanka, Mozambique, Madagascar and Yemen.
National mortality rates, with a correction factor applied for some countries to account for known and quantified underreporting of deaths. Rates for missing sites were computed using proportions from mortality files provided by cancer registries.
When no national mortality data were available, local (regional) mortality rates derived from the data of 1 or more cancer registries covering a part of a country (state, province, etc.) were used.
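Two of the incidence steps listed above lend themselves to a short sketch: pooling the rates of several regional registries by a weighted average, and apportioning an "all sites" rate according to relative frequencies. The source does not say which weights were used, so the population weights below are an assumption, and the function names, site names and figures are hypothetical. The sketch also ignores the age and sex strata to which the real calculation was applied.

```python
def pooled_rate(rates, weights):
    """Weighted average of registry-specific rates (weights assumed to be populations)."""
    if len(rates) != len(weights):
        raise ValueError("rates and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(r * w for r, w in zip(rates, weights)) / total_weight


def site_rates_from_frequencies(all_sites_rate, frequencies):
    """Split an all-sites rate across cancer sites in proportion to relative frequencies."""
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive value")
    # Normalising by the total lets the caller pass counts or proportions.
    return {site: all_sites_rate * f / total for site, f in frequencies.items()}


# Hypothetical country with two regional registries (rates per 100,000 per year).
combined = pooled_rate(rates=[40.0, 60.0], weights=[1_000_000, 3_000_000])

# Hypothetical case series from a hospital, applied to an assumed all-sites rate.
site_rates = site_rates_from_frequencies(
    all_sites_rate=120.0,
    frequencies={"liver": 25, "cervix": 15, "other": 60},
)
print(combined, site_rates)
```

Normalising the frequencies inside the function keeps the site-specific rates summing to the all-sites rate, whichever form the case series arrives in.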
When mortality data were unavailable or known to be of poor quality, mortality was estimated from incidence, using country/region-specific survival (see prevalence data). In the absence of any data, country-specific rates were calculated from the average of those of neighbouring countries in the same region.
Estimates of partial prevalence in each country were derived by combining the annual number of new cases and the corresponding probability of survival by time. For example, 1-year prevalence at a fixed point in mid-2000 was estimated from the number of new cases in 2000 multiplied by the probability of surviving at least 6 months, and 3-year prevalence sums the numbers alive at 0.5, 1.5 and 2.5 years. Relative survival data were obtained from the sources cited above and converted to observed survival using "normal" mortality probability (derived from the corresponding life tables). The shape of the survival curve from 0 to 5 years postdiagnosis was assumed to follow a Weibull distribution.22
GLOBOCAN 2000 presents incidence, mortality and prevalence data by sex and 5 broad age groups (0–14, 15–44, 45–54, 55–64 and 65 and over) for all countries of the world and for 24 different types of cancer. Since cancer data are collected and compiled some time after the events to which they relate, the most recent statistics available refer to periods 3–10 years earlier. The actual numbers of cancer cases, deaths and prevalent cases are calculated by applying these rates to the estimated world population for 2000, obtained from the most recent projections prepared by the United Nations Population Division.23
On the CD-ROM are computer programs to analyse and present the cancer database. The database itself may be downloaded from the Internet (http://www-dep.iarc.fr/globocan/globocan.htm). This site contains the most recently available estimates of the incidence and mortality rates in different countries worldwide.
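The partial prevalence calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GLOBOCAN 2000 implementation: the Weibull curve is fitted through two hypothetical survival points (1 and 5 years), the annual number of new cases is held constant across years, and every function name and input figure is invented for the example.

```python
import math


def observed_survival(relative, expected):
    """Observed survival as relative survival times expected (life-table) survival."""
    return relative * expected


def weibull_params(s1, s5):
    """Shape k and scale lam of S(t) = exp(-(t / lam) ** k) with S(1) = s1 and S(5) = s5."""
    if not 0.0 < s5 < s1 < 1.0:
        raise ValueError("need 0 < s5 < s1 < 1")
    h1, h5 = -math.log(s1), -math.log(s5)  # cumulative hazards at 1 and 5 years
    k = math.log(h5 / h1) / math.log(5.0)
    lam = h1 ** (-1.0 / k)
    return k, lam


def survival(t, k, lam):
    """Weibull probability of surviving at least t years after diagnosis."""
    return math.exp(-((t / lam) ** k))


def partial_prevalence(annual_cases, k, lam, years):
    """Sum the survivors of each annual cohort at 0.5, 1.5, ... years since diagnosis."""
    return sum(annual_cases * survival(i + 0.5, k, lam) for i in range(years))


# Hypothetical inputs: relative survival and life-table survival at 1 and 5 years.
s1 = observed_survival(relative=0.82, expected=0.98)
s5 = observed_survival(relative=0.55, expected=0.90)
k, lam = weibull_params(s1, s5)

prev_1yr = partial_prevalence(10_000, k, lam, years=1)
prev_3yr = partial_prevalence(10_000, k, lam, years=3)
prev_5yr = partial_prevalence(10_000, k, lam, years=5)
print(round(prev_1yr), round(prev_3yr), round(prev_5yr))
```

Evaluating each cohort at mid-year offsets mirrors the text's use of survival to 0.5, 1.5 and 2.5 years; a fuller version would use year-specific case counts and separate curves by site, sex and age.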
GLOBOCAN 2000 can present the statistics described at any level of geographical aggregation and in tabular or graphical format (maps, bar charts, age-specific curves and pie charts). Some examples of these graphical presentations are shown on the cover of this issue. Tabulations of numbers and rates may also be displayed and printed. Incorporation of population projections for 5-year intervals, from 2