Osteoarthritis hip and knee (algorithm)

LASA filenames:
lasazoa1.sav, lasazoa2.sav, lasazoa3.sav

Contact: Marjolein Visser


Osteoarthritis (OA) is a highly prevalent chronic rheumatic disorder in western populations and a leading cause of pain and disability among older adults [1]. The prevalence of OA in the Netherlands is expected to increase by 38% between 2000 and 2020 [2]. Although much research has been conducted on OA, there is no consensus on how to assess its presence. Therefore, an algorithm to assess osteoarthritis of the hip and knee was developed in LASA.
Note: in a later side study in which LASA participated, the European Project on OsteoArthritis (EPOSA), clinical osteoarthritis was defined according to the criteria of the American College of Rheumatology [3].

OA algorithm

Based on the literature and the available LASA data, an algorithm for the presence of OA was developed in 2007. The LASA study contains self-reported data, general practitioner (GP) data and x-ray data of the spine (T4-L5) and the hip (at the C and D cycles) on OA. As the correlation between radiographic evidence and symptomatic OA is rather weak [4], symptomatic OA was defined by means of self-reported data and GP data only. The focus was on knee and hip OA, as these are the most prevalent locations of OA: according to the Dutch GP registration of 2000, approximately 77% of all OA locations are the knee and hip [5]. Medication use was not taken into account, as there is no arthritis-specific medication. Pain was considered, but following input from Prof. J. Dekker (Professor of Allied Health Care at the Department of Psychiatry and the Department of Rehabilitation Medicine, VUmc), it was not included in the algorithm. Although many OA patients have pain, there are no valid criteria for epidemiological use that identify which people with joint pain have osteoarthritis [6]. Furthermore, joint pain is a follow-up question in LASA, so only respondents with self-reported OA provide an answer.

Previously in LASA, osteoarthritis and rheumatoid arthritis were categorised as one variable [7]. In this decision tree we separated OA and RA, as it is assumed that respondents can distinguish between the two diseases once the interviewer has explained them. However, some misclassification may remain.

The algorithm is based on data from the first LASA cohort to assess prevalent and incident OA (cycles B-F). Currently, syntax files are available for waves B-I for the first, second and third LASA cohorts.

Data sources

Data files and variables used in the OA algorithm:

Self-report of OA

The variable xrheum01 assesses the presence of OA. Follow-up questions were asked only when a respondent answered ‘yes’ to xrheum01. The presence of knee and hip OA is asked in questions xrheum8g and xrheum8h, respectively.
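A minimal Python sketch of how these three questions could be combined into a single "self-reported hip or knee OA" flag. It assumes that the leading x in the variable names stands for the wave letter (e.g. crheum01 at cycle C), that answers have already been recoded to 'yes'/'no' with None for missing, and that a respondent record is a plain dict; the helper names are hypothetical and not part of the LASA syntax.

```python
from typing import Mapping, Optional


def rheum_vars(wave: str) -> dict:
    """Build per-wave variable names (assumption: 'x' = wave letter)."""
    w = wave.lower()
    return {"any": f"{w}rheum01", "knee": f"{w}rheum8g", "hip": f"{w}rheum8h"}


def self_reported_hip_or_knee(record: Mapping[str, Optional[str]],
                              wave: str) -> Optional[bool]:
    """True/False for self-reported hip or knee OA, None if unknown."""
    names = rheum_vars(wave)
    screen = record.get(names["any"])
    if screen is None:
        return None   # screening question missing: no follow-up data exist
    if screen == "no":
        return False  # follow-up questions were not asked
    knee = record.get(names["knee"])
    hip = record.get(names["hip"])
    if knee == "yes" or hip == "yes":
        return True
    if knee is None and hip is None:
        return None   # assumption: fully missing follow-up treated as unknown
    return False      # OA reported, but not at the knee or hip
```

Because follow-up questions are only asked after a 'yes' on xrheum01, the screening answer is checked first; the knee and hip answers are only consulted when they can exist.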

Cross-sectional decisions

Question xrheum01 asked: ‘Do you have osteoarthritis in your knees, hips or hands?’ At B, the response categories were Missing, No and Yes. At the C, D, E and F cycles, the response categories yes and no were extended to: ‘No, never’; ‘No, (previous) rheum01 Yes’; ‘Yes, (previous) rheum01 No’; ‘Yes, (previous) rheum01 Yes’. For each cycle, the two response categories starting with ‘No,..’ were categorized as No, and the two categories starting with ‘Yes,..’ were categorized as Yes.
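The collapsing rule can be sketched in Python as a prefix check. This works on the response labels as written above; the numeric codes stored in the .sav files are not listed here, so using text labels is an assumption, and recode_rheum01 is a hypothetical name.

```python
from typing import Optional


def recode_rheum01(label: Optional[str]) -> str:
    """Collapse a raw xrheum01 answer to 'Missing', 'No' or 'Yes'.

    Works for both the B-cycle labels (Missing/No/Yes) and the extended
    C-F labels, e.g. 'No, never' or 'Yes, (previous) rheum01 No'.
    """
    if label is None:
        return "Missing"
    text = label.strip()
    if text.startswith("Yes"):
        return "Yes"
    if text.startswith("No"):
        return "No"
    return "Missing"  # 'Missing' or any unrecognised label


for answer in ["No, never", "No, (previous) rheum01 Yes",
               "Yes, (previous) rheum01 No", "Missing"]:
    print(answer, "->", recode_rheum01(answer))
```

Matching on the leading word mirrors the rule in the text: the second half of each extended label only records the previous answer and does not affect the current classification.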

The final cross-sectional OA categories are Missing, No, Possible and Yes, as illustrated by the decision tree. The decision tree is explained as follows:

  • The left branch: subjects who self-report OA and whose GP confirms this are defined as ‘yes’. Subjects who self-report OA but whose GP answer is no or missing are categorized as ‘possible’.
  • The middle branch: respondents who do not report OA and whose GP also reports no OA (or whose GP answer is missing) are coded as ‘no’. When the GP does report OA, subjects are coded as ‘possible’ (not ‘yes’, because the GP questionnaire does not report the exact location of OA). As only respondents with hip or knee OA were of interest, subjects who self-reported OA at another location were categorized as ‘no’, independent of the GP answer.
  • The right branch: subjects with a missing answer on the OA question (xrheum01) had no follow-up OA data, and the GP report does not contain the location of OA. It was therefore decided to keep these subjects as missing.
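The three branches can be sketched as a single Python function. This is an illustration under assumptions, not the LASA syntax itself: inputs are assumed to be recoded to True/False/None, the hip/knee flag is assumed to be derived from xrheum8g and xrheum8h, and the text does not say how a self-report of OA with missing location answers is handled, so it is returned as 'Missing' here (assumption). The function name oa_category is hypothetical.

```python
from typing import Optional


def oa_category(sr_oa: Optional[bool],
                hip_or_knee: Optional[bool],
                gp_oa: Optional[bool]) -> str:
    """Cross-sectional OA category: 'Missing', 'No', 'Possible' or 'Yes'.

    sr_oa:       recoded xrheum01 (True = yes, False = no, None = missing)
    hip_or_knee: True if knee or hip OA was reported (only used if sr_oa)
    gp_oa:       GP-reported OA (True = yes, False = no, None = missing)
    """
    # Right branch: no self-report, and the GP report has no OA location.
    if sr_oa is None:
        return "Missing"
    if sr_oa:
        if hip_or_knee is None:
            return "Missing"  # assumption: location unknown, not in the text
        if not hip_or_knee:
            return "No"       # OA at another location; GP answer ignored
        # Left branch: self-reported hip/knee OA, GP confirmation decides.
        return "Yes" if gp_oa is True else "Possible"
    # Middle branch: no self-reported OA; a GP diagnosis gives 'Possible'.
    return "Possible" if gp_oa is True else "No"


print(oa_category(True, True, True))   # left branch, GP confirms
print(oa_category(False, None, True))  # middle branch, GP reports OA
```

A GP answer can only raise a category to 'Possible' on its own; 'Yes' always requires the respondent's own report of hip or knee OA, reflecting the greater weight given to self-report.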

In comparing the presence of OA in the main interview with GP-reported OA, several considerations had to be made. First, in this algorithm the opinion of the respondent carries more weight than that of the GP, because we were interested in self-reported symptomatic OA and because many cases of osteoarthritis are not known to the GP [7]. Second, because the GP data collections (approximately every 5 years) do not take place in the same period as the interviews with the respondents (approximately every 3 years), subjects could be misclassified in these cross-sectional algorithms. For example, the 2000/01 GP data (CG01) are used for the C and D cross-sectional decisions. A GP-reported diagnosis of OA at the 2000/01 GP data collection thus results in a positive GP diagnosis at both the C and D cycles. If a respondent has had OA since 1997, the self-report (SR) at C would be ‘no’ and the SR at D would be ‘yes’. Combining the SR data with the GP data according to this algorithm results in ‘possible OA’ at C and ‘OA’ at D. Although the year of the GP diagnosis is available, this variable is not used in the cross-sectional syntax. Finally, additional data from telephone interviews were not used, because these interviews ask only about the presence of the main chronic diseases. Without additional branching questions there is no information on the location of OA, so hip or knee OA cannot be selected.
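The timing issue in the worked example can be reproduced in a few lines of Python. Only the CG01 mapping for cycles C and D is taken from the text; the simplified combine function assumes that every self-reported OA is hip or knee OA, and all names are hypothetical.

```python
from typing import Optional

# Only the mapping stated in the text: the 2000/01 GP data collection (CG01)
# is used for both the C and D cross-sectional decisions.
GP_COLLECTION_FOR_CYCLE = {"C": "CG01", "D": "CG01"}


def combine(sr_hip_knee: Optional[bool], gp_oa: Optional[bool]) -> str:
    """Simplified decision tree, assuming any self-reported OA is hip/knee."""
    if sr_hip_knee is None:
        return "Missing"
    if sr_hip_knee:
        return "Yes" if gp_oa is True else "Possible"
    return "Possible" if gp_oa is True else "No"


# Worked example: the GP reports OA in CG01; the respondent (OA since 1997)
# reports 'no' at C and 'yes' at D.
gp = {"CG01": True}
self_report = {"C": False, "D": True}
result = {cycle: combine(self_report[cycle], gp[GP_COLLECTION_FOR_CYCLE[cycle]])
          for cycle in ("C", "D")}
print(result)  # {'C': 'Possible', 'D': 'Yes'}
```

The same GP value feeds both cycles, so the 'Possible' at C comes purely from the GP diagnosis being dated after the C interview; using the year of GP diagnosis would be one way to avoid this, but the cross-sectional syntax does not do so.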

Variable information

LASAzoa2: LASA-2B, LASA-F, LASA-G, LASA-H, LASA-I, LASA-J / LASA-K (second cohort)
LASAzoa3: LASA-3B, LASA-I, LASA-J / LASA-K (third cohort)

J, K: not available yet

Availability of information per wave


[Table: availability of information per wave; n = total number of respondents per wave]

1 More information about the LASA data collection waves is available here.
2 These numbers are cross-sectional, and thus not longitudinally cleaned.
* 2B = baseline second cohort; 3B = baseline third cohort; J, K = under construction

Use of algorithm

The syntax of the OA algorithm can be found here. This algorithm was created to define knee and hip OA for a project on physical activity and the onset of OA. Suggestions for longitudinal cleaning and for reducing the number of missing values are included in the syntax.

Previous use in LASA

  • Verweij, L.M., Van Schoor, N.M., Deeg, D.J.H., Dekker, J., Visser, M. (2009). Physical activity and incident clinical knee osteoarthritis in older adults. Arthritis & Rheumatism (Arthritis Care & Research), 61, 2, 152-157.


  1. Bone and Joint Decade. European Action Towards Better Musculoskeletal Health. 2-6-2005.
  2. Schouten JSAG, Poos MJJC, Gijsen R. Neemt het aantal mensen met artrose toe of af? [Is the number of people with osteoarthritis increasing or decreasing?] In: Volksgezondheid Toekomst Verkenning. Bilthoven: RIVM, 15-11-2002.
  3. Altman RD. Classification of disease: osteoarthritis. Semin Arthritis Rheum. 1991;20(Suppl 2):40-47.
  4. Dieppe PA, Lohmander LS. Pathogenesis and management of pain in osteoarthritis. Lancet. 2005;365(9463):965-73.
  5. Gijsen R, Poos MJJC. Achtergronden en details bij cijfers uit huisartsenregistraties [Background and details of figures from general practice registrations]. In: Volksgezondheid Toekomst Verkenning, Nationaal Kompas Volksgezondheid. Bilthoven: RIVM, 16 May 2003.
  6. Felson DT. Osteoarthritis of the knee. N Engl J Med. 2006;354:841-8.
  7. Kriegsman DMW, Penninx BWJH, van Eijk JThM, Boeke AJP, Deeg DJH. Self-reports and general practitioner information on the presence of chronic diseases in community dwelling elderly: a study on the accuracy of patients’ self-reports and on determinants of inaccuracy. J Clin Epidemiol. 1996;49:1407-17.

Date of last update: April 20, 2020 (LvZ)