Student Loan Borrowers 1997-2002
Technical Notes ...
Introduction
The statistics in this release have been produced using a database of student loan and tertiary education data held by Statistics New Zealand. One of the important components of the database is an integrated dataset on student loan borrowers. This dataset was created by linking administrative records from a number of government agencies:
- individual students’ tertiary enrolment data from the Ministry of Education (MoE)
- individual students’ borrowing data from the now-defunct Student Loan Account Manager (SLAM) provided by Inland Revenue (IRD) and MoE
- individual students’ borrowing data (from 2000 onwards) from StudyLink, a service of the Ministry of Social Development (MSD)
- individuals’ repayment and income data from IRD.
The integrated dataset contains data on the loans of students who borrowed under the Student Loan Scheme in any of the years 1997 to 2002. Each additional year's data will be added when it becomes available.
Background
The Student Loan Scheme began in 1992. In June 2000, the Auditor-General released a report entitled "Student Loan Scheme – Publicly Available Accountability Information". The report proposed that Statistics New Zealand integrate selected datasets relating to the Student Loan Scheme with a view to providing statistics for strategic policy, financial risk management, financial reporting and forecasting. As a result of work undertaken to follow up the Auditor-General's report, Statistics New Zealand was directed to lead an investigation into the privacy, logistical and data issues around data integration. Other agencies involved in the exercise were MoE, IRD, The Treasury, the then Department of Work and Income, and the then Ministry of Social Policy. The Privacy Commissioner was consulted over the privacy issues involved in integrating personal data from different sources.
That scoping exercise led to Statistics New Zealand being directed to undertake further investigations. In particular, work was required to address some issues raised by the Privacy Commissioner. A trial integration exercise was also undertaken to determine the types of matching required to maximise the representativeness of the integrated dataset (ie to minimise the differences between students in the integrated dataset and the overall group of students with loans). That work was successfully completed and reported on to Cabinet.
In April 2002, Cabinet funded Statistics New Zealand to construct an integrated dataset on student loan borrowers dating back to 1997. Annual updates were also funded, with statistics to be released by Statistics New Zealand each year. Cabinet's approval for the work was given in the knowledge that the Privacy Commissioner was comfortable with the revised methodology proposed by Statistics New Zealand. During 2002, Statistics New Zealand established the first integrated dataset on student loan borrowers, containing data on students who borrowed in any of the years 1997 to 2000. The first release of data from this new integrated dataset occurred on 10 December 2002.
Differences between data published in 2002 and 2004
As well as the integrated dataset now including data on students who borrowed in 2001 and 2002, some other substantial changes mean the statistics released in 2002 are not comparable with the statistics in this Hot Off The Press. As familiarisation with and understanding of student loan data increased, ways were identified to improve the integrated dataset and the quality of the statistics resulting from it. This led to changes in the way some variables are specified. For example, the date behind the structure of IRD data became the 'return period date' (ie the financial year to which a transaction applies), rather than the date on which a particular transaction had an effect on an account.
Additionally, to provide a more complete picture of student loans, $0 has been included as a category in tables. In Table 1, this reflects people who borrowed during a year but paid off the full amount before the loan was transferred to IRD, and in Table 2, people who had paid off their loan in full by March 2003. Compared with previously published data, the inclusion of these zero values will have reduced the mean and median dollar figures.
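The effect of the $0 category on summary figures can be illustrated with a small sketch. The dollar amounts below are invented purely for illustration and are not taken from the tables.

```python
from statistics import mean, median

# Hypothetical amounts in dollars, invented for illustration only.
nonzero_amounts = [2000, 4000, 6000, 8000]

# The same borrowers plus two people in the $0 category.
with_zero_category = [0, 0] + nonzero_amounts

mean_before, median_before = mean(nonzero_amounts), median(nonzero_amounts)
mean_after, median_after = mean(with_zero_category), median(with_zero_category)

# Adding zero values pulls both summary figures down.
print(mean_before, median_before)
print(mean_after, median_after)
```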
Statistics New Zealand also made changes to the integration process. The decision was taken to re-link the student/borrower records for the years 1997 to 2000 which made up the original integrated dataset. This is because:
- varying definitions of 'year' (calendar, study and tax year) were found to have introduced inconsistencies into the linked data
- the data model (used as the basis for storing data on the database) was revised and simplified, meaning the previous link file could not be used.
As part of the reintegration, borrowing data was matched in two separate tasks: one involving older Student Loan Account Manager (SLAM) data from 1997 to 2000, and the other involving MSD data from 2000 to 2002. Loans data was merged with IRD data through the use of tax file numbers, and this combined data was then probabilistically matched to MoE data (see below).
Matching methodology
The integrated dataset was created in two stages. The first stage merged the records of IRD and MSD (or SLAM) using tax file numbers. The second stage used a probabilistic matching methodology to match the MoE data to the already linked 'loans' dataset created in the first stage. Variables used in this second stage were student identification number; tertiary institution number; year of study/borrowing; sex; day, month and year of birth; surname; first initial of first name; and ethnicity. Probabilistic matching allows records to be linked when some of the matching variables are not unique, have incorrect values or are missing.
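The two stages can be sketched as follows. This is a minimal illustration, not the method actually used: the release does not state how agreement between variables was weighted, so the sketch assumes a common log-likelihood-ratio scoring approach. The field names, the m/u probabilities and the threshold are all hypothetical.

```python
import math

# Hypothetical probabilities per matching variable (invented for illustration):
# m = P(values agree | same person), u = P(values agree | different people).
FIELD_PROBS = {
    "student_id": (0.95, 0.001),
    "institution": (0.90, 0.05),
    "year": (0.98, 0.20),
    "sex": (0.99, 0.50),
    "birth_date": (0.97, 0.003),
    "surname": (0.93, 0.01),
    "first_initial": (0.95, 0.06),
    "ethnicity": (0.85, 0.25),
}


def exact_merge(ird_records, loan_records):
    """Stage 1: join IRD and loan records on tax file number."""
    loans_by_tfn = {r["tax_file_number"]: r for r in loan_records}
    return [
        {**loans_by_tfn[r["tax_file_number"]], **r}
        for r in ird_records
        if r["tax_file_number"] in loans_by_tfn
    ]


def match_weight(loan, enrolment):
    """Stage 2: sum agreement and disagreement weights over the variables."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = loan.get(field), enrolment.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def best_link(loan, enrolments, threshold=10.0):
    """Return the highest-scoring enrolment record, or None if below threshold."""
    if not enrolments:
        return None
    best = max(enrolments, key=lambda e: match_weight(loan, e))
    return best if match_weight(loan, best) >= threshold else None
```

Because each variable contributes its own weight, a pair of records can still be linked when one variable disagrees (for example a misspelt surname) or is missing, provided the remaining variables agree strongly enough.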
The overall link rate for the integrated dataset, after both exact and probabilistic matching was completed, was 92 percent. There was considerable variation in the link rate over the years, due in most part to the lack of education data for students at private training establishments (PTEs) prior to 2000. For 1997 to 1999, the average link rate was 87 percent, and for 2000 to 2002, the average link rate was 95 percent.
It is important to note that when Statistics New Zealand is satisfied that all checking has been completed, unique identifiers are deleted from the integrated dataset. Annual updating of the integrated dataset is achieved through the use of a non-reversible algorithm. Statistics New Zealand runs the algorithm against the integrated dataset to produce a new linking identifier for each unit record. It is this new variable that enables each year's data to be added to the integrated dataset.
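The release does not name the non-reversible algorithm or its inputs. As an illustration of the general idea only, the sketch below assumes a keyed one-way hash (HMAC-SHA256) applied to a unique identifier before that identifier is deleted. The function name, key and identifier format are hypothetical.

```python
import hashlib
import hmac


def linking_identifier(unique_id: str, secret_key: bytes) -> str:
    """Derive a one-way linking identifier from a unique identifier."""
    digest = hmac.new(secret_key, unique_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key and identifier, for illustration only.
key = b"hypothetical-secret-key"
link_existing = linking_identifier("123-456-789", key)
link_new_year = linking_identifier("123-456-789", key)

# The same person yields the same linking identifier in every year's data,
# so new records can be attached without retaining the original identifier.
print(link_existing == link_new_year)
```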
Differences between all student loan borrowers and those in this release
Users should be aware that official statistics on student loans are also published in the "Student Loan Scheme Annual Report", jointly published each year by MoE, IRD and MSD. The official statistics on student loan borrowers included in this Hot Off The Press may differ from statistics published in the annual report. This is because the source administrative data was provided to Statistics New Zealand at a specified cut-off date which differs from the one used for the compilation of the annual report. In addition, the tables in this Hot Off The Press use different populations from those in the annual report.
For example, because all financial data in the tables comes from IRD, information is restricted to borrowers whose loans had been transferred from StudyLink to Inland Revenue by the date IRD extracted the data for Statistics New Zealand (October 2004). The figures show the situation as at the end of the tax year, on 31 March 2003. This means that in this Hot Off The Press, the population of students who borrowed in 2002 is defined as those who borrowed during the 2002 academic year, whose study end date occurred in the tax year ending 31 March 2003, and whose borrowing record for that year had been transferred from StudyLink to IRD by October 2004.
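The three conditions that define the 2002 population can be expressed as a simple filter. This is a sketch under stated assumptions: the record layout and field names are hypothetical, and because the release gives the extraction date only as "October 2004", the end of that month is assumed as the cut-off.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class BorrowingRecord:
    """Hypothetical record layout, for illustration only."""
    academic_year: int
    study_end_date: date
    transferred_to_ird: Optional[date]  # None if not yet transferred


TAX_YEAR_START = date(2002, 4, 1)
TAX_YEAR_END = date(2003, 3, 31)
EXTRACT_CUTOFF = date(2004, 10, 31)  # assumption: exact day is not stated


def in_2002_population(rec: BorrowingRecord) -> bool:
    """Apply the three conditions that define the 2002 borrower population."""
    return (
        rec.academic_year == 2002
        and TAX_YEAR_START <= rec.study_end_date <= TAX_YEAR_END
        and rec.transferred_to_ird is not None
        and rec.transferred_to_ird <= EXTRACT_CUTOFF
    )
```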
The use of probabilistic matching significantly increased the percentage of student loan borrowers whose information was included in the integrated dataset. However, it should be noted that some types of student loan borrowers were still under-represented, even after both stages of matching were completed. Due to the absence of data from MoE's dataset on students attending PTEs before 2000, the link rate of PTE students prior to 2000 was around 10 percent. From 2000 onwards, the link rate rose to around 90 percent.
Key variables in the Hot Off The Press tables
Demographic and socio-economic characteristics
- Age group: the person's age at 1 July in the reference year in the table.
- Ethnic group: sourced from MoE from 1997 to 1999, and primarily from MSD from 2000 onward. This means a mix of prioritised and unprioritised ethnic groups have been used.
- Level of study: the qualification(s) for which the student was enrolled (completed or not). A student can be enrolled in more than one level of study.
- Field of study: the international standard classification of the field(s) of study or subject of a programme of study in which the student was enrolled. A student can be enrolled in more than one field of study.
- Declared overseas: includes people who advised IRD they were living overseas or were departing to live overseas during the reference year. There will be additional holders of student loans who were overseas and had not advised IRD, but this figure cannot be accurately quantified.
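As an illustration of the age definition used for the age group variable, the sketch below computes a person's age at 1 July of a reference year. The function name is hypothetical, and the age bands used in the tables are not reproduced here.

```python
from datetime import date


def age_at_1_july(birth_date: date, reference_year: int) -> int:
    """Return completed years of age at 1 July of the reference year."""
    reference = date(reference_year, 7, 1)
    # Subtract one year if the birthday falls after 1 July in that year.
    had_birthday = (birth_date.month, birth_date.day) <= (reference.month, reference.day)
    return reference.year - birth_date.year - (0 if had_birthday else 1)
```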
Student loan variables
- Amount transferred: the amount transferred from StudyLink to Inland Revenue (ie drawdowns minus repayments that were made before the loan was transferred).
- Loan balance: the total loan balance owing, including the administration fee and interest accrued (net of any write-offs), less any repayments received.
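The two student loan variables can be read as simple arithmetic. The sketch below is one interpretation of the definitions above, not a statement of how IRD calculates balances: it assumes the amount transferred is the starting principal and that write-offs are subtracted as a single total. All function and parameter names are hypothetical.

```python
from decimal import Decimal


def amount_transferred(drawdowns: Decimal, early_repayments: Decimal) -> Decimal:
    """Drawdowns minus repayments made before the loan was transferred."""
    return drawdowns - early_repayments


def loan_balance(principal: Decimal, admin_fee: Decimal, interest: Decimal,
                 write_offs: Decimal, repayments: Decimal) -> Decimal:
    """Principal plus fee and interest, net of write-offs, less repayments."""
    return principal + admin_fee + interest - write_offs - repayments


# Hypothetical figures, for illustration only.
transferred = amount_transferred(Decimal("5000"), Decimal("500"))
balance = loan_balance(transferred, Decimal("50"), Decimal("300"),
                       Decimal("100"), Decimal("1000"))
print(transferred, balance)
```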
Year
Note that enrolment and borrowing data generally relate to a calendar year. IRD data relate to the tax year (1 April to 31 March).
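Because the two sources use different definitions of 'year', a date in a given calendar year can fall in either of two tax years. A minimal sketch of the mapping follows; the function name is hypothetical.

```python
from datetime import date


def tax_year_ending(d: date) -> int:
    """Return the year in which the tax year (1 April to 31 March) containing d ends."""
    return d.year + 1 if d.month >= 4 else d.year
```

For example, a transaction dated 1 April 2002 and one dated 31 March 2003 both belong to the tax year ending 31 March 2003, although they fall in different calendar years.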
Student loan and tertiary education data held by Statistics New Zealand
Statistics New Zealand's student loan and tertiary education database has several components:
- The integrated dataset (ie matched data for students who had loans in any of the years 1997–2002).
- IRD information for 1997–2002 for student loan borrowers whose records were not matched.
- 1992–1996 information from IRD on student loan borrowing.
- Some income information from 1996 onward for any person taking out a student loan.
- MoE enrolment data for all formal students, regardless of whether they were student loan borrowers. A formal student is one who is enrolled in a formal programme of study at a tertiary education provider for more than one full-time week.
- SLAM information for students who borrowed in or before 1999. SLAM data was provided by both MoE and IRD.
Several of these components are updated each year. Data on other tertiary education issues may be added to the database in the future.
The statistical tables included with this release are just examples of data that can be produced from the database. Customised requests can be run by Statistics New Zealand and, on application, researchers may be able to access unidentified data in Statistics New Zealand's Data Laboratory.
Reliability of the data
Statistics New Zealand has made every attempt to minimise errors in the student loan and tertiary education database but two types of error will have occurred: errors in source data and errors due to record linking.
Statistics New Zealand validated source data as it was received from each agency. If errors were detected, agencies were asked to rerun their data and provide it again. There will still, however, be some errors in the source data supplied. Omissions in the original collection of information and errors in data entry and processing will have occurred. These cannot be quantified.
In terms of errors in record linking, Statistics New Zealand used sound probabilistic matching methodology, but errors will still be present. For example, over all years, an average of 8 percent of loan records were not linked to an MoE record, with considerable improvement from 2000 onwards. There will be some cases where an IRD record has been linked to the wrong MoE record, but this is estimated as affecting fewer than 1 percent of the links.
Copyright
Information obtained from Statistics New Zealand may be freely used, reproduced, or quoted unless otherwise specified. In all cases Statistics New Zealand must be acknowledged as the source.
Liability
While care has been used in processing, analysing and extracting information, Statistics New Zealand gives no warranty that the information supplied is free from error. Statistics New Zealand shall not be liable for any loss suffered through the use, directly or indirectly, of any information, product or service.
Timing
Timed statistical releases are delivered using postal and electronic services provided by third parties. Delivery of these releases may be delayed by circumstances outside the control of Statistics New Zealand. Statistics New Zealand accepts no responsibility for any such delays.
Student Loan Borrowers: 1997–2003 will be released in 2005.
