Remote Access to Research Data

5 Good remote access practices

5.1 System resources

This section explains how to use the remote access service. It also covers some basic principles of efficient programming, project management in the remote access service and smart use of system resources.

Efficient programming starts with a plan

Good planning is often the foundation of efficient programming. The aim is to write cohesive and well-sectioned code that allows the results to be reproduced quickly. The program should include easy-to-follow comments and maximum automation. You can use macros and various loops to adjust the code if you need to change timespans and the content of variables, for example (see “program” in Stata). Avoid copying code – it increases the risk of propagating errors.
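
The idea of adjusting one piece of code instead of copying it can be sketched as follows. This is a minimal illustration in Python (one of the languages available in the remote access system), not code from the Research Services; the function mean_income, the field names year and income, and the records are all hypothetical.

```python
from statistics import mean

def mean_income(records, first_year, last_year):
    """Return the mean income for records within the timespan (inclusive)."""
    selected = [r["income"] for r in records
                if first_year <= r["year"] <= last_year]
    return mean(selected) if selected else None

# Hypothetical demonstration data
records = [
    {"year": 2000, "income": 100},
    {"year": 2001, "income": 200},
    {"year": 2002, "income": 300},
]

# One loop replaces several copied blocks: to change the timespans,
# edit this list only.
timespans = [(2000, 2001), (2001, 2002), (2000, 2002)]
results = {span: mean_income(records, *span) for span in timespans}
for span, value in results.items():
    print(span, value)
```

Because the timespans live in one list, a change is made in a single place and cannot drift out of sync between copies.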

A good principle is to remove any variables and data from the program immediately when they are unnecessary for your analysis. We advise against saving overlapping data sets and intermediate files. Stata allows data to be limited when a file is opened by selecting the necessary variables only (“use var1 var2 var3 using datax”).
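
The same principle of loading only the variables you need can be sketched in Python. This is an illustrative sketch only; the helper read_selected and the column names are hypothetical, and a small in-memory CSV stands in for a real data file.

```python
import csv
import io

def read_selected(file_obj, variables):
    """Read a CSV file, keeping only the named variables from each row."""
    reader = csv.DictReader(file_obj)
    return [{name: row[name] for name in variables} for row in reader]

# Hypothetical data set with one variable ("notes") the analysis does not need
raw = io.StringIO("id,year,income,notes\n1,2000,100,a\n2,2001,200,b\n")
data = read_selected(raw, ["id", "year", "income"])
print(data)
```

Dropping unneeded variables at read time keeps memory use down for every later step of the program.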

To help you plan your programs, we recommend preparing a small set of training data that runs fast and uses few resources, so you have something to test while the final program takes shape. For demonstration data, you can extract a small sample from the full data set (see Example 1). We recommend excluding unneeded variables, years, industries, areas and groups of people at the start.

Example 1. Random sample

You can use the code below to extract a random sample in Stata for training purposes, for example:

  • 5% random sample: sample 5
  • 5% sample of units whose year value is “2000”, and full populations for units whose year is not “2000”: sample 5 if year==2000
  • 10,000-person sample: sample 10000, count

By first using set seed <value>, e.g. “set seed 10000”, you will always get the same random sample.
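
The effect of a fixed seed can also be illustrated in Python. This is a minimal sketch, not the Stata implementation: the function draw_sample and the unit identifiers are hypothetical, and the sample size is given as a share (0.05) rather than a percentage.

```python
import random

def draw_sample(units, share, seed):
    """Draw a simple random sample without replacement.

    The same seed always yields the same sample.
    """
    rng = random.Random(seed)  # local generator; other random draws are unaffected
    n = round(len(units) * share)
    return rng.sample(units, n)

population = list(range(1, 1001))  # hypothetical unit identifiers
first = draw_sample(population, 0.05, seed=10000)
second = draw_sample(population, 0.05, seed=10000)
print(len(first), first == second)
```

Fixing the seed means that tests run on the training sample give the same results from one run to the next.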

You should consider the review procedure from the start, so also pay attention to the format and intelligibility of the results. Instead of long log files, you should aim to produce separate result tables or diagrams with distinct names. The data content of variables should also be easy to understand. Each row in a table must indicate the number of observations. We recommend including a rule in your program to allow only three observations or more (see Example 2). As a rule, minimums and maximums will not pass a data protection review, so they should not be included in the first place.  

Finally: remember to save often! Making daily backups ensures your files stay safe, even if the server malfunctions, for example.

Example 2. File protection

You can use the following code to protect all Stata DTA files in the O:\output_date folder whose names end in “_date” and which contain a calculated observation count, n_obs, for each row. Note: the program overwrites each old file with a new file of the same name, removing all cells with fewer than three observations.

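* Folder holding the output files to be protected
global output_date "O:\output_date\"
cd "${output_date}"
* List all DTA files whose names end in "_date"
local listaa: dir . files "*_date.dta"
foreach x of local listaa {
    use "`x'", clear
    * Remove cells with fewer than three observations
    cap drop if n_obs < 3
    * Try the older DTA format first; if that fails, save in the current format
    cap saveold "`x'", replace
    if _rc != 0 {
        save "`x'", replace
    }
}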
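
The threshold rule described in the text (keep only rows with three observations or more) can be sketched in Python as well. This is an assumption-laden illustration, not the review procedure itself: the function suppress_small_cells, the constant MIN_OBS and the table contents are hypothetical, and only the column name n_obs follows the Stata example.

```python
MIN_OBS = 3  # threshold from the text: three observations or more are allowed

def suppress_small_cells(rows, min_obs=MIN_OBS):
    """Return only the rows whose observation count meets the threshold."""
    return [row for row in rows if row["n_obs"] >= min_obs]

# Hypothetical result table with one observation count per row
table = [
    {"group": "A", "n_obs": 12, "mean": 4.1},
    {"group": "B", "n_obs": 2, "mean": 9.7},
    {"group": "C", "n_obs": 3, "mean": 5.0},
]
protected = suppress_small_cells(table)
print([row["group"] for row in protected])
```

Unlike the Stata example, this sketch returns a new table instead of overwriting the original file.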

Using different folders for project management

Working in the remote access environment is like working on a regular Windows workstation. Project names use consecutive numbering, starting with the example project a01.

The following drives are available for each project: 

  • W:\a01 Work - work folder
  • O:\a01 Output - output transfer folder
  • D:\a01 Data - the Research Services’ ready-made and tailored project data, metadata and SISU model
  • E:\a01 Backup - small project-specific drive for backup storage
  • N:\ - CRAN repository for R packages

Work folders, W drive

Each research project has a dedicated work folder on the W drive where you can keep the project’s program files, working files, etc.

Output transfer folder, O drive

Output files intended for publication are moved to the O drive for data protection review. You can create your own subfolders in the folders on W and O, but make sure you frequently remove all unnecessary folders, files and outputs that you have already received.

Research Services data sets and descriptions, D drive

You will receive read access to the ready-made data folders on the D drive as specified in your project or SISU model user licence. Data tailored for projects, including their descriptions, are also stored on D. The descriptions for ready-made data are located in the metadata folder. The same folder has the Research Services rules and current notices about things like data updates.

The smaller E drive is for long-term research project backup storage. The CRAN repository is used to load R software environment packages for your research project.

For SISU model users 

You can copy the SISU model from the SISU microsimulation folder on D to your own work folder on W. New model versions are announced separately to users.

Shared folder, F drive

Microsimulation model users can use the shared Forum folder (F drive) in the remote access environment to share files related to the model with other users.

Personal email folder, “Mail”

Each SISU model user has a personal email folder, “Mail”, in the remote access environment, which you can use to transfer files to your local workstation from the remote access environment. For every file copied to the Mail folder, you will receive a separate email with the copied file attached. The transferred files are also sent to the microsimulation team’s email inbox for review by Statistics Finland.

The data protection requirements, output protection methods, and review procedure are explained further in section 3 Statistical data protection of research data.

You must follow the Research Services’ instructions and rules for the remote access environment and its data transfers. These are explained in the Research Services rules.

How to save hardware and software resources

The remote access system provides dedicated computation and software resources for each project, and these are shared by the project’s users. We therefore recommend you avoid heavy simultaneous runs and unnecessary memory consumption. Heavier runs should be scheduled for nights and weekends. During a run, you can close the remote access window (disconnect), but you must stay logged in. You can monitor your resource use in Task Manager, for example by pressing Ctrl + Shift + Esc. You can monitor your storage space consumption from the properties of each folder. Should you need to, you can always upgrade your storage or hardware package.
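
As an alternative to checking folder properties by hand, storage consumption could be summed with a short script. This is a minimal Python sketch, not a tool provided in the remote access system; the function folder_size_bytes is hypothetical, and a temporary folder stands in for a project folder.

```python
import os
import tempfile

def folder_size_bytes(path):
    """Sum the sizes of all files under path, including subfolders."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total

# Demonstration on a temporary folder; in practice you would pass the path
# of one of your own project folders instead.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "results.txt"), "w") as f:
        f.write("x" * 1000)
    demo_size = folder_size_bytes(tmp)
print(demo_size)
```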

We recommend that you log out fully (sign out) on a regular basis to allow service packs to be installed during maintenance breaks. When you log out and close your remote connection, you free up resources for the other users in your project, allowing them to carry out heavier tasks.

You should keep a good balance between processing power and storage space as you write your code. It is sometimes better to store extensive intermediate results from different stages to avoid having to rerun slow programs (long estimations, simulations, etc.). It is fine to run fast code such as descriptive analyses and results tables repeatedly if changes are needed.
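
Storing the result of a slow stage so that it is not recomputed can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the helper cached, the stand-in slow_estimation and the file name estimation.json are hypothetical, and JSON is used only because the example result is a small dictionary.

```python
import json
import os
import tempfile

def cached(path, compute):
    """Return the result stored at path, computing and storing it first if missing."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute()
    with open(path, "w") as f:
        json.dump(result, f)
    return result

calls = []  # records how many times the slow step actually ran

def slow_estimation():
    """Stand-in for a long estimation or simulation (hypothetical)."""
    calls.append(1)
    return {"coef": 0.42}

with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "estimation.json")
    first = cached(cache_file, slow_estimation)   # runs the slow step and stores the result
    second = cached(cache_file, slow_estimation)  # reads the stored result instead
print(first == second, len(calls))
```

The trade-off is the one described above: the stored file uses disk space, so this pattern suits slow stages only, and the file should be deleted once it is no longer needed.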

How to send installation requests

Add-ons may be installed to the software in the remote access system (e.g. Stata), and even entirely new software may be added. Each installation must be evaluated for compatibility, potential issues and data protection, for example, so installation requests should be well founded and thought out. 

Include at least the following information when you submit an installation request to the Research Services:

  • target software name and version 
  • installer package name and version (e.g. “tables”)
  • instructions for acquiring the software/package
  • your description of what the software/package does and why you need it.

If your installation request is approved, we will typically install the software in the remote access system within a few working days. More complicated installations will be scheduled for the next remote access system maintenance break.

R packages can be downloaded from the CRAN library and Python packages from the Anaconda Pro repository. The instructions “CRAN_library_RStudio” and “Fiona_fix_python_reposities” on how to use the applications in the remote access system are available on the remote access system desktop.

5.2 Errors and what to do

The remote access system is a relatively complex computer system. An error in one part may sometimes cause a malfunction elsewhere in the system, which may show up, for example, as trouble logging in despite repeated attempts.

The remote access system’s maintenance breaks and issues are announced on the Research Services’ webpage for the FIONA system.

If you encounter an error, please contact FIONA’s maintenance (CSC). Instructions for contacting the CSC and instructions concerning the most common problems: FIONA Remote access support Portal - FIONA Technical support - Eduuni-Wiki.

Report errors in detail

When writing an error report, you should describe as accurately as possible the actions you took that led to the situation. The error will be easier to locate and fix if your report:

  • indicates the stage the error occurred in
  • explains what you did before the error occurred
  • describes what happened when the error occurred
  • lists what software you have been using
  • includes your project code.

If possible, take a screenshot of the error and write down what the error message says, and attach both to your email. For data protection reasons, however, screenshots must not show the contents of the remote access desktop. The troubleshooting will be escalated to an IT specialist if necessary.

You should report the error’s progress and development to the Research Services, even if the instructions you received, or the repairs that were attempted, have corrected the issue. This allows us to verify that the problem has been solved. Otherwise, we will continue to work on the problem.

Not every error needs to be reported

In some cases, you or someone you work with may be familiar with the error. We recommend asking your colleagues about errors – they may have already been instructed on how to solve them.

A typical problem that requires no contact is a locked user account. For example, if you mistype your password during login, your account will be temporarily suspended. The remote access system will restore the account automatically after 30 minutes. If you still cannot log in after 30 minutes, contact the Research Services. Please ensure that you are using the identification method that you set up when you created the user account.

5.3 Wrapping up a project

Your project will end when the research project’s user licence runs out. If your project is completed before this date, you can send a clearly worded written notice stating that access to the data is no longer required.

Request your results and code in good time

After your project ends, you cannot log in to the remote access system. Therefore, well before your user licence expires, ask the Research Services to send all results to be published and any necessary code to your email address. We also ask that you delete all unnecessary files and code you have created in the project folders.

Project folders are stored in the remote access service for a minimum of three months after a project ends. You can extend the storage period for files and code in the project folder by negotiating this with Statistics Finland before or at your project’s end.

Need more time or a new licence?

You can apply for an extension to your user licence if your research remains unfinished. You can also apply to have your user licence expanded to add new users or research data to your project.

If you need to use data saved in project folders in a new research project, you must apply for a new user licence for this purpose. In case referees request changes, ensure that you have write access to your code and research data for as long as necessary after your research is completed.

Check that all user licences included in your research project remain valid, not just your Statistics Finland user licence. Any data from other data controllers linked to Statistics Finland data in the remote access system will be destroyed immediately after their data user licence expires.

Program code used to produce tailored research data from Statistics Finland base data will be stored by Statistics Finland until the relevant research project ends. For panel studies, you should arrange a “pseudo-key” so that new data can later be added for the people included in your sample.

Individuals must not be identifiable from your data

As you prepare to publish your research results, check one last time that they cannot be used to expose the identity of the people and enterprises included in your data, directly or indirectly. Check that Statistics Finland is named as the source of your research data. Also remember to send a copy or a link to your published research reports to tutkijapalvelut@stat.fi as required in our user licence terms.

Give feedback

Good luck with your future projects! You are welcome to leave feedback for the Research Services (tutkijapalvelut@stat.fi) about anything you would like regarding our data, remote access system and other services.

5.4 Checklist for researchers

The checklist for remote access users of research data covers key principles regarding data use and user applications. You should remember them when you prepare your own study. When you have absorbed the instructions in this guide, you will have a good head start in your research.

  • You can only use Statistics Finland microdata with a user licence. We can issue user licences for scientific research and statistical surveys.
  • The SISU microsimulation model describes Finland’s personal tax and social security system. The model is used to calculate the effects of potential legal reforms of personal taxation and social security for the population and public finances. 
  • Protect your data! The disclosure of information regarding individuals, households, enterprises and other statistical units to third parties must be prevented.
  • If you need to use our microdata, submit your user licence application with appendices through Statistics Finland’s licensing service. If your data are transferred to the remote access system, you will make an agreement with Statistics Finland for remote access to our research service.
  • Researchers can use the FIONA remote access system for secure access to Statistics Finland data. You are only allowed to transfer research results and other materials out of the system after review.

Feedback about the guide

Did you find everything? Would you have wanted something more? Leave us your feedback about the guide so we can improve it! Send your comments by email to koulutuspalvelut@stat.fi.
