Remote Access to Research Data
This section explains how to use the remote access service. It also covers some basic principles of efficient programming, project management in the remote access service and smart use of system resources.
Good planning is often the foundation of efficient programming. The aim is to write cohesive and well-sectioned code that allows the results to be reproduced quickly. The program should include easy-to-follow comments and maximum automation. You can use macros and various loops to adjust the code if you need to change timespans and the content of variables, for example (see “program” in Stata). Avoid copying code – it increases the risk of propagating errors.
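As a rough illustration of the point about loops and parameters, the sketch below keeps the time span and the variable name as arguments, so that changing them means editing one call rather than copying a block of code. It is written in Python purely as a neutral example; the function name, the record layout and the figures are hypothetical and are not part of the guide's own examples.

```python
def yearly_totals(records, first_year, last_year, variable):
    """Sum `variable` for each year in the inclusive range first_year..last_year."""
    totals = {}
    for year in range(first_year, last_year + 1):
        totals[year] = sum(r[variable] for r in records if r["year"] == year)
    return totals


# Hypothetical records standing in for research data
records = [
    {"year": 2019, "income": 100},
    {"year": 2019, "income": 50},
    {"year": 2020, "income": 70},
    {"year": 2022, "income": 5},
]

# Changing the time span or the variable only means changing these arguments
totals = yearly_totals(records, 2019, 2021, "income")
print(totals)
```

The same logic is written once and reused, so a correction made in the function applies to every year and variable it is called with.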
A good principle is to drop variables and data from the program as soon as your analysis no longer needs them. We advise against saving overlapping data sets and intermediate files. Stata allows you to limit the data when a file is opened by selecting only the necessary variables (“use var1 var2 var3 using datax”).
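The idea of reading only the variables you need can be sketched outside Stata as well. The following minimal Python example, which assumes a CSV-style input, keeps only the listed columns while reading. The helper name, the column names and the sample content are hypothetical placeholders.

```python
import csv
import io


def read_columns(handle, wanted):
    """Yield one dict per row, containing only the columns named in `wanted`."""
    reader = csv.DictReader(handle)
    available = reader.fieldnames or []
    missing = [name for name in wanted if name not in available]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    for row in reader:
        yield {name: row[name] for name in wanted}


# Hypothetical stand-in for a large data file
SAMPLE_CSV = (
    "id,year,industry,income,region\n"
    "1,2019,A,100,N\n"
    "2,2020,B,200,S\n"
)

rows = list(read_columns(io.StringIO(SAMPLE_CSV), ["id", "income"]))
print(rows)
```

Because the unneeded columns are discarded row by row, they never accumulate in memory.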
To help you plan your programs, we recommend preparing a small set of test data that runs fast and uses few resources, so you have something to test with while the final program takes shape. For the test data, you can extract a small sample from the full data set (see Example 1). We recommend excluding unneeded variables, years, industries, areas and groups of people at the start.
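One way to draw a small sample is sketched below in Python. This is an illustration only, not the guide's Example 1. A fixed seed is assumed so that the same sample is drawn on every run, which keeps test results comparable. The function name, the seed, the sampling fraction and the generated records are all hypothetical.

```python
import random


def draw_sample(records, fraction, seed=12345):
    """Return a reproducible random sample covering roughly `fraction` of the records."""
    rng = random.Random(seed)  # fixed seed: the same sample on every run
    size = max(1, round(len(records) * fraction))
    return rng.sample(records, size)


# Hypothetical full data set of 1,000 records
population = [{"id": i, "year": 2015 + i % 5} for i in range(1000)]

# A 1 % sample to develop and test the program against
test_data = draw_sample(population, 0.01)
print(len(test_data))
```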
You should consider the review procedure from the start, so also pay attention to the format and intelligibility of the results. Instead of long log files, you should aim to produce separate result tables or diagrams with distinct names. The data content of variables should also be easy to understand. Each row in a table must indicate the number of observations. We recommend including a rule in your program to allow only three observations or more (see Example 2). As a rule, minimums and maximums will not pass a data protection review, so they should not be included in the first place.
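A rule of this kind could look roughly like the following Python sketch. It is not the guide's Example 2. It assumes that each table row is a group, reports the number of observations on every row, leaves out any row with fewer than three observations, and produces no minimums or maximums. The function name, field names and data are hypothetical.

```python
from collections import defaultdict

MIN_OBS = 3  # rows with fewer observations are left out of the output


def summarise(records, group_key, value_key, min_obs=MIN_OBS):
    """Build a result table of group means, each row with its observation count."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])

    table = []
    for group in sorted(groups):
        values = groups[group]
        n = len(values)
        if n < min_obs:
            continue  # too few observations: do not output this row
        table.append({"group": group, "n": n, "mean": sum(values) / n})
    return table


# Hypothetical data: group A has three observations, group B only two
records = [
    {"industry": "A", "income": 10},
    {"industry": "A", "income": 20},
    {"industry": "A", "income": 30},
    {"industry": "B", "income": 40},
    {"industry": "B", "income": 50},
]

table = summarise(records, "industry", "income")
print(table)
```

Building the threshold into the code that produces the table means small cells are never written to an output file by accident.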
Finally: remember to save often! Making daily backups ensures your files stay safe, even if the server malfunctions, for example.
Working in the remote access environment is like working on a regular Windows workstation. Project names use consecutive numbering, starting with the example project a01.
The following drives are available for each project:
Each research project has a dedicated work folder on the W drive where you can keep the project’s program files, working files, etc.
Output files intended for publication are moved to the O drive for data protection review. You can create your own subfolders in the folders on W and O, but make sure you frequently remove all unnecessary folders, files and outputs that you have already received.
You will receive read access to the ready-made data folders on the D drive as specified in your project or SISU model user licence. Data tailored for projects, including their descriptions, are also stored on D. The descriptions for ready-made data are located in the metadata folder. The same folder has the Research Services rules and current notices about things like data updates.
The smaller E drive is for long-term research project backup storage. The CRAN repository is used to load R software environment packages for your research project.
You can copy the SISU model from the SISU microsimulation folder on D to your own work folder on W. New model versions are announced separately to users.
Microsimulation model users can use the shared Forum folder (F drive) in the remote access environment to share files related to the model with other users.
Each SISU model user has a personal email folder, “Mail”, in the remote access environment, which you can use to transfer files to your local workstation from the remote access environment. For every file copied to the Mail folder, you will receive a separate email with the copied file attached. The transferred files are also sent to the microsimulation team’s email inbox for review by Statistics Finland.
The data protection requirements, output protection methods, and review procedure are explained further in section 3 Statistical data protection of research data.
You must follow the Research Services’ instructions and rules for the remote access environment and its data transfers. These are explained in the Research Services rules.
The remote access system provides dedicated computation and software resources for each project, and these are shared by all users of the project. We therefore recommend you avoid heavy simultaneous runs and unnecessary memory consumption. Heavier runs should be scheduled for nights and weekends. During a long run, you can close the remote access window (disconnect), but you must stay logged in. You can monitor your resource use in Task Manager, which opens with Ctrl + Shift + Esc. You can check your storage space consumption from the properties of each folder. Should you need to, you can always upgrade your storage or hardware package.
We recommend that you log out fully (sign out) on a regular basis so that service packs can be installed during maintenance breaks. When you log out and close your remote connection, you free up resources for the other users in your project, allowing them to carry out heavier tasks.
You should keep a good balance between processing power and storage space as you write your code. It is sometimes better to store extensive intermediate results from different stages so that you do not have to rerun slow programs (long estimations, simulations, etc.). It is fine to rerun fast code, such as descriptive analyses and results tables, whenever changes are needed.
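The trade-off can be sketched as a small caching helper: a slow step is computed once, its result is stored on disk, and later runs load the stored result instead of recomputing it. The Python example below is a minimal sketch under that assumption. The helper name, the file name and the returned figures are hypothetical placeholders, and a temporary folder stands in for a project work folder.

```python
import json
import tempfile
from pathlib import Path


def cached(path, compute):
    """Return the stored result at `path` if it exists, else compute and store it."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute()
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


calls = []  # records how many times the slow step actually ran


def slow_estimation():
    """Hypothetical stand-in for a long estimation or simulation."""
    calls.append(1)
    return {"coef": 0.5, "n": 1200}


with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "estimation_stage1.json"
    first = cached(target, slow_estimation)   # computed and stored
    second = cached(target, slow_estimation)  # loaded from the stored file

print(len(calls))
```

Stored intermediate results take up storage space, so delete them once the stage that needs them is finished.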
Add-ons may be installed to the software in the remote access system (e.g. Stata), and even entirely new software may be added. Each installation must be evaluated for compatibility, potential issues and data protection, for example, so installation requests should be well founded and thought out.
Include at least the following information when you submit an installation request to the Research Services:
If your installation request is approved, we will typically install the software in the remote access system within a few working days. More complicated installations will be scheduled for the next remote access system maintenance break.
R packages can be loaded from the CRAN library. Your remote access system desktop includes instructions, “CRAN_library_RStudio”, for using the CRAN library in the remote access system.
The remote access system is a relatively complex computer system. An error in one part may sometimes cause the system to malfunction, which may show up, for example, as trouble logging in despite repeated attempts.
The remote access system’s maintenance breaks and issues are announced on the Research Services’ webpage for the FIONA system.
If you encounter an error, send an email to the Research Services at firstname.lastname@example.org or the microsimulation team at email@example.com.
When writing an error report, you should describe as accurately as possible the actions you took that led to the situation. The error will be easier to locate and fix if your report:
If possible, take a screenshot of the error, write down what the error message says, and attach both to your email. For data protection reasons, however, screenshots may not be taken of the remote access desktop contents. The troubleshooting will be escalated to an IT specialist if necessary.
You should report the error’s progress and development to the Research Services, even if the instructions you received, or the repairs that were attempted, have corrected the issue. This allows us to verify that the problem has been solved. Otherwise, we will continue to work on the problem.
In some cases, you or someone you work with may be familiar with the error. We recommend asking your colleagues about errors – they may have already been instructed on how to solve them.
A typical problem that requires no contact is a locked user account. For example, if you mistype your password during login, your account will be temporarily suspended. The remote access system will restore the account automatically after 30 minutes. If you still cannot log in after 30 minutes, contact the Research Services. Please ensure that you are using the identification method that you set up when you created the user account.
Your project will end when the research project’s user licence runs out. If your project is completed before this date, you can send a clearly worded written notice stating that access to the data is no longer required.
After your project ends, you cannot log in to the remote access system. Therefore, well before your user licence expires, you need to ask the Research Services to send all results intended for publication and any necessary code to your email address. We ask that you delete all unnecessary files and code you have created in the project folders.
Project folders are stored in the remote access service for a minimum of three months after a project ends. You can extend the storage period for files and code in the project folder by negotiating this with Statistics Finland before or at your project’s end.
You can apply for an extension to your user licence if your research remains unfinished. You can also apply to have your user licence expanded to add new users or research data to your project.
If you need to use data saved in project folders in a new research project, you must apply for a new user licence for this purpose. To be able to make changes requested by referees, ensure that you have write access to your code and research data for as long as necessary after your research is completed.
Check that all user licences included in your research project remain valid, not just your Statistics Finland user licence. Any data from other data controllers linked to Statistics Finland data in the remote access system will be destroyed immediately after their data user licence expires.
Program code used to produce tailored research data from Statistics Finland base data will be stored by Statistics Finland until the relevant research project ends. For panel studies, you should arrange a “pseudo-key” to allow new data to be updated for the people included in your sample.
As you prepare to publish your research results, check one last time that they cannot be used to expose the identity of the people and enterprises included in your data, directly or indirectly. Check that Statistics Finland is named as the source of your research data. Also remember to send a copy or a link to your published research reports to firstname.lastname@example.org as required in our user licence terms.
Good luck with your future projects! You are welcome to leave feedback for the Research Services (email@example.com) about anything you would like regarding our data, remote access system and other services.
The checklist for remote access users of research data covers key principles regarding data use and user applications. You should remember them when you prepare your own study. When you have absorbed the instructions in this guide, you will have a good head start in your research.
Did you find everything? Would you have wanted something more? Leave us your feedback about the guide so we can improve it! Send your comments by email to firstname.lastname@example.org.