This task's evaluation is hosted on TIRA. Participants will install their systems in dedicated TIRA virtual machines, so that their runs can be reproduced and so that they can easily be applied to other data of the same format in the future.
The following guide is adapted from the corresponding guide of the CoNLL 2018 Shared Task.
Typically, you will train your models on your own hardware. Once ready, you will upload both your system and the models to the VM. You may also train the models directly in the VM, but note that the resources there are limited.
The VMs contain the task's data at
/media/training-datasets/hyperpartisan-news-detection. First, try running your system on these datasets by connecting to your VM via SSH or RDP (you can find the host and ports in the web interface; the login is the same as for your VM). If you cannot connect to your virtual machine, make sure it is powered on: you can check and power on your machine through the web interface.
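For illustration, the connection command would look something like the sketch below. The host, port, and user name are placeholders (assumptions); copy the real values from the web interface. The command is built as a string here so the placeholders are explicit:

```shell
# Placeholder values (assumptions): take the real host and port from
# the TIRA web interface; the login is the same as for your VM.
TIRA_HOST=host-from-web-interface
TIRA_PORT=2222
TIRA_USER=my-user-name

# Once connected, the task data is available at
# /media/training-datasets/hyperpartisan-news-detection.
SSH_CMD="ssh -p $TIRA_PORT $TIRA_USER@$TIRA_HOST"
echo "$SSH_CMD"
```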
Next register the shell command to run your system in the web interface, and run it. Note that your VM will not be accessible while your system is running – it will be “sandboxed”, detached from the internet, and after the run the state of the VM before the run will be restored. Your run can then be reviewed and evaluated by the organizers. You can inspect runs on training and validation data yourself.
Note that your system is expected to read the paths to the input and output folders from the command line. When you register the command to run your system, put variables in the positions where you expect to see these paths. Thus, if your system expects the options -i and -o, followed by the input and output path respectively, the command you register may look like this:
/home/my-user-name/my-software/run.sh -i $inputDataset -o $outputDir
The actually executed command will then look something like this:
/home/my-user-name/my-software/run.sh \
  -i /media/training-datasets/hyperpartisan-news-detection/pan19-hyperpartisan-news-detection-by-publisher-validation-dataset-2018-11-22 \
  -o /tmp/my-user-name/2018-11-22-10-11-19/output
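A minimal run.sh that reads these paths could look like the sketch below. The option names -i/-o match the example above; the processing step is a placeholder (an assumption), since a real system would read the articles and write its predictions in the task's output format:

```shell
#!/bin/sh
# Sketch of a run.sh that takes the input and output paths from the
# command line, as TIRA substitutes them for $inputDataset and
# $outputDir in the registered command.
run() {
  OPTIND=1
  while getopts "i:o:" opt; do
    case "$opt" in
      i) input_dataset="$OPTARG" ;;
      o) output_dir="$OPTARG" ;;
    esac
  done
  mkdir -p "$output_dir"
  # Placeholder processing: record each input file's name. A real
  # system would produce predictions here instead.
  for f in "$input_dataset"/*; do
    echo "processed $(basename "$f")" >> "$output_dir/run.log"
  done
}

# Self-contained demo on temporary directories:
demo_in=$(mktemp -d)
demo_out=$(mktemp -d)/output
touch "$demo_in/articles.xml"
run -i "$demo_in" -o "$demo_out"
```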
See the links below for more details.
When you have tested your system on the validation or training data and everything works fine, run it on the test data (pan19-hyperpartisan-news-detection-by-publisher-test-dataset-2018-12-12). This is only possible through the web interface. Once the run of your system completes, please also run the evaluator on the output of your system. These are two separate actions, and both should be invoked through the web interface of TIRA. You don't have to install the evaluator in your VM; it is already prepared in TIRA. You should see it in the web interface, under your software, labeled "Evaluator". Before clicking the "Run" button, use the drop-down menu to select the "Input run", i.e. one of the completed runs of your system. The output files from the selected run will be evaluated.
You will see neither the files your system outputs nor its STDOUT or STDERR. In the evaluator run, however, you will see STDERR, which will tell you if one or more of your output files are invalid. If you think something went wrong with your run, send us an e-mail. We can unblind your STDOUT and STDERR on demand, after we have checked that you did not leak the test data into the output.
You can register more than one system ("software") per virtual machine using the web interface. TIRA gives systems automatic names: "Software 1", "Software 2", etc. You can perform several runs per system. We will officially score evaluator runs as follows: the latest run of "Software 1" on both test datasets before the early bird deadline (13 Dec 2018), and the latest runs of both "Software 1" and "Software 2" on both test datasets before the final deadline (23 Jan 2019; both deadlines are "Anywhere on Earth"). While we may be able to include some late arrivals in the final ranking, we do not guarantee it. Note that
pan19-hyperpartisan-news-detection-by-article-test-dataset-2018-12-07 is the dataset used to determine the ranking for the task and the winner of the grand prize, but results achieved for the other test dataset are still “official” and will be discussed in our task overview paper.
If you want us to score other runs, please write us a mail. If you want to try out more, we will also publish scores of more runs on your request after the deadlines. You can use those numbers in your system description paper, but they do not count for the grand prize.
If your system requires more resources than available in the default VM (memory, disk space, CPUs), please estimate what you need and discuss it with Johannes Kiesel. We can usually increase the resources, but note that accommodating such requests takes time, so act early. The sooner you complete at least one successful run, the safer you are.
Access to the Virtual Machines and Intellectual Property Rights
The VMs are distributed across different hosts. The only people who have access to the participant VMs are TIRA admins (a very small group of people operating the service) and the organizers of the task.
We can guarantee that we will never deliberately share your VM or its contents, nor use it for anything else but for the purpose of evaluating your software as part of the shared task, unless you give us written permission. We ask that you give the task's organizers and the TIRA operators usage rights for your software for this purpose only.
However, we cannot guarantee that no content of the VM will leak accidentally, and we shall not be held liable for damages caused by such leaks. In particular, we cannot vouch that the software packages and operating systems TIRA depends on are free of zero-day exploits.
The performance results and output of your software will become part of public record, for which we ask for indefinite, irrevocable, and transferable rights to publish them within any scientific publication as well as on the TIRA web service and on the shared task website.
By deploying your system in your VM and running it through the TIRA interface you express your consent to these conditions and give us the rights as described above.
We understand that for industry-based participants, protecting their software is an important matter. If you want to learn more about the TIRA procedures and about your options, please get in touch with the TIRA administrators (tira at webis dot de). TIRA has been used by a number of companies so far, some small but also some big. The involvement of industry in scientific events should not be precluded. If we are to improve reproducibility at large, however, there is no way around venturing more openness on both sides.