PAN Corpora
| Name | Publisher/Creator | Year | Size [bytes] | Size [units] | Default Task | Access |
|---|---|---|---|---|---|---|
| C50-Attribution | | | | | Author Identification | [joint] |
| C10-Attribution | | | | | Author Identification | [joint] |
| PAN19-Attribution | | 2019 | 13 MB | | Author Identification | [zenodo] |
| PAN18-Attribution | | 2018 | 4 MB | | Author Identification | [training][evaluation] |
| PAN12-Attribution | | 2012 | 9 MB | | Author Identification | [training][test][joint] |
| PAN11-Attribution | | 2011 | 3 MB | | Author Identification | [training][test][joint] |
| PAN15-Verification | | 2015 | 3 MB | | Author Identification | [training][test] |
| PAN14-Verification | | 2014 | 9 MB | | Author Identification | [training][test] |
| PAN13-Verification | | 2013 | 1 MB | | Author Identification | [training][test] |
| PAN17-Clustering | | 2017 | 1 MB | | Author Identification | [training][test] |
| PAN16-Clustering | | 2016 | 3 MB | | Author Identification | [training][test] |
| PAN16-Author-Masking | | 2016 | | | Author Obfuscation | [training] |
| PAN13-Author-Profiling | | 2013 | 713 MB | | Author Profiling | [training][test] |
| PAN14-Author-Profiling | | 2014 | 205 MB | | Author Profiling | [training] |
| PAN15-Author-Profiling | | 2015 | 2 MB | | Author Profiling | [training] |
| PAN17-Author-Profiling | | 2017 | | | Author Profiling | [training][test] |
| PAN18-Author-Profiling | | 2018 | | | Author Profiling | [training][test] |
| PAN19-Bots-and-Gender-Profiling | | 2019 | 38 MB | | Author Profiling | [zenodo] |
| PAN19-Celebrity-Profiling | | 2019 | 3.2 GB | | Author Profiling | [zenodo] |
| PAN17-Style-Change-Detection | | 2017 | 8 MB | | Multi-Author Analysis | [training][test] |
| PAN18-Style-Change-Detection | | 2018 | 8 MB | | Multi-Author Analysis | [training][validation][test] |
| PAN19-Style-Change-Detection | | 2019 | 10 MB | | Multi-Author Analysis | [zenodo] |
| PAN11-Wikipedia-Vandalism | | 2011 | | | Credibility Analysis | [training][test] |
| PAN10-Wikipedia-Vandalism | | 2010 | | | Credibility Analysis | [training][test] |
| PAN12-Wikipedia-Quality-Flaw-Prediction | | 2012 | | | Credibility Analysis | [training][test] |
| PAN-SemEval-Hyperpartisan-News-Detection-19 | | 2018 | 1 GB | 751K | Credibility Analysis | [zenodo] |
| PAN12-Deception-Detection | | 2012 | | | Deception Detection | [training][test] |
| PAN12-Source-Retrieval | | 2012 | | | Originality | [training][test] |
| PAN13-Source-Retrieval | | 2013 | | | Originality | [training][test] |
| PAN14-Source-Retrieval | | 2014 | | | Originality | [training] |
| PAN15-Source-Retrieval | | 2015 | | | Originality | [training] |
| PAN12-Text-Alignment | | 2012 | | | Originality | [training][test] |
| PAN13-Text-Alignment | | 2013 | | | Originality | [training][test] |
| PAN14-Text-Alignment | | 2014 | | | Originality | [training][test][supplemental test] |
| Alvi15-Text-Alignment-en-fa | | 2015 | | | Originality | [training] |
| Cheema15-Text-Alignment-en | | 2015 | | | Originality | [training] |
| Hanfi15-Text-Alignment-en-ur | | 2015 | | | Originality | [training] |
| Khoshnavataher15-Text-Alignment-fa | | 2015 | | | Originality | [training] |
| Kong15-Text-Alignment-zh | | 2015 | | | Originality | [training] |
| Mohtaij15-Text-Alignment-en | | 2015 | | | Originality | [training] |
| Palkovskii15-Text-Alignment-en | | 2015 | | | Originality | [training] |
| PAN09-Intrinsic-Plagiarism-Detection | | 2009 | | | Originality | [training][test] |
| PAN11-Intrinsic-Plagiarism-Detection | | 2011 | | | Originality | [training][test] |
| PAN09-External-Plagiarism-Detection | | 2009 | | | Originality | [training][test] |
| PAN10-External-Plagiarism-Detection | | 2010 | | | Originality | [training][test] |
| PAN11-External-Plagiarism-Detection | | 2011 | | | Originality | [training][test] |
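The Size [bytes] column mixes MB and GB values (for example 13 MB versus 3.2 GB), so comparing or summing corpus sizes programmatically first requires normalizing them to a byte count. The following is a minimal Python sketch of that step; the function name `size_to_bytes` is hypothetical. The table does not say whether "MB" and "GB" denote decimal (10^6, 10^9) or binary (2^20, 2^30) multiples, so both conventions are offered as an assumption. Entries from the Size [units] column, such as 751K, are unit counts rather than byte sizes and are rejected.

```python
import re

# Assumption: the catalog does not state whether MB/GB are decimal or binary,
# so both multiplier tables are provided and the caller chooses one.
_DECIMAL = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9}
_BINARY = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30}


def size_to_bytes(size: str, binary: bool = False) -> int:
    """Convert a size string such as '13 MB' or '3.2 GB' into a byte count.

    Hypothetical helper: only byte-valued entries (ending in B, KB, MB or GB)
    are accepted; unit counts such as '751K' raise ValueError.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", size)
    if match is None:
        raise ValueError(f"unrecognized byte size: {size!r}")
    value, unit = match.groups()
    multipliers = _BINARY if binary else _DECIMAL
    # round() absorbs floating-point noise from fractional values like 3.2
    return round(float(value) * multipliers[unit])


if __name__ == "__main__":
    # Sizes copied from the Author Profiling rows of the table above.
    for name, size in [("PAN13-Author-Profiling", "713 MB"),
                       ("PAN19-Celebrity-Profiling", "3.2 GB")]:
        print(name, size_to_bytes(size), size_to_bytes(size, binary=True))
```

Keeping the unit convention as an explicit parameter, rather than hard-coding one, avoids silently misreporting sizes by about 5 to 7 percent if the catalog turns out to use the other convention.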