Recently I was asked to assist in a text attribution problem: the illegitimate reuse of text (and images) from the web pages of a small business. When the offender was approached about committing possible plagiarism their response was "prove it". This talk will describe how I approached the problem of proving ownership and the challenges it entailed. I will describe the experiences gained from working on the EPSRC-funded Measuring Text Reuse (METER) project with the UK Press Association, along with the current text attribution literature, that informed my textual analysis. I will also demonstrate how freely available resources can be used to tackle this kind of problem. In the end was I successful: did the offender admit their guilt? You'll have to attend the talk to find out!
PAN at CLEF 2014
Shared Tasks
Important Dates
- March 1, 2014: Early bird software submission
- May 1, 2014: Final software submission
- May 24, 2014: TIRA evaluation phase deadline
- June 14, 2014: Paper submission
- July 26, 2014: Early bird conference registration
- September 14-18, 2014: Conference
The timezone of all deadlines is Anywhere on Earth.
Keynotes
Personality recognition from text is the automatic classification of authors' personality traits from pieces of text they wrote. A classifier's predictions can be compared against gold standard labels, obtained by means of personality assessments like the Big5 personality test. Until recently, the extraction of personality types was limited to blogs and offline texts; in recent years, the scientific community has shown strong interest in extracting personality from various sources, such as online social networks, speech and video. Current approaches to personality recognition are based on supervised learning, which has several limitations, such as the cost of data annotation and the lack of domain adaptability and multilinguality. We present an unsupervised method for personality recognition from text and some of its applications in social network analysis as well as in other NLP tasks.
Program
PAN's program is part of the CLEF conference program.
September 15 | |
Conference papers: Session 1 |
15 min. talk | Supporting More-Like-This Information Needs: Finding Similar Web Content in Different Scenarios Matthias Hagen and Christiane Glimm |
12:30-13:30 | Lunch |
Conference papers: Session 2 |
15 min. talk | Discovering Similar Passages Within Large Text Documents Demetrios Glinos |
15:00-15:30 | Break |
Lab Overviews |
15 min. talk | Improving the Reproducibility of PAN's Shared Tasks: Plagiarism Detection, Author Identification, and Author Profiling Martin Potthast, Tim Gollub, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos, and Benno Stein |
18:00-19:30 | Joint Poster Session + Welcome Reception |
A Winning Approach to Text Alignment for Text Reuse Detection at PAN 2014 Miguel A. Sanchez-Perez, Grigori Sidorov, Alexander Gelbukh |
Heterogeneous Queries for Synoptic and Phrasal Search Šimon Suchomel and Michal Brandejs |
Using Intra-Profile Information for Author Profiling A. Pastor López-Monroy, Manuel Montes-y-Gómez, Hugo Jair Escalante, and Luis Villaseñor-Pineda |
A Simple Approach to Author Profiling in MapReduce Suraj Maharjan, Prasha Shrestha, and Thamar Solorio |
A Slightly-modified GI-based Author-verifier with Lots of Features (ASGALF) Mahmoud Khonji and Youssef Iraqi |
September 16 | |
Keynote & Plagiarism Detection, Chair: Benno Stein | |
10:30-11:30 | Proving Ownership: The Case of "Wag in a Bag" Paul Clough |
11:30-11:50 | Overview of the 6th International Competition on Plagiarism Detection Martin Potthast, Matthias Hagen, Anna Beyer, Matthias Busse, Martin Tippmann, Paolo Rosso, and Benno Stein |
11:50-12:10 | A Winning Approach to Text Alignment for Text Reuse Detection at PAN 2014 Miguel A. Sanchez-Perez, Grigori Sidorov, Alexander Gelbukh |
12:10-12:30 | Heterogeneous Queries for Synoptic and Phrasal Search Šimon Suchomel and Michal Brandejs |
12:30-13:30 | Lunch |
Keynote & Author Profiling, Chair: Paolo Rosso | |
13:30-14:30 | Unsupervised Personality Recognition from Text: Possible Applications Fabio Celli |
14:30-14:50 | Overview of the 2nd Author Profiling Task at PAN 2014 Francisco Rangel, Paolo Rosso, Irina Chugur, Martin Potthast, Martin Trenkmann, Benno Stein, Ben Verhoeven, Walter Daelemans |
14:50-15:10 | Using Intra-Profile Information for Author Profiling A. Pastor López-Monroy, Manuel Montes-y-Gómez, Hugo Jair Escalante, and Luis Villaseñor-Pineda |
15:10-15:30 | A Simple Approach to Author Profiling in MapReduce Suraj Maharjan, Prasha Shrestha, and Thamar Solorio |
15:30 | Award of appreciation for the overall best performing author profiling approach, sponsored by Corex Atribus, Spain. |
15:30-16:00 | Break |
Author Identification, Chair: Francisco Rangel | |
16:00-16:20 | Overview of the Author Identification Task at PAN 2014 Efstathios Stamatatos, Walter Daelemans, Ben Verhoeven, Martin Potthast, Benno Stein, Patrick Juola, Miguel A. Sanchez-Perez, and Alberto Barrón-Cedeño |
16:20-16:40 | A Slightly-modified GI-based Author-verifier with Lots of Features (ASGALF) Mahmoud Khonji and Youssef Iraqi |
16:40-17:00 | Author Verification: Exploring a Large set of Parameters using a Genetic Algorithm Erwan Moreau, Arun Jayapal, and Carl Vogel |
17:00-18:00 | Discussion |
18:00-19:30 | Poster Session |
Expanded N-Grams for Semantic Text Alignment Samira Abnar, Mostafa Dehghani, Hamed Zamani, and Azadeh Shakery |
Evaluating Robustness for 'IPCRESS': Surrey's Text Alignment for Plagiarism Detection Lee Gillam and Scott Notley |
Hashing and Merging Heuristics for Text Reuse Detection Faisal Alvi, Mark Stevenson, and Paul Clough |
A Hybrid Architecture for Plagiarism Detection Demetrios Glinos |
Developing High-Resolution Universal Multi-Type N-Gram Plagiarism Detector Yurii Palkovskii and Alexei Belov |
Plagiarism Alignment Detection by Merging Context Seeds Philipp Gross and Pashutan Modaresi |
A Winning Approach to Text Alignment for Text Reuse Detection at PAN 2014 Miguel A. Sanchez-Perez, Grigori Sidorov, Alexander Gelbukh |
Machine Translation Evaluation Metric for Text Alignment Prasha Shrestha, Suraj Maharjan, and Thamar Solorio |
Heterogeneous Queries for Synoptic and Phrasal Search Šimon Suchomel and Michal Brandejs |
VEBAV - A Simple, Scalable and Fast Authorship Verification Scheme Oren Halvani and Martin Steinebach |
A Slightly-modified GI-based Author-verifier with Lots of Features (ASGALF) Mahmoud Khonji and Youssef Iraqi |
A Single Author Style Representation for the Author Verification Task Cristhian Mayor, Josue Gutierrez, Angel Toledo, Rodrigo Martinez, Paola Ledesma, Gibran Fuentes, and Ivan Meza |
Author Verification: Exploring a Large set of Parameters using a Genetic Algorithm Erwan Moreau, Arun Jayapal, and Carl Vogel |
A Language Independent Author Verifier Using Fuzzy C-Means Clustering Pashutan Modaresi and Philipp Gross |
A Trinity of Trials: Surrey's 2014 Attempts at Author Verification Anna Vartapetiance and Lee Gillam |
Using Intra-Profile Information for Author Profiling A. Pastor López-Monroy, Manuel Montes-y-Gómez, Hugo Jair Escalante, and Luis Villaseñor-Pineda |
Age and Gender Identification in Social Media James Marquardt, Golnoosh Farnadi, Gayathri Vasudevan, Marie-Francine Moens, Sergio Davalos, Ankur Teredesai, Martine De Cock |
A Simple Approach to Author Profiling in MapReduce Suraj Maharjan, Prasha Shrestha, and Thamar Solorio |
DAEDALUS at PAN 2014: Guessing Tweet Author's Gender and Age Julio Villena-Román and José Carlos González-Cristóbal |
19:30 | Social Dinner: Cutlers Hall |