The partnership with the Wikimedia Foundation puts your tech to work in one of the internet’s largest repositories of information. How important was it to put greater checks in place for copyright infringement in the English Wikipedia?
Verification of content and protection of copyright are increasingly complex issues in a digital world, due primarily to the unfathomable amount of content, which continues to expand beyond our human ability to manage it. Wikipedia is the world’s largest and most comprehensive free encyclopedia. Turnitin is the leading source for helping publishers ensure the originality of published works.
The Wikipedia community places a high premium on what it calls ‘verifiability’, which means that content and information in any article must be attributable to a published and reliable source. Turnitin makes it possible to authenticate content on Wikipedia, or in any other digital source, at a scale beyond what was previously in place, affording Wikipedia editors greater efficiency in addressing potential copyright concerns.
What does the Wikimedia Foundation want from Turnitin’s technology in terms of results?
One of the challenges in checking for copyright violation is that so many other websites mirror or duplicate Wikipedia content, creating a lot of noise and making it difficult to identify original sources. A key criterion for Turnitin was that our technology be able to reliably avoid comparing Wikipedia content against mirror-site content.
That said, it’s helpful to mention why Wikipedia approached Turnitin in the first place.
Jake Orlowitz, from the Wikipedia community, approached Turnitin because the bots Wikipedia had been using to check for copyright violations were not providing comprehensive enough content coverage, nor were they doing so at scale. Those bots could not compare content in Wikipedia articles against academic journals and other publications, which is something Turnitin does well, with coverage of nearly 80 percent of the top 5,000 journals in the world. These were the conditions that led Wikipedia to use Turnitin.
As for results, the Turnitin-powered bot, EranBot, was first tested extensively, checking all edits to medical-related articles on the English Wikipedia for more than a year. Once this pilot was complete, and with the approval of the Wikipedia community, EranBot was deployed to check all English article edits.
Could these robots ever replace human editors?
Technology can never replace humans. This technology is all about complementing human judgement, making it more efficient to sort through content and identify trouble spots. EranBot and Wikipedia’s other bots are used to identify those trouble spots: content in articles, or edits to articles, that may match content in outside sources, whether online or in publications.
It’s ultimately up to Wikipedia editors, the human beings in the decision-making process, to review the content match reports from Turnitin and determine whether that content has been used appropriately or whether its use may be a violation of copyright. The technology makes identifying that content easier and more efficient. It cannot, however, make judgement calls on the use of that content. Humans make those judgement calls.
A common bugbear of copyright owners is having to point out infringements themselves through takedown requests. How can Turnitin’s technology help to alleviate those burdens?
Turnitin generates reports that clearly highlight content that matches other sources, whether online or in publications. Beyond pointing out the sources of content matches, Turnitin reports show every match identified by our algorithm, providing a comprehensive report that a copyright owner can share with potential violators to inform them and to substantiate the infringement claim.
What about Turnitin’s work with schools and colleges in terms of students’ plagiarism? How common is the problem and how is technology helping to overcome it?
At Turnitin, what we see is that students do not fully understand copyright and the responsible use of source information, including proper citation. Much of how students think about copyright is shaped by how they consume and share information online. Through apps and services that foster peer-to-peer communication, the internet facilitates connection and sharing, and with it the notion that information is free and free to share.
This freedom is great for encouraging engagement and social interaction online, but it doesn’t work in an academic context. In an academic context, students need to understand that simply sharing information, without attribution and without a critical eye to the information itself, is not acceptable.
Schools and institutions that use Turnitin are using it in a way that gets students to think about how they incorporate source material in their own work and to critically consider the sources they use.
Turnitin reports are a great way to give students feedback on how they use source information, respect originality, and uphold copyright.
How does Turnitin work with schools and colleges to educate their students about plagiarism?
In addition to reports, students can pre-submit their writing for originality checks that will highlight questionable sources before they turn in their final work. Much of the learning process occurs during revision and this is where the greatest impact occurs.
Students learn to respect copyright and how to properly cite content. It all goes back to education and supporting an individual’s ability to make an informed judgement call.
Students can review their own Turnitin reports before submitting their final papers. These reports give students a resource for checking their own work for improper use of sources, missing attribution, or poor paraphrasing.