Jason Chu

Turnitin’s Jason Chu explains why its technology was needed to trawl Wikipedia for copyright infringement, and how it’s working with schools and colleges

The partnership with the Wikimedia Foundation puts your tech to work in one of the internet’s largest repositories of information—how important was it that greater checks were put in place for copyright infringement in the English Wikipedia?

Verification of content and protection of copyright are increasingly complex issues in a digital world, due primarily to the unfathomable amount of content that continues to expand beyond our human ability to manage it. Wikipedia is the world’s largest and most comprehensive free encyclopedia. Turnitin is the leading source for helping publishers ensure the originality of published works.

The Wikipedia community places a high premium on what they call ‘verifiability’, which means that content and information in any article must be attributable to a published and reliable source. Turnitin makes it possible to authenticate content on Wikipedia or in any other digital source at a scale beyond what was previously in place, affording Wikipedia editors greater efficiency in addressing potential copyright concerns.

What does the Wikimedia Foundation want from Turnitin’s technology in terms of results?

One of the challenges in checking for copyright violation is that so many other websites mirror or duplicate Wikipedia content, creating a lot of noise and making it difficult to identify original sources. A key criterion for Turnitin was that our technology be able to effectively avoid comparing Wikipedia content to mirror site content.

That said, it’s helpful to mention why Wikipedia approached Turnitin in the first place.

Jake Orlowitz, from the Wikipedia community, approached Turnitin because the existing bots that Wikipedia had been using to check for copyright violations were not providing comprehensive enough content coverage, nor were they doing so at scale. The bots that were previously deployed could not compare content in Wikipedia articles to academic journal and publication content, which is something that Turnitin does really well, with coverage of nearly 80 percent of the top 5,000 journals in the world. These were the conditions for Wikipedia using Turnitin.

Insofar as results go, the Turnitin-powered bot, EranBot, was first tested extensively, checking all English edits to medical-related Wikipedia articles for more than a year. Once this pilot testing was complete, and with Wikipedia community approval, EranBot was deployed to check all English article edits.

Could these robots ever replace human editors?

Technology can never replace humans. This technology is all about complementing human judgement, making it more efficient to sort through and identify trouble spots. EranBot and Wikipedia’s other bots are used to identify those trouble spots: content in articles or edits to articles that may match content in outside, online or publications-based sources.

It’s ultimately up to Wikipedia editors, the human beings in the decision-making process, to review the content match reports from Turnitin and determine whether that content has been appropriately used or whether the use may be a violation of copyright. The technology helps to make the identification of that content more efficient and easier. The technology cannot, however, make judgement calls on the use of that content. Humans make those judgement calls.

A common bugbear of copyright owners is having to point out infringements themselves through takedown requests. How can Turnitin’s technology help to alleviate those burdens?

Turnitin generates reports that clearly highlight content that matches other sources, whether online or publications-based. More than just pointing out the sources for content matches, Turnitin reports showcase all of the matches identified by our algorithm, providing a comprehensive report that a copyright owner can share with potential violators to inform them and validate the infringement claim.

What about Turnitin’s work with schools and colleges in terms of students’ plagiarism? How common is the problem and how is technology helping to overcome it?

At Turnitin, what we see is that students do not fully understand copyright and the responsible use of source information, including proper citation. Much of how students think about copyright is informed by how they consume and share information online. Through apps and services that foster peer-to-peer communication, the internet and the web facilitate connection and sharing, and with them the notion that information is free and free to share.

This freedom is great for lubricating engagement and social interaction online, but it doesn’t work in an academic context. There, students need to understand that the simple act of sharing information, without attribution and without a critical eye to the information itself, is not OK.

Schools and institutions that use Turnitin are using it in a way that gets students to think about how they incorporate source material in their own work and to critically consider the sources they use.

Turnitin reports are a great way to highlight and provide feedback to students on how they use source information while respecting originality and upholding copyright.

How does Turnitin work with schools and colleges to educate their students about plagiarism?

In addition to reports, students can pre-submit their writing for originality checks that will highlight questionable sources before they turn in their final work. Much of the learning process occurs during revision and this is where the greatest impact occurs.

Students learn to respect copyright and how to properly cite content. It all goes back to education and supporting an individual’s ability to make an informed judgement call.

Students have the ability to review their own Turnitin reports prior to submitting their final papers. These reports can then serve as a great resource for students to use to check their own work for improper use of sources, the absence of proper attribution, or poor paraphrasing.
