Tuesday, May 31, 2016

Facilitating the use of Wikidata in Wikimedia projects with a user-centered design approach

Many students learn the ropes as interns working on Wikidata. They do their thing, write a thesis, and the good news is that their work is appreciated and used.

Charlene Kritschmar is the latest to write a thesis; hers is on an approach to using Wikidata in other Wikimedia projects. The central question is how to manage Wikidata data from within a Wikipedia. I like what I have read, but there are a few things that need to be considered.

Both the Wikidata notability policy and the thesis have it that "everything that has an entry in Wikipedia can also be an entry in Wikidata". Technically it is the other way around: every item in Wikidata may have an article in a Wikipedia. The difference is profound, because a new article in a Wikipedia may find a pre-existing item on its topic in Wikidata. It follows that statements may already be present, which makes it less cumbersome to write an article and fill an infobox with data.

Wikidata includes 17,677,925 items, while the biggest Wikipedia has 5,164,030 articles; Wikidata has more than three times as many items. This makes Wikipedia-centric thinking problematic. What any Wikipedia offers Wikidata is a big community that may improve the data quality on Wikidata and, by extension, the quality of all Wikipedias. The flip side of this coin is that no Wikipedia leads on what Wikidata has to say about any given subject.

Sunday, May 29, 2016

#Facebook - Nataliya Kobrynska

For whatever reason, a fellow Wikimedian chose to give attention to Mrs Kobrynska. He posted about her on Facebook and, when I have the time, I may add some statements in Wikidata. This time it was easy enough to improve the quality of the data, and I read in the article that she was the daughter of a parliamentarian, Mr Ivan Ozarkevych.

I could not add him as her father because there was no label for him in English. I mentioned on Facebook that I could not find him, and a label and the relation were added. The father had an article on the Polish Wikipedia that referred to him as a member of the Galician parliament, so it was easy and obvious to add the facts that he was a parliamentarian and a politician. Not only for him but also for his fellow parliamentarians.

#Wikidata - Debunking #controversy in #science

I really wonder what an organisation that hands out "one of the scientific world's most respected environmental prizes" does when one of its laureates becomes controversial.

The Volvo Environment Prize was awarded to Mr Ray Hilborn in 2006. Mr Hilborn and his science have become controversial because of the conflict of interest he has with the fishing industry. Greenpeace has documented this quite publicly.

Together with Mr Hilborn, Mr Pauly and Mr Walters were awarded the Volvo Prize. They are all known for their work on fisheries. The obvious question now is whether the work of Mr Pauly and Mr Walters is tainted in the same way. This is one reason why controversies like this are so important.

When a specific line of work in science has been debunked, it becomes important to undo the damage and re-evaluate the work in that field. One of the more obvious ways to make this point is for the Beijer Institute to address this issue in one way or another. If Mr Hilborn's science is unsound, it follows that his work does not point to a sustainable future and that he does not deserve the Volvo Environment Prize.

#Wikipedia #citations - LibraryBase

The general idea is that citations are what make the statements in Wikipedia articles believable. The quality of the sources is therefore important. When a specific publication has a problem, such as a lack of reproducibility or a known conflict of interest of its author or the organisation he stands for, it follows that the publication becomes problematic as a source.

The problem with sources in Wikipedia is that, like everything else, they are buried in the articles. As sources are typically marked up in the text with templates, it becomes possible to harvest them all and put them in a database. Once they are in a database, it becomes possible to analyse the data, find the problematic authors, refer back to the articles and remedy the inherent conflict in them.
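A minimal Python sketch of that harvesting step, assuming citations are flat {{cite ...}} templates with named parameters such as last and title, and no templates nested inside them. The functions parse_citations, store_citations and articles_citing, the table layout and the sample articles and authors are all hypothetical; a real harvest across all Wikipedias would need a proper wikitext parser.

```python
import re
import sqlite3

# Assumption: citations are flat {{cite ...}} templates with named parameters
# and no nested templates inside them; real wikitext needs a proper parser.
CITE_RE = re.compile(r"\{\{\s*cite\s+(\w+)\s*\|(.*?)\}\}", re.IGNORECASE | re.DOTALL)


def parse_citations(wikitext):
    """Return one dict of named parameters per cite template in the wikitext."""
    citations = []
    for kind, body in CITE_RE.findall(wikitext):
        fields = {"type": kind.lower()}
        for part in body.split("|"):
            if "=" in part:
                key, value = part.split("=", 1)
                fields[key.strip().lower()] = value.strip()
        citations.append(fields)
    return citations


def store_citations(conn, article, citations):
    """Store the author surname and title of each citation, keyed by article."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citation "
        "(article TEXT, type TEXT, author TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO citation VALUES (?, ?, ?, ?)",
        [(article, c["type"], c.get("last", ""), c.get("title", "")) for c in citations],
    )


def articles_citing(conn, author):
    """List the articles that cite a given author surname, so they can be revisited."""
    rows = conn.execute(
        "SELECT DISTINCT article FROM citation WHERE author = ? ORDER BY article",
        (author,),
    )
    return [row[0] for row in rows]


# Usage with hypothetical article names and authors.
conn = sqlite3.connect(":memory:")
store_citations(conn, "Article A", parse_citations(
    "Text {{cite journal |last=Doe |title=Fish stocks |year=2005}} "
    "more {{Cite web | last = Roe | title = Quotas}}"))
store_citations(conn, "Article B", parse_citations(
    "Text {{cite book |last=Doe |title=Fleets}}"))
print(articles_citing(conn, "Doe"))  # ['Article A', 'Article B']
```

Keying on the author surname alone is a simplification; in practice you would want to link authors to Wikidata items so that namesakes are not confused.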

Take Mr Ray Hilborn for instance. Greenpeace attacks him for his conflict of interest. Consequently, his point of view needs to be corroborated by independent sources, and all his science is suspect. It is wonderful to harvest all the data about sources from all the Wikipedias, but there is no point in it if it does not lead to something useful.

A lot of money goes around to confuse issues and serve specific interests. When sources are available to us all, it becomes possible to mark publications for their quality. When sources are not reproducible, it follows that you cannot build arguments on top of them. It then becomes possible to get the basics right and no longer confuse a Neutral Point of View with what is patently false.

Wednesday, May 25, 2016

#Wikidata - Kerala MLA constituencies

Kerala is one of the states of India and, like all the others, it has its own legislative assembly. As in Great Britain, politicians are elected from constituencies. There are many of them, as you can see on the map.

When there are elections, things change. New people become representatives, some remain representatives and others lose their seats. At Wikidata, the current list of people who are "Member of the Kerala Legislative Assembly" is a bit of a mess. There are many items without a name in English, there are people who are only known in English, and there are probably a lot of duplicates.

There are even representatives who are known to have an article on the English Wikipedia but do not (yet) have an item. This is all the result of a big push to write articles on Indian representatives.

As more work is done to complete the data, it will become more informative. What we hope to achieve is to:
  • associate MLAs with constituencies
  • have labels in both English and Malayalam for all of them
  • merge all the possible duplicates
Obviously there is more that might be done. We could add the dates when people became an MLA and when they stopped being one. This would allow us to create queries that show who was an MLA at what time. When all this is done for Kerala, there are 28 other Indian states, and there are many other countries that could do with a little bit of TLC.
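To show why those dates matter, here is a minimal Python sketch. On Wikidata such dates are typically stored as start time (P580) and end time (P582) qualifiers on a position held (P39) statement; in this sketch they are plain Python dates instead, and the PositionHeld class, the members_on function and the sample members are hypothetical. An end date is treated as the first day someone is no longer in office.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PositionHeld:
    """A hypothetical 'member of the legislative assembly' statement."""
    person: str
    constituency: str
    start: date                 # first day in office
    end: Optional[date] = None  # first day out of office; None if still serving


def members_on(statements, day):
    """Return sorted (person, constituency) pairs for everyone in office on `day`."""
    return sorted(
        (s.person, s.constituency)
        for s in statements
        if s.start <= day and (s.end is None or day < s.end)
    )


# Hypothetical members and dates, for illustration only.
statements = [
    PositionHeld("Member A", "Constituency 1", date(2011, 1, 1), date(2016, 6, 1)),
    PositionHeld("Member B", "Constituency 1", date(2016, 6, 1)),
    PositionHeld("Member C", "Constituency 2", date(2011, 1, 1)),
]
print(members_on(statements, date(2015, 1, 1)))
# [('Member A', 'Constituency 1'), ('Member C', 'Constituency 2')]
```

Without the end date, Member A would still show up as the representative for Constituency 1 after Member B took over, which is exactly the kind of mess the current data has.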

Wednesday, May 04, 2016

#Wikimedia - [[citation needed]]

Our articles in any #Wikipedia can be trusted when an effort has gone into providing sources. Sources or citations are very much needed because they help us distinguish fact from fiction. Finding sources exposes where a claim originates, and it helps us debunk fiction. The result of this continued effort is content that can be trusted as a sincere attempt to achieve a neutral point of view.

There are very practical problems. Sources are not always easy to find and they do not exist in every language. Sources are often behind a "pay wall", so access to the body of knowledge is very much restricted. Sources, particularly sources on the web, do not exist forever. The consequence is that sources are problematic and not everybody is equally able to help us with sources for the content we have.

If we are to improve the current, unsatisfactory situation, we have to address multiple problems.
  • Once sources are lost, we rely on the Internet Archive for a historical view. It has policies that allow for the removal of content; often this is controversial content, and its removal is often intended to rewrite history. What to do?
  • Access to restricted sources is provided to the privileged few who have access to libraries. The WMF has a program that enables some of our editors to access a few pay-walled sources.
  • When this proves insufficient, it is great to know that Sci-Hub, among others, provides "illegal" access to any and all sources.
Open access to sources is very much what we as a community care about. One of our own died in the struggle for this access, so I do not think we should be deferential to an industry that is despicable. We should teach people how to find sources and ignore licensing as much as possible.

#Wikipedia / #Commons - Brigadier General Loree K. Sutton

Mrs Sutton is a psychiatrist who specialises in PTSD. When you read her CV, it is impressive. She no longer works for the US Army; she works for the City of New York.

When you read the article on Wikipedia, you find her picture. It is marked as Public Domain and it is not on Commons. Given that Wikidata is working towards the point where it can hold copyright and license information, one can only hope that images like this can be easily shared based on their license.

When Commons started, it was intended as a repository that prevented the same file from being uploaded to every Wikipedia. As such it served its purpose remarkably well. With Wikidata it becomes trivial to share images like the one of Mrs Sutton.

I fear that for some this reads as frightening, as if it undermines the one thing they love. It actually does not remove the need for Commons as a platform. Quite the opposite: it will bring new tools to finally leverage all the data on images. For starters, it may bring this image of Mrs Sutton to Wikidata.