Thursday, December 14, 2017

A purposeful #strategy for #Wikidata

A strategy for Wikidata? Obviously, it is all about having a purpose. It is not about policies, nor about what we need or expect of others; it is about the purpose you, I and others have for collaborating in an inclusive wiki and data project.

The implications of making the purposes of our community rule supreme are huge. Purpose, like so many other things, can be measured. When people have a purpose for Wikidata and actually use it, their need for quality is self-evident. They will invest their time and effort in fulfilling their purpose. The one question is how to fit in the many purposes that exist for Wikidata.

Take for instance the objective of Lsjbot: a rich Wikipedia in the Cebuano language. It uses data from an external database to create articles. Data from these articles is later imported through the Cebuano Wikipedia into Wikidata. This is seen by some as controversial because of the need to integrate data that often already exists. The purpose is obvious: rich information in the Cebuano language. The solution is obvious as well: let Lsjbot use the data at Wikidata to generate the information for the Cebuano Wikipedia. GeoNames is happy to collaborate with us on this, so when we care to collaborate and welcome its data at the front door, we can mix'n'match the data into Wikidata, curate the data where necessary and share improved quality widely, not only on the ceb.wp.

The Biodiversity Heritage Library Consortium is working extremely hard to expose their work to the general public. Over a million illustrations found their way to Flickr. Fae imported many of these to Commons and most if not all the associated publications can be read on the Internet Archive or on its website. Their content is awesome; check for instance their Twitter account. We can import all the BHL books in Wikidata, and we are importing all associated authors using Mix'n'Match. The images are in Commons, but how is this brought together? How do we add value for the BHL and, just as important, for our shared public?

The Internet Archive is a Wikimedia partner. It provides essential services for us with its "Wayback Machine". It is how we can still refer to references that used to be online. One other venture of the Internet Archive is its Open Library. What we already do for the Open Library is linking their authors and, by inference, their books to the libraries of the world through VIAF. We could share this information with the Wikipedias so that their readers may find books they can read. (Talk about sharing the sum of all knowledge.)

Both the IA and the BHL want people to read. They (also) provide scientific publications that may be read to prove the points Wikipedia authors make in articles. Both can be big players strengthening the value of citations in WikiCite. At this time the strength of WikiCite is particularly in the biomedical field and it is already attracting bright people to Wikidata. As data from other fields finds its way in, more people like Egon and Siobhan will find their way. This will make Wikidata even more inclusive.

To make this future work, to become more inclusive, we should trust people more, particularly when they indicate why they use Wikidata. The Black Lunch Table is a great example. The description at Wikidata says: "visual artists of the African diaspora initiative that includes Wikipedia editathons and outreach". One way of knowing how effective this initiative is, is to look at the history page of its Listeria list. It shows a steady growth of information added. When you analyse it further you find artists added and selected for new editathons. Truly a great example of Wikidata having a purpose.

A strategy based on purpose is a strategy based on trust. Not blind trust, but the kind of trust where it is seen that people are committed to improving the quantity, quality and usefulness of the data they identify with.

Sunday, December 10, 2017

When #Wikidata is good for something

When #Wikidata is good for something, it shines. It does not take much prodding to find people to improve on what it does so well and consequently when Wikidata is useful, quality follows easily.

The promise of a useful Wikidata was delivered at its start by having it replace the native interwiki links of Wikipedia. Within a month the quality of Wikipedia links had improved dramatically, and at this time corner cases are still being worked on, improving quality even more.

The WikiCite project is really important in many respects and it has so much more to offer. It is useful because it brings many initiatives and projects together under one roof. It is why scientific papers are included, including their authors. We find that more and more authors are included as well and they are often linked to ORCID, VIAF and the other external identifiers of this world. This has great value because it allows Wikipedia articles and information maintained elsewhere to be linked. What it can be used for is limitless. End users will find new and interesting ways to use the data and make it into information.

When Wikidata is to be good for Wikimedia projects, the information brought to Wikidata because of WikiCite has great potential. It largely reflects the citations in all the Wikipedias and consequently, through the external sources linked in this way, we could know which sources are problematic, retracted or bought by interested parties. We could, but we don't. If we did, we would provide weight against propaganda and fake news.

The big thing holding us back is trust. Wikipedians need to consider a Wikidata that is not only used for links and that can be trusted for high-level maintenance of its citations. Wikidata is to appreciate its use and trust that its information will be used and that this will increase its value and quality. WikiCiters have to understand that Wikidata is not a stamp collection that only includes publication data. It must include information about retractions and about papers considered problematic for political or scientific reasons (or both).

When Wikidata is to be good for something, we should expand our collaboration with Cochrane, Retraction Watch and organisations like them. There is everything to gain: quality, contributors and relevance.

Saturday, December 09, 2017

#Wikipedia #NPOV - When there is no neutral point of view

Mr Jacobson, a climatologist at Stanford University, wrote a paper. Its findings were disputed in another paper. Jacobson maintains that the energy needs of the USA can be served exclusively with green energy. The contrarians have it that there must be a mix of conventional and green energy.

There are several issues with the latter paper: it is a paper supported by the conventional energy industry. The results of the paper are in the best interest of that industry and the paper is considered by many not to be the result of a scientific process. So much so that Jacobson went to court.

There is a big difference between an opinion piece and a scientific paper. The critique of the contrarians is that Mr Jacobson does not consider nuclear, fuel and biofuel solutions at all. They argue that this could make the transition more difficult or expensive. But that is not the point. The point is that it can be done, and that green energy is getting cheaper.

When a paper is bought by industry and the premise of the original paper is ignored, it is no longer scientific but becomes an opinion piece. Mr Jacobson is not the first to predict the demise of "big" energy; Greenpeace has been doing it for decades.

There is no middle ground. That is why Mr Jacobson is going to court: the paper of the contrarians serves only one purpose, postponing the inevitable. It is not a scientific critique in any acceptable way.

Tuesday, November 28, 2017

#Wikidata - Disambiguating for the Biodiversity Heritage Library

Tatiana Carneiro is an entomologist. Her work is known at the Biodiversity Heritage Library. When you check the "authors page", there are two other identifiers known, and for "Tatiana R. Carneiro" the same two identifiers are shown as well.

When you google for Mrs Carneiro, all kinds of information may be found, but you do not want to do this for all the 177,271 BHL authors that are waiting in Mix'n'Match. It is no fun and only a few people take up a task like this.

So the question is: how do we make it more rewarding, and how do we bring the many Brazilian papers to Wikidata as well? What is there to achieve, and how does it benefit all the people reading Wikimedia content?

For readers of our content, there is little merit in the fact that all these authors published papers. Many of the papers have been published with a DOI, and many of them are freely available to read. For readers, it is the papers that are important. So contrary to a more normal database approach, it is not the authors we should concentrate on but their publications. In addition to this, the BHL actively promotes the use of illustrations and publishes them on Flickr. Thanks to the fine work of people like Fae these illustrations end up on Commons as well. It will be a challenge to link them to all this metadata.

There are millions of illustrations, there are far fewer publications and many authors are known for not one but multiple publications. To complicate it even further, an illustration has an illustrator and many publications are exclusively found in archives. Many publishers are no longer active and all this information is or may be considered relevant.

So what to do? First, import all the publications that are freely readable: the publications with a DOI, including the author information as "author name string". When an author is known to Wikidata, we can always add the author as an item as well. The benefit of this approach? People can read now.
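The rule above could be sketched roughly like this. It is only an illustration: the `Publication` record, the `known_authors` lookup and the simplified statement dictionary are hypothetical and are not the real Wikibase JSON format. The property numbers are the ones Wikidata uses: P356 for DOI, P50 for author and P2093 for author name string.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    """Hypothetical record for one freely readable publication."""
    title: str
    doi: str
    author_names: list = field(default_factory=list)


def build_statements(pub, known_authors):
    """Return a simplified dict of statements for one publication.

    known_authors is an assumed lookup from author name to Wikidata item id.
    Authors found there become P50 (author) statements; all others are kept
    as plain text in P2093 (author name string).
    """
    statements = {"P356": [pub.doi], "P50": [], "P2093": []}
    for name in pub.author_names:
        qid = known_authors.get(name)
        if qid is not None:
            statements["P50"].append(qid)
        else:
            statements["P2093"].append(name)
    return statements
```

With this split, a publication can be imported right away, and an author name string can be replaced by a link to an item later, once the author has been disambiguated.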

To make it interesting we can run a bot using the APIs of the BHL. We add missing books for authors and add the authors to the books where this information is missing. Running this regularly will make it interesting for anyone interested in the work of the BHL. But most importantly, people can read now.

Sunday, November 26, 2017

#Wikidata - a "sand engine" for the UK?

The Netherlands are largely below sea level and, given climate change, keeping our feet dry is not at all obvious. There are many ways to defend a coastline and a "sand engine" or "sand motor" is one.

According to EcoShape, there will be a sand motor protecting the Bacton Gas Terminal and the surrounding Norfolk coast.

This does have an impact on the existing Wikipedia articles about the sand motor. It is no longer about the Netherlands and as there are more coastal areas that need protection, more sand motors are to be expected. For Wikidata, all the sand motors can have their own item. It will become possible to query where they are, where they are planned and what areas are protected in this way. The original sand motor will have its own place in history but it will not be unique. That is good.
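Such a query could look something like the sketch below. It only builds the query text and the request URL for the Wikidata Query Service; the item id for the class "sand motor" is passed in as a parameter because it is not given here, and the function names are made up for this example. P31 (instance of) and P625 (coordinate location) are existing Wikidata properties.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def sand_motor_query(class_qid):
    """Build a SPARQL query for all items that are an instance of class_qid."""
    return f"""
SELECT ?item ?itemLabel ?coord WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  OPTIONAL {{ ?item wdt:P625 ?coord . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


def query_url(class_qid):
    """Return the URL that asks the query service for JSON results."""
    params = {"query": sand_motor_query(class_qid), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)
```

Planned sand motors and the areas they protect would need additional statements on each item; which properties fit best is a modelling choice that is left open here.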

#Wikidata - I fucking love #science

When I am on Facebook, information from "I Fucking Love Science" is always a nice read. Mrs Andrew received the Stamford Raffles Award. It is why I found her article.

When you read the article, it is heavy on Mrs Andrew's problem of being taken seriously because she is a woman and there is also a lot about the accusations of plagiarism. The problem is that plagiarism and the unlicensed use of intellectual property are quite distinct. Given that IFLS is about reporting on science, there should be no argument; they do not claim the ideas they report on as their own.

It is better to split the information about Mrs Andrew and IFLS; it brings clarity and it invites additional information about the reach of IFLS and the reason why she was awarded the Stamford Raffles Award.

Sunday, November 19, 2017

#Wikidata vs #Wikipedia - Rukmini Maria Callimachi

Mrs Callimachi did not only win the Polk Award; she is both a journalist and a poet, and her awards are not limited to journalism. One of the awards, the Michael Kelly Award, is hidden in the Wikipedia article on Michael Kelly.

This article is about how Wikidata and English Wikipedia can help each other. The Wikipedia article lists seven awards and this makes it easy to add other award winners for them as well.

Thanks to Magnus' awarder, this is fairly easy, but some awards hide out as part of an article and the award has to be added in Wikidata. It may be one reason why later awards are missing. The religious award she is said to have won is a different award with a similar name. The award and the organisation that confers it had to be created.

The point is that we can compare data at a Wikipedia with what we have on Wikidata. They should match. When they do not, there is an issue. Copying the data from Wikipedia is easy and it is the obvious thing to do. When Wikipedians decry the quality of Wikidata, they should reflect on why this is the case. When we collaborate, we will slowly but surely improve our quality. In the final analysis our aim is the same: share in the sum of all knowledge.

Saturday, November 18, 2017

#Wikipedia vs #Wikidata - the George Polk Awards

Some Wikipedians consider Wikidata inferior, so much so that they agitate towards a policy that bans Wikidata in "their" Wikipedia. They are welcome to their opinion.

I do bulk imports from Wikipedia and all the time I suffer the consequences. Some three to four percent of their data is wrong for all kinds of reasons, reasons that are manageable with proper tooling.

The George Polk Award is an award for journalism and it got my attention again because the International Consortium of Investigative Journalists received it for their work on the Panama Papers. I noticed that many of the people listed as having been awarded the Polk Award did not have articles in Wikipedia, that many of the links in the list of award winners pointed to the wrong person and that many award winners did not even have a "red link".

I am in the process of checking all the links and adding the date for the award. I found many issues, among them a civil war general and many other false friends. I am adding items for the people who do not have an English article, and I have to check each of them because several do have articles in other languages. It is a lot of work and it is not as useful as it could be because Wikipedia hates Wikidata and we do not collaborate, we do not work together.

There is a Listeria list of winners and slowly but surely it will contain information similar to the English Wikipedia list article. Similar but not the same:
  • the false friends will not be there
  • there will be no red or black links
  • people who won the award twice will be missing
Why do this, why spend so much time on one big list? Well, in this day and age of "fake news" we should celebrate journalism, and having all this information in Wikidata allows for all kinds of tools as well. We can check for false friends, we can check if the articles on the award winners include the award, and also if there are "winners" who are not known in this list and in the source available for the George Polk winners.
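A check like this does not need to be complicated. The sketch below assumes we already have three plain lists of names or identifiers: one taken from the Wikipedia list article, one from Wikidata and one from the official source. How those lists are obtained is left out, and the function name is made up for this example.

```python
def compare_winners(wikipedia_list, wikidata_list, source_list):
    """Report where three lists of award winners disagree."""
    wp = set(wikipedia_list)
    wd = set(wikidata_list)
    src = set(source_list)
    return {
        # listed on Wikipedia but the award is not yet on Wikidata
        "missing_in_wikidata": sorted(wp - wd),
        # known on Wikidata but absent from the Wikipedia list
        "missing_in_wikipedia": sorted(wd - wp),
        # "winners" that the official source does not confirm
        "not_in_source": sorted((wp | wd) - src),
    }
```

Every name that ends up in one of these three groups is something for a person to look at; the tool only points at the differences, it does not decide who is right.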

I am not a Wikipedian and truthfully I hate the endless and senseless bickering that is going on. So let me work on the data and make it available to tools. Now you Wikipedians, you may choose not to show Wikidata data in your infoboxes, but you will not make your errors go away without collaboration. Yes, you can quote a source, but when your data is not in line with what the source states, having a source does you no good; effectively you provide fake information.

My request to the reasonable people at Wikipedia and Wikidata: let us work together and see how we can improve quality. Let's link wiki links (blue, red and black) to Wikidata and improve the quality of what is on offer first.