Tuesday, November 28, 2017

#Wikidata - Disambiguating for the Biodiversity Heritage Library

Tatiana Carneiro is an entomologist. Her work is known to the Biodiversity Heritage Library (BHL). Her author page there lists two other identifiers, and the page for "Tatiana R. Carneiro" shows the same two identifiers.

When you google Mrs Carneiro, all kinds of information turns up, but you do not want to do this for all 177,271 BHL authors that are waiting in Mix'n'Match. It is no fun, and only a few people take up a task like this.

So the questions are: how do we make this more rewarding, and how do we bring the many Brazilian papers to Wikidata as well? What is there to achieve, and how does it benefit all the people reading Wikimedia content?

For readers of our content, the mere fact that all these authors published papers has little merit. Many of these papers have been published with a DOI, and many of them are freely available to read. For readers, the papers are what matters. So, contrary to a more conventional database approach, it is not the authors we should concentrate on but their publications. In addition, the BHL actively promotes the use of its illustrations and publishes them on Flickr. Thanks to the fine work of people like Fae, these illustrations end up on Commons as well. It will be a challenge to link them to all this metadata.

There are millions of illustrations and far fewer publications, and many authors are known not for one but for multiple publications. To complicate it even further, an illustration has an illustrator, and many publications are found only in archives. Many publishers are no longer active, and all this information is, or may be considered, relevant.

So what to do? First, import all the publications that are freely readable and have a DOI, and include the author information as an "author name string". When an author is known to Wikidata, we can always add the author item as well. The benefit of this approach? People can read now.
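A minimal sketch of that import step, in Python. It assumes the publication records are already in memory. The `Publication` class, the `plan_import` function, the sample titles, the DOIs and the Q-number are all hypothetical. The sketch only decides which records qualify and which author names can be linked to a known Wikidata item. P356 (DOI), P50 (author) and P2093 (author name string) are the Wikidata properties involved. Actually writing the statements to Wikidata is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Publication:
    # Hypothetical record shape; a real import would fill this from BHL or DOI metadata.
    title: str
    doi: Optional[str]
    freely_readable: bool
    author_names: List[str]


def plan_import(
    pubs: List[Publication], known_authors: Dict[str, str]
) -> List[Tuple[str, Dict[str, object]]]:
    """Return (title, statements) for every freely readable publication with a DOI."""
    plans = []
    for pub in pubs:
        # Only import what people can actually read, and only when there is a DOI.
        if not (pub.doi and pub.freely_readable):
            continue
        authors: List[str] = []       # P50: author items already known to Wikidata
        name_strings: List[str] = []  # P2093: author name string for everyone else
        for name in pub.author_names:
            qid = known_authors.get(name)
            if qid is not None:
                authors.append(qid)
            else:
                name_strings.append(name)
        plans.append((pub.title, {"P356": pub.doi, "P50": authors, "P2093": name_strings}))
    return plans


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    pubs = [
        Publication("Paper A", "10.1234/example.1", True, ["A. Author", "B. Author"]),
        Publication("Paper B", None, True, ["A. Author"]),
        Publication("Paper C", "10.1234/example.2", False, ["C. Author"]),
    ]
    for title, statements in plan_import(pubs, {"A. Author": "Q00000001"}):
        print(title, statements)
```

Keeping unmatched names as name strings means nothing waits on disambiguation. An author item can replace a name string later, once someone has matched it.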

To make it more interesting, we can run a bot that uses the BHL APIs. It adds missing books for authors and adds authors to books where this information is missing. Running it regularly will make it interesting for anyone interested in the work of the BHL. But most importantly, people can read now.
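The reconciliation part of such a bot can be sketched as a comparison of (author, book) pairs. This sketch assumes both sides have already been fetched and mapped to shared identifiers. No BHL API call is shown, and the function name and identifiers are hypothetical. Pairs that the BHL knows but Wikidata lacks are split into two groups: books Wikidata does not have at all, and known books that are missing an author.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (author id, book id); the identifiers are hypothetical


def reconcile(
    bhl_pairs: Set[Pair], wikidata_pairs: Set[Pair], wikidata_books: Set[str]
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Compare authorship known to the BHL with authorship recorded in Wikidata.

    Returns two maps:
      books_to_create: author -> books that Wikidata does not have at all
      authors_to_add:  book   -> authors missing from a book Wikidata already has
    """
    books_to_create: Dict[str, Set[str]] = {}
    authors_to_add: Dict[str, Set[str]] = {}
    for author, book in bhl_pairs - wikidata_pairs:
        if book in wikidata_books:
            authors_to_add.setdefault(book, set()).add(author)
        else:
            books_to_create.setdefault(author, set()).add(book)
    return books_to_create, authors_to_add


if __name__ == "__main__":
    bhl = {("author-1", "book-1"), ("author-1", "book-2"), ("author-2", "book-1")}
    wikidata = {("author-1", "book-1")}
    print(reconcile(bhl, wikidata, {"book-1"}))
```

Running it "regularly" then simply means repeating the comparison with fresh data. Pairs already added no longer show up as missing.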

Sunday, November 26, 2017

#Wikidata - a "sand engine" for the UK?

The Netherlands lies largely below sea level, and given climate change, keeping our feet dry cannot be taken for granted. There are many ways to defend a coastline, and a "sand engine" or "sand motor" is one of them.

According to EcoShape, there will be a sand motor protecting the Bacton Gas Terminal and the surrounding Norfolk coast.

This has an impact on the existing Wikipedia articles about the sand motor. The subject is no longer only about the Netherlands, and as there are more coastal areas that need protection, more sand motors are to be expected. In Wikidata, each sand motor can have its own item. It will then become possible to query where they are, where they are planned and which areas are protected in this way. The original sand motor will have its own place in history, but it will not be unique. That is good.

#Wikidata - I fucking love #science

When I am on Facebook, information from "I Fucking Love Science" is always a nice read. Mrs Andrew, the woman behind IFLS, received the Stamford Raffles Award; that is how I found her article.

When you read the article, it is heavy on Mrs Andrew's problem of being taken seriously because she is a woman, and there is also a lot about the accusations of plagiarism. The problem is that plagiarism and the unlicensed use of intellectual property are quite distinct. Given that IFLS is about reporting on science, there should be no argument: they do not claim the ideas they report on as their own.

It is better to split the information about Mrs Andrew and IFLS; it brings clarity and it invites additional information about the reach of IFLS and the reason why she was awarded the Stamford Raffles Award.

Sunday, November 19, 2017

#Wikidata vs #Wikipedia - Rukmini Maria Callimachi

Mrs Callimachi did not only win the Polk Award. She is both a journalist and a poet, and not all of her awards are for journalism. One of them, the Michael Kelly Award, is hidden in the Wikipedia article about Michael Kelly.

This article is about how Wikidata and English Wikipedia can help each other. The Wikipedia article lists seven awards and this makes it easy to add other award winners for them as well.

Thanks to Magnus's Awarder, this is fairly easy, but some awards hide out as part of another article and have to be added to Wikidata first. This may be one reason why later awards are missing. The religious award she is said to have won turned out to be a different award with a similar name; both the award and the organisation that confers it had to be created.

The point: we can compare the data in a Wikipedia with what we have in Wikidata. They should match; when they do not, there is an issue. Copying the data from Wikipedia is easy, and it is the obvious thing to do. When Wikipedians decry the quality of Wikidata, they should reflect on why this is the case. When we collaborate, we will slowly but surely improve our quality. In the final analysis, our aim is the same: to share in the sum of all knowledge.
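A minimal sketch of such a comparison in Python follows. It assumes both sides have been reduced to sets of (winner, year) pairs. The winners are keyed by item identifiers rather than names, because names are exactly where false friends hide. The identifiers and the function name are hypothetical. Pairs found on only one side are the issues to curate. A winner who appears on both sides with different years points to a date error rather than a missing winner.

```python
from typing import List, Set, Tuple

Award = Tuple[str, int]  # (winner item id, year of the award); ids are hypothetical


def compare_awards(
    wikipedia: Set[Award], wikidata: Set[Award]
) -> Tuple[Set[Award], Set[Award], List[str]]:
    """Compare (winner, year) pairs from a Wikipedia list with those in Wikidata."""
    only_wikipedia = wikipedia - wikidata
    only_wikidata = wikidata - wikipedia
    # A winner present on both sides, but with different years,
    # most likely points to a date error rather than a missing winner.
    year_conflicts = sorted(
        {winner for winner, _ in only_wikipedia} & {winner for winner, _ in only_wikidata}
    )
    return only_wikipedia, only_wikidata, year_conflicts


if __name__ == "__main__":
    wikipedia = {("person-1", 1990), ("person-2", 2001), ("person-3", 1975)}
    wikidata = {("person-1", 1990), ("person-2", 2002), ("person-4", 1980)}
    print(compare_awards(wikipedia, wikidata))
```

Using pairs rather than one year per person means that someone who won the award twice appears twice. The second win is not silently dropped.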

Saturday, November 18, 2017

#Wikipedia vs #Wikidata - the George Polk Awards

Some Wikipedians consider Wikidata inferior, so much so that they agitate for a policy that bans Wikidata from "their" Wikipedia. They are welcome to their opinion.

I do bulk imports from Wikipedia, and I suffer the consequences all the time. Some three to four percent of their data is wrong, for all kinds of reasons, reasons that are manageable with proper tooling.

The George Polk Award is an award for journalism, and it got my attention again because the International Consortium of Investigative Journalists received it for their work on the Panama Papers. I noticed that many people listed as Polk Award winners did not have articles in Wikipedia, that many of the links in the list of award winners pointed to the wrong person, and that many award winners did not even have a "red link".

I am in the process of checking all the links and adding the date of each award. I found many issues, among them a Civil War general and many other false friends. I am adding items for the people who do not have an English article, and I have to check each of them because several do have articles in other languages. It is a lot of work, and it is not as useful as it could be because Wikipedia hates Wikidata and we do not collaborate; we do not work together.

There is a Listeria list of winners, and slowly but surely it will contain information similar to the English Wikipedia list article. Similar but not the same:
  • the false friends will not be there, 
  • there will be no red or black links
  • people who won the award twice will be missing
Why do this, why spend so much time on one big list? Well, in this day and age of "fake news" we should celebrate journalism, and having all this information in Wikidata allows for all kinds of tools as well. We can check for false friends, we can check whether the articles about the award winners mention the award, and we can check for "winners" who are known neither in this list nor in the source available for the George Polk winners.

I am not a Wikipedian and, truthfully, I hate the endless and senseless bickering that is going on. So let me work on the data and make it available to tools. You Wikipedians may choose not to show Wikidata data in your infoboxes, but you will not make your errors go away without collaboration. Yes, you can cite a source, but when your data is not in line with what the source states, having a source does you no good; effectively, you provide fake information.

My request to the reasonable people at Wikipedia and Wikidata: let us work together and see how we can improve quality. Let's link wiki links (blue, red and black) to Wikidata and improve the quality of what is on offer first.

Thursday, November 16, 2017

#Wikidata - women in red - May Wright Sewall

On Twitter, it was mentioned that archival material of Mrs May Wright Sewall was being worked on. When you read the Wikipedia article, it becomes all too obvious how notable she was. She founded multiple organisations and was known for her suffragist ideas.

The article introduces these organisations, and to indicate the relations, new items have to be created in Wikidata. I created only two, and I added her husbands, men who supported her in her undertakings.

By adding these new organisations, it becomes possible to link more people to them. Those people thereby gain notability, and it becomes more likely that at some stage they will get an article as well. At the very least, new people and organisations added to Wikidata complete the tapestry of information about an age gone by.

Wednesday, November 15, 2017

#Wikipedia - #Retraction exposing big issues in #science

When a scientific paper is published, it is read and cited by other scientists to further science. It is read and cited by Wikimedians to write articles and share the sum of all knowledge. The Wikicite project provides better tooling for using these papers as sources in Wikipedia articles; it is one of the more relevant developments in combatting fake news in Wikipedia.

However, there is an issue with a substantial number of papers: they have been retracted. There are all kinds of possible reasons, but the bottom line is that they are not to be used as a source in Wikipedia because their findings cannot be relied upon.

The challenge: which papers are retracted, how are retractions and the reasons for them modelled, and how will we find these papers among the sources used in Wikipedia? Knowing about retractions and acting on them will be a fine art; one publisher in South Africa, for instance, was pressed to retract a book exposing the president. So many issues will be exposed once retractions become part of the Wikipedia workflow. Failing to act on them would be the worst thing we could do. We would not be sharing the sum of all knowledge; we would be sharing the sum of what we are told.
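The "how will we find these papers" part can be sketched as a lookup of cited DOIs against a list of retracted DOIs. This sketch leaves open where the retraction list comes from and how the reasons are modelled. The function name, the DOIs and the reason strings are hypothetical. DOIs are case-insensitive, so the sketch compares them in lower case. It only flags citations; deciding what to do with each one remains the fine art described above.

```python
from typing import Dict, List, Tuple


def flag_retracted_citations(
    citations: Dict[str, List[str]], retractions: Dict[str, str]
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each article to the retracted DOIs it cites, with the retraction reason.

    citations:   article title -> DOIs cited in that article
    retractions: retracted DOI -> reason for the retraction
    """
    # DOIs are case-insensitive, so compare them in lower case.
    reasons = {doi.lower(): reason for doi, reason in retractions.items()}
    flagged: Dict[str, List[Tuple[str, str]]] = {}
    for article, dois in citations.items():
        hits = [(doi, reasons[doi.lower()]) for doi in dois if doi.lower() in reasons]
        if hits:
            flagged[article] = hits
    return flagged


if __name__ == "__main__":
    citations = {"Article X": ["10.1234/ABC", "10.1234/def"], "Article Y": ["10.1234/ghi"]}
    retractions = {"10.1234/abc": "hypothetical reason"}
    print(flag_retracted_citations(citations, retractions))
```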

Thursday, November 09, 2017

Judith Butler in #Brazil - a reaction in the #Wiki way

When the news reports that an effigy of Mrs Judith Butler was burned in Brazil, it is time to give some attention to Mrs Butler. There is information about her and the papers she published, and one way of adding to the relevance of Mrs Butler is by increasing the number of people she is connected to.

In 2012 she was awarded the Lyssenko award. Adding that date and the other award winners works in two ways; Mrs Butler is better connected but the other award winners are better connected as well.

There is an article about Mrs Butler in English Wikipedia, but given that it was a French think tank that conferred this award, chances are that not everyone on this list has an English article. There are projects that suggest articles to write. Adding awards in this way may feed those projects. I hope so; for me, that would be the best outcome that could be achieved.

#Wikipedia - Ischia International Journalism Award & the Polk Award

When people win awards, they often win multiple awards. Harrison Salisbury won several awards, not only the Polk Award. The Ischia award did not have a date associated with it. I used Awarder and the data from the Italian Wikipedia because that was most convenient.

There is no article about Mr Salisbury in Italian, so no date was associated with him; in the list he is represented by a red link. The list indicated 1990, and adding it was an easy manual edit.

As you can imagine, that red link could link to the information about Mr Salisbury on Wikidata. Showing this information to those interested in writing a Wikipedia article in Italian provides pertinent information, information that should coincide with the new article. By comparing the information in Wikidata with that in existing Wikipedia articles, you know that an article is likely to be correct.

Wednesday, November 08, 2017

#Wikidata as a Wiki versus the data consumers’ perspective

Wikidata is a Wiki. It follows that many people with many agendas add data to Wikidata. It is a continuous process and, as is usual in a Wiki, all contributions that fit the notability requirements of the project are welcome.

Seen from a Wiki point of view, the consumers' perspective is a bit awkward. Only active contributors work towards any of the quality considerations. Even when the quality is reasonable for some, it may not be enough for others.

Both Wikipedia and Wikidata are Wikis. Both have issues from a consumers' perspective. They are already explicitly integrated through the interwiki links and implicitly through the Wiki links. One of Magnus's tools makes this visible.

When you then consider George Polk and the George Polk Award, it becomes obvious that Wikis have an issue from a data consumer's perspective. In some Wikipedia articles the two are conflated. In others there is a separate list of award winners. Many of the award winners do not have an article, and some of the award winners refer to the wrong person. Wikidata could do with more data; what it has was imported from Wikipedia, and several of the wrong persons are still wrong in Wikidata.

Both Wikipedia and Wikidata consume each other's data. Both are Wikis. Neither project is superior, but they could compare their data and curate the differences.

Tuesday, November 07, 2017

#Wikipedia; Héctor Rondón did not win the #Polk Award

This is Héctor Rondón; he pitches for the Cubs. He did not win the George Polk Award; Héctor Rondón Lovera did.

This is a common mistake. It happens all the time, and it is where Wikidata can make a positive difference to Wikipedia. It just requires a different mindset to see why this is the right solution at this time. Some loud Wikipedians abhor Wikidata, but this is an easy and obvious method that will improve Wikipedia, and there is no sane argument why it would not work.

These Wikipedians do not even have to notice that this is done; we can hide it from them and still do a world of good. Not just for English Wikipedia but for all Wikipedias. Ehm, for the readers of all Wikipedias.

Sunday, November 05, 2017

#Wikidata - There is no such thing as a free lunch

Mrs Adriane Fugh-Berman wrote a paper called "Why lunch matters: Assessing physicians' perceptions about industry relationships". There is no such thing as a free lunch and arguably this is exactly what Wikidata is offering to the bio-medical industry.

All the bio-medical papers find their home in Wikidata, and there is no mechanism: nothing to indicate the many erroneous papers, nothing to indicate that specific substances have been banned from use as a medical substance. When Wikipedia comes to use Wikidata for this information, the result will be very bad.

Mr Martin Keller is a psychiatrist whose reputation was for sale. "His" paper, "Efficacy of paroxetine in the treatment of adolescent major depression: a randomized, controlled trial", has been thoroughly debunked.

At Wikidata there seems to be the notion that facts like this are an affront to its neutrality. That is why there is no mention of it on the item for Mr Keller: the statement "significant event: ghostwriting author" was removed.

The problem is that without sufficient means to debunk ghostwriting authors, their products and their ill effects, there is no way to establish the veracity of the bio-medical facts that have been imported into Wikidata. It is vital to the integrity of the Wikidata project that the Mr Kellers of this world are seen for what they are: frauds.

#Wikimedia - I endorse having a #strategy as it is good to have one

Having a strategy is great. There are objectives, and there is an idea of how to get there. The strategy the Wikimedia Foundation is formulating is complicated. It is complicated by necessity, because it involves so many interests: people who invested so much of themselves in their project(s), people who speak many different languages, languages that define them, and people with different backgrounds that define them as well. The strategy must be complicated because it aims to reconcile all these people and the organisations that represent them.

When you are a Wikimedian, it helps when your vision coincides with the vision implicit in this big strategy. I was asked to present at the Wikimedia Nederland conference; I presented a historic view on information gathering and sharing. The presentation was given in English because it was the one common language in the room.

I love presentations, but I love talking with people even more. I was asked about the strategies behind the things that I do, the things I value. The Luc Hoffman award is an example. It does not have a Wikipedia article, but its subject, the science, is of real relevance in this time of climate change. The idea of associating links (blue, red and black) with Wikidata is a non-confrontational way to bring Wikidata value to Wikipedia. Adding all the USAmerican alumni from en.wp categories will allow us to keep up with what those categories hold and to know about even more USAmerican alumni. There is method behind the madness.

Now that the Wikimedia strategy moves to the next phase, I hope for many user stories: stories explaining what we are going to do and for whom. I also hope that technical considerations will not prevent innovation and improvement. In the end, that is not what a strategy is about; a strategy is the hope for the bright future we deserve in our Wikimedia movement.