Sunday, November 19, 2017

#Wikidata vs #Wikipedia - Rukmini Maria Callimachi

Mrs Callimachi did not only win the Polk Award; she is both a journalist and a poet, and her awards are not limited to journalism. One of the awards, the Michael Kelly Award, is hidden in the Wikipedia article on Michael Kelly.

This article is about how Wikidata and English Wikipedia can help each other. The Wikipedia article lists seven awards and this makes it easy to add other award winners for them as well.

Thanks to Magnus' Awarder, this is fairly easy, but some awards hide out as part of an article and such an award first has to be added in Wikidata. It may be one reason why later awards are missing. The religious award she is said to have won is a different award with a similar name. The award and the organisation that confers it had to be created.

The point: we can compare the data at a Wikipedia with what we have in Wikidata. They should match. When they do not, there is an issue. Copying the data from Wikipedia is easy and it is the obvious thing to do. When Wikipedians decry the quality of Wikidata, they should reflect on why this is the case. When we collaborate, we will slowly but surely improve our quality. In the final analysis our aim is the same: share in the sum of all knowledge.
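
The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample names are hypothetical, and in practice the two lists would come from the Wikipedia article and from Wikidata.

```python
def compare_award_winners(wikipedia_winners, wikidata_winners):
    """Return the names that appear in only one of the two sources."""
    wikipedia_set = set(wikipedia_winners)
    wikidata_set = set(wikidata_winners)
    return {
        "only_in_wikipedia": sorted(wikipedia_set - wikidata_set),
        "only_in_wikidata": sorted(wikidata_set - wikipedia_set),
    }


# Hypothetical sample data, not the real award lists.
wikipedia_list = ["Winner A", "Winner B", "Winner C"]
wikidata_list = ["Winner B", "Winner C", "Winner D"]

differences = compare_award_winners(wikipedia_list, wikidata_list)
```

Every name that ends up in one of the two result lists is an issue to look at: a missing statement, a missing article or a false friend.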
Thanks,
      GerardM

Saturday, November 18, 2017

#Wikipedia vs #Wikidata - the George Polk Awards

Some Wikipedians consider Wikidata inferior, so much so that they agitate towards a policy that bans Wikidata in "their" Wikipedia. They are welcome to their opinion.

I do bulk imports from Wikipedia and all the time I suffer the consequences. Some three to four percent of their data is wrong for all kinds of reasons, reasons that are manageable with proper tooling.

The George Polk Award is an award for journalism and it got my attention again because the International Consortium of Investigative Journalists received it for their work on the Panama Papers. I noticed that many of the people listed as having been awarded the Polk Award did not have articles in Wikipedia, that many of the links in the list of award winners pointed to the wrong person and that many award winners did not even have a "red link".

I am in the process of checking all the links and adding the date for the award. I found many issues, among them a civil war general and many other false friends. I am adding items for the people who do not have an English article, and I have to check each of them because several do have articles in other languages. It is a lot of work and it is not as useful as it could be because Wikipedia hates Wikidata and we do not collaborate; we do not work together.

There is a Listeria list of winners and slowly but surely it will contain information that is similar to the English Wikipedia list article. Similar but not the same:
  • the false friends will not be there, 
  • there will be no red or black links
  • people who won the award twice will be missing
Why do this, why spend so much time on one big list? Well, in this day and age of "fake news" we should celebrate journalism, but having all this information in Wikidata allows for all kinds of tools as well. We can check for false friends, and we can check if the articles on the award winners include the award, but also if there are "winners" who are not known in this list and in the source available for the George Polk winners.
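
A list like this is driven by a query. The following Python sketch builds such a query as a string, under some assumptions: it uses the Wikidata properties P166 (award received) and P585 (point in time), the function name is made up for this example, and the award identifier is a placeholder that has to be replaced with the real item for the award.

```python
# Placeholder only; replace it with the Wikidata item of the award.
PLACEHOLDER_AWARD_QID = "Q0"


def build_award_winner_query(award_qid):
    """Build a SPARQL query listing the recipients of an award and the award date."""
    return f"""
SELECT ?person ?personLabel ?date WHERE {{
  ?person p:P166 ?statement .
  ?statement ps:P166 wd:{award_qid} .
  OPTIONAL {{ ?statement pq:P585 ?date . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
ORDER BY ?date
""".strip()


query = build_award_winner_query(PLACEHOLDER_AWARD_QID)
```

The date is optional in the query so that winners without a date still show up; those are exactly the statements that still need work.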

I am not a Wikipedian and truthfully I hate the endless and senseless bickering that is going on. So let me work on the data, make it available to tools. Now you Wikipedians, you may choose not to show Wikidata data in your infoboxes but you will not make your errors go away without collaboration. Yes, you can quote a source but when your data is not in line with what the source states, having a source does not do you good, effectively you provide fake information.

My request to the reasonable people at Wikipedia and Wikidata: let us work together and see how we can improve quality. Let's link wiki links (blue, red and black) to Wikidata and improve the quality of what is on offer first.
Thanks,
       GerardM

Thursday, November 16, 2017

#Wikidata - women in red - May Wright Sewall

On Twitter, it was mentioned that archival material of Mrs May Wright Sewall was being worked on. When you read the Wikipedia article, it becomes all too obvious how notable she was. She founded multiple organisations and was known for her suffragist ideas.

The article introduces these organisations and consequently, to indicate the relations, new items have to be created in Wikidata. I only did two and I added her husbands, men that supported her in her undertakings.

By adding these new organisations, it becomes possible to link more people to them. They thereby gain notability and it becomes more likely that at some stage they will get their article as well. At the very least, the new people and organisations added in Wikidata complete the tapestry of information of an age gone by.
Thanks,
     GerardM

Wednesday, November 15, 2017

#Wikipedia - #Retraction exposing big issues in #science

When a scientific paper is published, it is read and cited by other scientists to further science. It is read and cited by Wikimedians to write articles and share the sum of all knowledge. The Wikicite project provides better tooling for using these papers as a source in Wikipedia articles; it is one of the more relevant developments in combatting fake news in Wikipedia.

However, there is an issue with a substantial number of papers: they were retracted. There are all kinds of possible reasons but the bottom line is that they are not to be used as a source in Wikipedia because their findings are false.

The challenge: what papers are retracted, how are retractions and the reasons for retractions modelled, and how will we find these papers in the Wikipedia sources? Knowing retractions and acting on them will be a fine art; one publisher in South Africa, for instance, was pressed to retract a book exposing the president. There will be so many issues exposed once retractions become part of the Wikipedia workflow. Failing to do so will be the worst we can do. We will not be sharing the sum of all knowledge, we will be sharing the sum of what we are told.
Thanks,
       GerardM

Thursday, November 09, 2017

Judith Butler in #Brazil - a reaction in the #Wiki way

When the news has it that an effigy of Mrs Judith Butler is burned in Brazil, it is time to give some attention to Mrs Butler. There is information about her and the papers she published, and one way of adding to the relevance of Mrs Butler is by increasing the number of people she is connected to.

In 2012 she was awarded the Lyssenko award. Adding that date and the other award winners works in two ways; Mrs Butler is better connected but the other award winners are better connected as well.

There is an article for Mrs Butler in English Wikipedia but given that it is a French think tank that conferred this award, chances are that not everyone on this list has an English article. There are projects that suggest articles to write. Adding awards in this way may feed those projects. I hope so. For me that would be the best outcome that could be achieved.
Thanks,
     GerardM

#Wikipedia - Ischia International Journalism Award & the Polk Award

When people win awards, they often win multiple awards. Harrison Salisbury won several awards, not only the Polk Award. The Ischia award did not have a date associated with it. I used Awarder and the data from the Italian Wikipedia because that was most convenient.

There was no article for Mr Salisbury in Italian and consequently there was no date associated with him. Mr Salisbury is represented with a red link. The list indicated 1990 and it was an easy manual edit.

As you can imagine, that red link could link to the information about Mr Salisbury on Wikidata. Showing this information to those who are interested in writing a Wikipedia article in Italian does provide pertinent information, information that should coincide with the new article. By comparing the information in Wikidata and in existing Wikipedia articles you know that the article is likely to be correct.
Thanks,
      GerardM

Wednesday, November 08, 2017

#Wikidata as a Wiki versus the data consumers’ perspective

Wikidata is a Wiki. It follows that many people with many agendas add data to Wikidata. It is a continuous process and, as is usual in a Wiki, all contributions that fit the notability requirements of the project are welcome.

The consumers' perspective seen from a Wiki point of view is a bit awkward. There is nothing but active contributors that work towards any of the quality considerations. Even when there is a reasonable quality for some, it may not be enough for others.

Both Wikipedia and Wikidata are Wikis. Both have issues from a consumers' perspective. They are already explicitly integrated through the interwiki links and implicitly through the Wiki links. One of Magnus's tools makes this visible.

When you then consider George Polk and the George Polk Award it becomes obvious that Wikis have an issue from a data consumer's perspective. In some Wikipedia articles the two are conflated. In others there is a separate list of award winners. Many of the award winners do not have an article and some of the award winners refer to the wrong person. Wikidata could do with more data; the data was imported from Wikipedia and several of the wrong persons are still wrong in Wikidata.

Both Wikipedia and Wikidata consume each other's data. Both are Wikis. There is no superiority in either project but they could compare their data and curate the differences.
Thanks,
      GerardM

Tuesday, November 07, 2017

#Wikipedia; Héctor Rondón did not win the #Polk Award

This is Héctor Rondón; he pitches for the Cubs. He did not win the George Polk Award; Héctor Rondón Lovera did.

This is a common mistake; it happens all the time and it is where Wikidata may make a positive difference to Wikipedia. It just requires a different mindset to see why this is the right solution at this time. There are some loud Wikipedians that abhor Wikidata. This is an easy and obvious method that will improve Wikipedia and there is no sane argument why this would not work.

These Wikipedians do not even have to notice that this is done; we can hide it from them and still do a world of good. Not just for English Wikipedia but for all Wikipedias... Ehm, for the readers of all Wikipedias.
Thanks,
      GerardM

Sunday, November 05, 2017

#Wikidata - There is no such thing as a free lunch

Mrs Adriane Fugh-Berman wrote a paper called "Why lunch matters: Assessing physicians' perceptions about industry relationships". There is no such thing as a free lunch and arguably this is exactly what Wikidata is offering to the bio-medical industry.

All the bio-medical papers find their home in Wikidata and there is no mechanism, there is nothing to indicate the many erroneous papers, there is nothing to indicate that specific substances have been banned from use as a medical substance. When Wikipedia is to use Wikidata for information it will be so bad.

Mr Martin Keller is a psychiatrist whose reputation was for sale. "His" paper "Efficacy of paroxetine in the treatment of adolescent major depression: a randomized, controlled trial" has been thoroughly debunked.

At Wikidata there seems to be the notion that facts like this are an affront to its neutrality. It is why there is no mention on the item for Mr Keller; "significant event" "ghostwriting author" was removed.

The problem is that without sufficient debunking potential for ghostwriting authors, their products and their ill effect, there is no possibility to establish the veracity of the bio-medical facts that have been imported in Wikidata. It is vital to the integrity of the Wikidata project that the Mr Kellers of this world are seen for what they are: frauds.
Thanks,
      GerardM

#Wikimedia - I endorse having a #strategy as it is good to have one

Having a strategy is great. There are objectives and there is an idea how to get there. As the Wikimedia Foundation formulates its strategy, it is complicated. Complicated by necessity because it involves so many interests, people who invested so much of themselves in their project(s), people who speak so many different languages, languages that define them, people with different backgrounds because they define them as well. The strategy must be complicated because it aims to reconcile all these people and the organisations that represent them.

When you are a Wikimedian, it helps when your vision coincides with the vision implicit in this big strategy. I was asked to present at the Wikimedia Nederland conference; I presented a historic view on information gathering and sharing. The presentation was given in English because it was the one common language in the room.

I love presentations but talking with people I love even more. I was asked for the strategies behind the things that I do, the things I value. The Luc Hoffman award is an example. It does not have a Wikipedia article but the subject, the science, is of real relevance in this time of climate change. The idea of associating links (blue, red and black) with Wikidata is a non-confrontational way to bring Wikidata value to Wikipedia. Adding all the USAmerican alumni from en.wp categories will allow us to keep up with what they hold and know about even more USAmerican alumni. There is method behind the madness.

Now that the Wikimedia strategy goes to the next phase, I hope for many user stories; stories explaining what we are going to do and for whom. I also hope that technical considerations will not prevent innovation and improvements. In the end that is not what a strategy is. It is the hope for the bright future we deserve in our Wikimedia movement.

Tuesday, October 24, 2017

#Wikipedia - Student or Athlete or both ?

College football, soccer, basketball, whatever, is a USA phenomenon where young people attend a college or a university on a sports scholarship.

Wikipedia has categories for the different sports and for the members of such teams; these categories are typically a subcategory of the alumni category of a college or university.

For Wikidata the alumni are typically harvested from the specific alumni categories and as a consequence it is as if all the athletes did not have an education.

My question: how can we best associate these college/university sports categories with the alumni categories?
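
One possible approach, sketched in Python under the assumption that the category tree is available as plain dictionaries: walk from the alumni category down through its subcategories and treat every member found as an alumnus. The function name and the category and player names are hypothetical.

```python
def collect_alumni(category, subcategories, members):
    """Return everyone in a category or in any of its subcategories."""
    seen = set()      # categories already visited, guards against loops
    alumni = set()
    stack = [category]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        alumni.update(members.get(current, []))
        stack.extend(subcategories.get(current, []))
    return alumni


# Hypothetical category tree and members.
subcategories = {
    "Example University alumni": ["Example University football players"],
}
members = {
    "Example University alumni": ["Person A"],
    "Example University football players": ["Person B"],
}

alumni = collect_alumni("Example University alumni", subcategories, members)
```

Whether every team member really is an alumnus is a judgement call; the sketch only shows how the athletes could be picked up at all.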
Thanks,
      GerardM

Sunday, October 22, 2017

#Katherine - When the Cebuano issue is no longer about #Wikipedia

Dear Katherine, I loved your presentation at the Berkman Klein Center for Internet and Society. It offers much to think about and <grin> it is great that you answer the question you want to answer </grin>.

You address questions like "will we let external organisations use our data for their own purposes". My suggestion to you, us all, is why not use our own data for our own purposes.

The Cebuano Wikipedia is seen as problematic on many levels. It is one of the biggest Wikipedias in number of articles and one of the smallest in the size of its community. Like any Wikipedia, its articles are harvested for use in Wikidata and that brings us to several problems but more importantly in the light of your presentation, opportunities.

Problem: the data on which the Cebuano articles are based is problematic
Opportunity: import the data into Wikidata first and do some curation there

Problem: the data is licensed under a CC-by-sa license and Wikidata is CC-0
Opportunity: collaborate with the copyright holder and ask their permission to include the data in Wikidata

Problem: when text is generated by a bot, the text when saved in an article is fixed
Opportunity: do not save it as an article but generate the text and maybe cache the text

Problem: other organisations use our data to generate information
Opportunity: we generate information in all the 300 languages where Wikipedia does not have an article

Problem: we have information that has no article in any language
Opportunity: we generate the text and maybe cache the text

Problem: Wikimedia officials indicated that issues like the Cebuano Wikipedia are not relevant
No opportunity; opportunities for all our projects are missed

Katherine, we already generate texts using bots, we already cache our data, we do it for English, we do it for Swedish, Cebuano. Why leave it for the companies of our world to generate text where there is already so much? We can do better, do the same and do it for all our languages as well.
Thanks,
      GerardM

Saturday, October 21, 2017

#Wikidata - just an award winner: Mr Shuming Nie


Mr Shuming Nie is the 2007 winner of the Heinrich Emanuel Merck Prize. As such he was notable for inclusion in Wikidata.

A Wikipedia stub article was created. The article makes it plain that Mr Nie was a serial awardee and when you google Mr Nie, you find for instance the picture you see above. Mr Nie is one of many award winners that are "waiting for the recognition" of a Wikipedia article. By having these award winners in Wikidata, it becomes easier to find people, perhaps someone you care for, who are waiting for an article.
Thanks,
     GerardM

Sunday, October 15, 2017

#Wikidata - motivation; thank you #Magnus

I added Baratunde A. Cola to Wikidata because he won the Alan T. Waterman Award. This month a Wikipedia article was written and I wanted to add some data to the item.

I did not because functionality that is key to me was broken. A new property was added and all the work that I had done on categories no longer showed in Reasonator. There was no willingness to consider the consequential loss of functionality and the result was a dip in my motivation.

Wikidata is important to me and I asked Magnus if he would help out and change Reasonator. He did.

Now I have added information to Mr Cola based on his categories. It matters that a category like this one reflects all the people known to have played in the Vanderbilt Commodores football team.

The issue is that at Wikidata, we have lost sight of these collaborative aspects. Everybody does his own thing and we hardly consider why. It is why user stories are so important; they tell you why something is done and what the benefit is.  In the end without a benefit there is no reason to do it.
Thanks,
      GerardM

Thursday, October 12, 2017

#Wikisource - the proof of the pudding

A user story for Wikisource could be: As Wikisourcerers we transcribe and format books so that our public may read these books electronically.

The proof of the pudding is therefore in the people who actually read the finished books.  To future proof the effort of the Wikisourcerers, it is vital to know all the books that are ready for reading. It is vital to know this for books in any and all languages supported.

There are two issues:
  • The status of the books is not sufficiently maintained in all the Wikisources
  • There is no tool that advertises finished books
To come to a solution, existing information could be maintained in Wikidata for all Wikisources in a similar way as is done for badges. With the information in Wikidata, queries can be formulated that show the books in whatever language, by whatever author.
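
What such a query has to do can be sketched in Python. The record layout, the status values and the sample books are assumptions made for this example; they do not describe how Wikidata or any Wikisource stores this information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    language: str
    status: str  # hypothetical values: "finished" or "in progress"


def finished_books(books, language=None, author=None):
    """Return the finished books, optionally filtered by language and author."""
    result = []
    for book in books:
        if book.status != "finished":
            continue
        if language is not None and book.language != language:
            continue
        if author is not None and book.author != author:
            continue
        result.append(book)
    return result


# Hypothetical sample data.
books = [
    Book("Title A", "Author X", "nl", "finished"),
    Book("Title B", "Author X", "nl", "in progress"),
    Book("Title C", "Author Y", "fr", "finished"),
]

dutch_ready = finished_books(books, language="nl")
```

A book without a registered status simply never shows up, which is why maintaining the status matters more than the query itself.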

Currently there are Wikisources that do not register this information at all. This does not prevent us from taking the necessary steps towards a queryable solution. After all, adding missing badges at a later date only adds to the size of the pudding, not to the proof of the pudding.
Thanks,
     GerardM

Tuesday, October 10, 2017

#Wikipedia discovers #OpenLibrary

On Facebook, Dumisani Ndubane posted his discovery of Open Library:
I just discovered that The Internet Archive has a book loan system, which gives me access up to 5 books for 14 days. So I have a library on my laptop!!! This is awesomest!!!
And it is. Anybody can borrow books from the Open Library (it is part of the Internet Archive).

Dumisani found out by accident; he googled for an ebook called "Heart of Darkness" by Joseph Conrad. What Dumisani did not know at the time is that the Open Library includes books in many languages. His next challenge: find the books in Xitsonga, and tell his fellow Wikipedians about it.
Thanks,
      GerardM

Wednesday, October 04, 2017

#Wikimedia - A user story for libraries

The primary user story for libraries is something like: As a library we maintain a collection of publications so that the public may read them in the library or at home.

Whatever else is done, it is to serve this primary purpose. In the English Wikipedia you will find at the bottom for many authors a reference to WorldCat. WorldCat is to entice people to come to their library.

It does not work for me.

My library is in Almere. I have stated in my profile in WorldCat that I live in Almere and I have indicated that my local library is my favourite. WorldCat indicates that the Peace Palace Library is nearby. It isn't.

When it does not work for me, it does not work for other people reading Wikipedia articles and consequently it needs to be fixed. So what does it take to fix WorldCat for the Netherlands, for me? WorldCat is used by a worldwide public and all the libraries of the world may benefit when WorldCat gets some TLC.
Thanks,
     GerardM

Monday, October 02, 2017

#Wikipedia - A user story for WikipediaXL: an end to the Cebuano issue

The user story for #Wikimedia is something like: As a Wikimedia community we share the sum of all knowledge so that all people have this available to them. 

As an achievable objective it sucks. The sum of all knowledge is not available to us either. To reflect this, the following is more realistic: As a Wikimedia community we share the sum of all knowledge available to us so that all people have this available to them.

When all people are to be served with the sum of all knowledge that is available to us, it is obvious that what we do serve depends very much on the language people are seeking knowledge in. What we offer is whatever a Wikipedia holds and this is often not nearly enough.

To counter the lack of information, bots add articles on subjects like "all the lakes in Finland". This information is not really helpful for people living in the Philippines but it does add to the sum of available information in Cebuano.

The process is as follows: an external database is selected. A script is created to build text and an infobox for each item in the database. This text is saved as an article in the Wikipedia. From the article information is harvested and it is included in Wikidata. One issue is that when the data is not "good enough", subsequent changes in Wikidata are not reflected in the Wikipedia article.

Turning the process around makes a key difference. An external database is selected. Selected data is merged into Wikidata. This data is used to generate only new article texts that are cached in all languages that have an applicable script. As the quality of the data in Wikidata improves, the cached articles improve.
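
The turned-around process can be sketched in Python. Everything here is an assumption made for illustration: the per-language templates stand in for the "applicable scripts", the revision number stands in for whatever signals that the data in Wikidata changed, and the names are hypothetical.

```python
# Hypothetical per-language scripts; a language without one gets no text.
TEMPLATES = {
    "en": "{name} is a lake with an area of {area_km2} square kilometres.",
    "nl": "{name} is een meer met een oppervlakte van {area_km2} vierkante kilometer.",
}

# Cache keyed on (item id, language); the value holds the data revision and the text.
_cache = {}


def render_article(item_id, revision, data, language):
    """Generate text for an item, reusing the cached text while the data is unchanged."""
    template = TEMPLATES.get(language)
    if template is None:
        return None
    key = (item_id, language)
    cached = _cache.get(key)
    if cached is not None and cached[0] == revision:
        return cached[1]
    text = template.format(**data)
    _cache[key] = (revision, text)
    return text


# Hypothetical item; the second call uses improved data with a newer revision.
first = render_article("Q0", 1, {"name": "Lake A", "area_km2": 10}, "en")
second = render_article("Q0", 2, {"name": "Lake A", "area_km2": 12}, "en")
```

Because the text is never saved as an article, a correction in the data shows up the next time the text is generated, in every language that has a script.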

With Wikipedia extended in this way, WikipediaXL, we become more adept at sharing the sum of our available knowledge. With caching enabled in this way, any language may benefit from all the data in Wikidata. It is important to consider the quality of new data. Data may come from a reputable source or from a source we collaborate with on the maintenance of the data. What is to be preferred is for another blogpost.

Saturday, September 30, 2017

#Wikipedia - #Wikidata user stories

User stories are important. They indicate why a certain functionality exists or the purpose of a project. A "user story" has a fixed format:
As a <insert a role> I would like to <insert an activity> so that I <insert a purpose>.
One user story is: As a Wikipedia editor, I can link an article to articles in other language(s) so that a Wikipedia reader can find an article in a language he or she can read.

Another user story:  As a Wikidata editor, I can maintain statements on Wikidata items so that Wikipedia readers always have the latest information available to them.

The first user story has been a resounding success. It is why Wikidata was relevant from the start. The second is very much a work in progress and how well it is doing depends very much on how the current state of affairs is evaluated. There are dependencies for the efforts of so many to have an effect:
  • Readers of a Wikipedia can only see the result when the information has been included in Wikidata
  • Wikipedia readers will only see the result when the editors of their Wikipedia allow them to see it
The first dependency is with Wikidata editors but the second dependency is outside of the influence of Wikidata editors. For this reason it makes sense to formulate a different user story: As a Wikidata editor I can maintain statements on Wikidata items so that Wikipedia editors can take the responsibility to inform their public.

To help these Wikipedia gatekeepers there is a need for tools that make them aware of the information they do not provide.
Thanks,
      GerardM

Sunday, September 17, 2017

#Wikimedia and its #BLP approach


There is a huge controversy about the policies on the "Biographies of Living People". Central in all this is that there is no such policy at Wikidata. As a consequence, many seasoned Wikipedians are of the opinion that using Wikidata data in Wikipedia is a violation of its BLP policy. At the same time there are seasoned Wikidatans who oppose a BLP policy similar to the one at Wikipedia. The problem is that Wikidata does need a BLP policy but it needs to be different for various reasons.

  • An item in Wikidata can be really rudimentary; Marian Latour, a Dutch author, was created because she won an award. This is allowed in Wikidata but the limited information is probably a violation of the English BLP policy. This information came from the Dutch Wikipedia
  • The initial data of Wikidata were the interwiki links. This was a huge improvement for the Wikipedias and there are still many items that have no statements. This is used as an argument not to accept information from Wikidata.
  • Wikidata data is retrieved from a Wikipedia, information like "who won an award". Given the BLP policy of that Wikipedia it should be faultless but it often is not due to disambiguation issues.
The first issue refers to a red link on the Dutch Wikipedia. When the red link is associated with the Wikidata item, there will not be a new disambiguation issue when a different Marian Latour is introduced. Currently there is only one Marian Latour known to Wikidata.
The second issue is one where Wikidata statistics indicate that statements are slowly but surely being added. They also prove that there is still so much to do...
The third issue is the main one. When an article is linked to Wikidata, articles in other languages should link to the same item or to a red link. Solving these issues requires coexistence and preferably collaboration. 

What we need in a Wikipedia is the ability to link a blue or red link to a Wikidata item. Obviously changing links is either blatantly obvious like for Manuel Echeverria or it requires a source. Technically the necessary change in the MediaWiki software may be "opt in" so that only people who care about this approach to quality make use of it. 

As far as I am concerned, when some Wikipedians find fault elsewhere and do not reflect on this proposal and the improvements it brings them, that is fine. What is relevant is that this approach allows for the best Wikidata practices and at the same time improves the BLP quality in all Wikimedia projects.
Thanks,
       GerardM

Saturday, September 09, 2017

The Manuel Echeverría "revenge"

When there are mistakes in a Wikipedia, it follows that once information is copied from that Wikipedia these mistakes find their way into Wikidata. So Manuel Echeverria did not receive the Xavier Villaurrutia Award; Manuel Echeverría did.

So the edit that made Mr Echeverria a recipient of the award was reverted. I fixed things by using the Spanish Wikipedia as a resource instead. The dates were added when people received the award and a few missing people in Wikidata are now known as well.

I cannot be bothered to fix the English Wikipedia. There is no structural solution at this time and as far as I am concerned, there is no interest in one that has been proposed.

There is one additional reason why a solution would be advantageous; reverting edits is a hostile act when edits are made with the best intentions. By actively linking red links and black links to Wikidata, such reversions will become unnecessary.

The problem is that Wikipedians need to understand a problem that, as far as they are concerned, lies elsewhere and is only caused by the lack of quality of their own project. It is with grim satisfaction that I know it serves them right.
Thanks,
     GerardM

Saturday, September 02, 2017

#Wikimedia - Where I make a stand / where I stand for

I was told that my priorities are not the shared priorities of our movement; this by a pivotal person in the WMF. I consider this a personal affront and I will spell out what I stand for and where I make a stand. When you want to personally verify the veracity of my commitment, read my blog and check out my involvement. I have blogged for over 10 years and the basics/citations are all there to find. I consider my position very much in line with what our movement is there for.

==Share in the sum of all knowledge==
This is the overarching aim of our movement. At this time we are congratulating ourselves with what we have achieved so far. There is a lot to celebrate particularly for the English reading world.

===Everything but English===
Given that only 40% of the world population can read English, our successes need to be measured by what we do for all the people in the world. I do not care for good intentions, I care for what can be observed. Financially there is no breakdown available of the amount spent on English versus the amount spent on all the rest. This is imho a diversity issue as potent as the gender gap. All the arguments why "English first" are structurally no different from any other "my group first" arguments. Just compare the amounts given to US American chapters versus the Indian chapter. In addition you may or may not consider the cost of the software that is developed with English Wikipedia in mind.

===Internationalisation and localisation===
I have searched briefly for "internationalisation" in the 2030 strategy papers. Could not find it. It is however the bedrock of Wikipedia. It is vital for any and all of the individual features of MediaWiki.

When you consider Wikimedia partners like the Internet Archive and their Open Library, we do not even consider how much we will achieve when together we reach out to the other 60% as well. Our internationalisation platform is open to our open source partners and translatewiki.net is in my opinion a strategic resource.

===Partners===
The successes of our GLAM partnerships prove collaboration serves mutual interests. There are plans to improve Commons, a key part is the Wikidatification that will open up Commons, not only in English but also in any and all other languages. Where we could make more of a difference is help where our partners indicate what is relevant to them. We can show them the effect of the cooperation in any language. At this time what we show is limited to images. This is something we should expand on.

====Internet Archive====
The Internet Archive provides a vital service to our Wikipedias. Its Wayback Machine allows us to prove that references that used to be on the Internet existed. Effectively it is an important tool when the aim is to prevent misinformation. Its Open Library has two parts. The part I am interested in is making free e-books available to readers. We would do better when we collaborate just a bit more and help them with their internationalisation and localisation.

====OCLC====
The libraries of this world collaborate in the OCLC and share their links in one system; the Virtual International Authority File. In its WorldCat system, the idea is that people can find books in the library near to them. Thanks to the references to local libraries, it is always possible to know if a book or an author is known in whatever country. It is important for us to improve cooperation and the visibility of this collaboration for our readers and editors.

===Bringing things together===
I have helped bring data from Wikidata, OCLC and Open Library together. I am seeking the disambiguation of Open Library content using existing links to the Library of Congress to the VIAF and consequently to Wikidata. I am adding award winners because they provide arguments what articles to write or improve. Currently I am adding Dutch literature awards to show the Dutch National Library that this information exists and can be used. Recently I added botanical awards to show a group of botanists how small tasks like this add relevance.

===Outspoken stuff===
  • I am not a Wikipedian and consequently arguments specific to any Wikipedia are problematic, mostly irresponsible.
  • I care about diversity; issues around the gender gap do get extra attention from me but it is a secondary consideration.
  • I care about usability and use Reasonator and tools like Petscan and Awarder. The necessity to use Reasonator for so many years is proof perfect that usability does not have much of a priority. Having seen previous attempts at usability, I will consider it once it is available.
  • I expect that there will be more use for our data. Quality is key and collaboration on a meta scale is what will make this possible.
  • Wikidata is particularly useful in English. Theoretically other languages may profit from its multilingual nature. Institutional (WMF) interest is needed to improve this use of Wikidata. 
  • While I respect many efforts of the WMF, I find that its concentration on English Wikipedia has a very negative effect on a micro scale. It is not all bad but it is this division of labour and money that prevents us from having the most bang for our buck.
Thanks,
      GerardM

PS I resent that I felt the need to write this blogpost.

Sunday, August 27, 2017

#Wikidata - surge of new items

Lately there has been a surge of new items coming into Wikidata. They must be quite good when you consider the number of statements. The items with no statements are mainly part of the original load, the Wikipedia articles, and their number is slowly but surely decreasing (1.35% the last month).

With more items in Wikidata, there is more data to support, to edit. As it is, limits are put on the number of edits. This can be appreciated because of the current performance problems but it is obvious that as this upward trend continues, more people and more data will come to Wikidata to edit as well as to query.

There is plenty of data waiting in the wings to be added. The big challenge is promoting the data that is of use and will enable more collaboration both with people and with organisations.
Thanks,
      GerardM

Saturday, August 26, 2017

#OpenLibrary - Charles Horn and its other volunteers

There are several reasons why Open Library and Internet Archive deserve attention. They provide downloadable books in many languages and their Wayback Machine comes to the rescue when links in references in Wikipedia go stale. Have a look at the presentation from Wikimania 2017 (from 11:46).

The Internet Archive is officially one of the partners of the Wikimedia Foundation. When you ask who in the Wikimedia Foundation is the go-to person for contacts with Internet Archive, there is no answer. It is as if there is no structure in contacts with our partners even when it pays dividends to collaborate in a more structured way. When you consider the "Coleman Boat" it is just as if the macro elements are totally missing and it is left for the micro elements to make the difference.

Macro effects of collaboration with the Open Library would be:
  • references are made to downloadable eBooks from Wikipedia - People read books
  • localisations are made at translatewiki.net - People read books in "other" languages
  • books at Open Library are in Wikidata - links to eBooks are available
  • identifiers are widely shared and widely curated -  work of volunteers has the biggest impact
At a micro level, collaboration is happening. Charles Horn, a volunteer at Open Library is a stellar example. Charles added identifiers to Wikidata and VIAF in the Open Library database. He provided us with a large file of redirects and was instrumental in removing multiple identifiers to Open Library for authors.  He recently produced a Wikidata query to find duplicates and the Wikidata community was made aware of this maintenance work. 

Many of the macro opportunities become possible when conditions at Open Library are met. One big issue is the need for disambiguation and de-duplication. This is not helped with the massive amounts of data involved and the lack of data on the individual author level. While individuals like Charles have an immense effect, it is in the collaboration on a macro level where even bigger differences can be made. Consider; many books include identifiers like an ISBN or a link to the Library of Congress. So it is possible to leverage a tool developed at the Wikimedia Foundation to retrieve associated meta data or to find associated data at the OCLC.

It takes just a bit of friendly prodding from the macro people at the associated organisations, some reassurance that there is support for these efforts and there will be a lot of talent at the micro level making a big difference. Cooperation and coordination are what the organisations are to provide and we will share more of the knowledge that is available to all who come looking.
Thanks,
       GerardM

Sunday, August 20, 2017

#Wikidata - Martin Reints and {{Authority control}}

Martin Reints received the Herman Gorter Award in 1993. There is a Wikipedia article about him and consequently he was known in Wikidata. There was no "authority control" information for Mr Reints in Wikidata yet and this was quickly remedied.

The most interesting part is that the VIAF registration for Mr Reints already included a link to Wikidata. Proof perfect that librarians are actively working on keeping their house in order. There was an Open Library entry for Mr Reints and the Dutch article had a link to the DBNL-website for Dutch language authors.

Open Library I found is very much about books. Their data on the books they have is great; identifiers like ISBN-10 or ISBN-13 and links to the online catalog of the Library of Congress. This makes a lookup at the OCLC for identifiers of all the authors easy and disambiguation becomes more effective.

Wikidata is very much about data. You can query Wikidata for all the winners of the Herman Gorter Award and in the results you can add the links to VIAF or to the Open Library. This ability to query makes all kinds of applications possible like: "what books written by authors who won the Nobel Prize are available in your library?"
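A query like this can be sketched in a few lines. The sketch below only builds the SPARQL text; it assumes the Wikidata properties P166 (award received), P214 (VIAF ID) and P648 (Open Library ID). The function name and the item identifier "Q0000000" are made up for the illustration; the real identifier of the Herman Gorter Award has to be looked up in Wikidata first.

```python
# Wikidata property identifiers assumed by this sketch.
AWARD_RECEIVED = "P166"
VIAF_ID = "P214"
OPEN_LIBRARY_ID = "P648"


def award_winners_query(award_qid: str) -> str:
    """Build a SPARQL query that lists the winners of one award,
    with their VIAF and Open Library identifiers when present."""
    if not (award_qid.startswith("Q") and award_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item identifier: {award_qid!r}")
    return f"""
SELECT ?winner ?winnerLabel ?viaf ?openLibrary WHERE {{
  ?winner wdt:{AWARD_RECEIVED} wd:{award_qid} .
  OPTIONAL {{ ?winner wdt:{VIAF_ID} ?viaf . }}
  OPTIONAL {{ ?winner wdt:{OPEN_LIBRARY_ID} ?openLibrary . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "nl,en" . }}
}}
ORDER BY ?winnerLabel
""".strip()


# "Q0000000" is a placeholder, NOT the real identifier of the award.
print(award_winners_query("Q0000000"))
```

The OPTIONAL parts are what make the gaps visible: a winner without a VIAF or Open Library link still shows up in the results, with an empty column where the work is waiting.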
Thanks,
      GerardM

Saturday, August 19, 2017

#OpenLibrary and winners of the Herman Gorter Award

If you want to know if the Open Library is of relevance in other languages, you have to do some research. I wanted to find out if there are publications by the authors who won the prestigious Herman Gorter Award.

This award was conferred from 1945 to 2002 often to multiple authors. The first author not known to Open Library is H. C. ten Berge. He received the Herman Gorter award in 1964. There were several authors where Wikidata did not have a link yet for Open Library.

Now consider this: what if we could query Wikidata for all the authors and their publications in Open Library? 

Just a little bit more metadata about books, publications is what we need.. It is not really a big deal, only a few million additional records..

Many if not most of the books at Open Library have links to authorities like the Library of Congress. This makes it possible to link these books through the OCLC to "your library system". It knows about authors and that is what makes it possible to use tools instead of people to enrich Wikidata and open up all that is in the Open Library for all of us.
Thanks,
       GerardM

Wednesday, August 16, 2017

#Wikipedia - #BlackLunchTable / Brooklyn Hip Hop

The Black Lunch Table project has an editathon on August 20th. It will focus on important but underrepresented New York Hip Hop/rap artists.

In preparation they have created entries in Wikidata for artists with and without a Wikipedia article. In this way they can prepare information for the editors to use in their articles.

Magnus created a new tool and it shows who edited Wikidata. As a result we can create a query for the edits for the New York Hip hop event for the month of August.

It shows who has been doing all the work.
Thanks,
      GerardM

Monday, August 14, 2017

#Wikimedia - Women in blue

Dear Rosie, I saw your presentation. You want women in blue. In it you mention 300 lists of women. That is a lot of lists. In the meantime the biggest list of women with no article in a Wikipedia can be found on Wikidata.

There has been research in suggesting subjects to people and it works. Leila Zia, one of the WMF researchers, wrote about a project they did. So the mechanism is there and you know, Wikidata has oodles of women with no article in "your" Wikipedia that have enough relevance given.

So how about a generator for ideas for articles to write? Leila knows many algorithms and Wikidata knows about many if not most of the women that are on your lists.. Come to think of it, why not add all the lists in Wikidata in the first place?
Thanks,
       GerardM

Sunday, August 13, 2017

#Wikidata - Three award winners of the #ASBA

The ASBA or the "American Society of Botanical Artists" started off in the USA only to become a truly international organisation. They are an important player in the revival of botanical art, they have many local chapters and they have a number of awards.

The three ladies to the right are the winners of three awards. They now have their Wikidata entries.

I was introduced to people at the New York Botanical Garden and they indicated to me the relevance of illustrations. After that I got into contact with a lady from New Zealand who created a Google list of women scientific illustrators and artists. Her objective is to collect information for Wikipedia articles and many of them already do have an article.

The NYBG is planning future events and in preparation they would like to include information about awards, including awards for botanical illustrators. When the information in the spreadsheet is entered in Wikidata from the start, there is no need for Google lists; Wikidata can play its role instead.
Thanks,
      GerardM

Saturday, August 05, 2017

#Wikidata - Harriet Martineau and some social opportunities

When you do not already know about Mrs Martineau, do read one of the many Wikipedia articles. She is considered to be the first female sociologist and introduced many subjects into sociology that were up to that time not considered.

The picture is a crop of a painting at the National Portrait Gallery by Richard Evans. The picture is known at Wikidata, at Commons the Creator template is missing.

At the Biodiversity Heritage Library Mrs Martineau was known for her book "A Complete Guide to the English Lakes". It was the only book known for her at Open Library. Given the relevance of Mrs Martineau this was strange and sure enough she was known as "Martineau, Harriet" and changing the link to the book was easily done.

At Wikidata meanwhile, there was a hidden link from Mrs Martineau to Open Library thanks to all the good work of the Freebase volunteers. Approving the change was obvious.

At Wikidata there is now a link to VIAF, to the BHL and to OL for Mrs Martineau, and to over 20 more sources. The BHL has links to both Open Library and VIAF. When the links differ, it becomes obvious where work needs to be done.

The result is a better service for all the people who make use of any or all of these resources. We truly should collaborate and strengthen our partners, the partners we share data with.
Thanks,
      GerardM

#Standards - the International Plant Names Index

#IPNI is a collaborative project between three august bodies in the taxonomy of plants. They are the Royal Botanic Gardens, Kew, the Harvard University Herbaria, and the Australian National Herbarium.

There are three areas where IPNI sets the standards: plants, authors and publications. The objective is to disambiguate any taxonomic reference to a plant in scientific literature to the correct taxon given the taxon name, its author information, publication information and date.

IPNI publishes several graphs indicating the success of their work. I have been involved in this work as a consequence of a database project I did for my father who loved his cacti and succulents.

One example of what information IPNI provides can be found in this page for the "genus" Echinocactus. In my understanding, the correct full taxonomic name is: "Echinocactus Link & Otto Verh. Vereins Beford. Gartenbaues Konigl. Preuss. Staaten 3: 420. 1827". It has all the required information, it has type information, it has links, all as you would expect of a standard like this.

To appreciate the work of IPNI; instead of "Link & Otto", there may have been: "Link and Otto" or "Link et Otto" or ... obviously the information for the publication is easily made into a different abbreviation.

Wikidata included only a subset of the full taxon information. It is easy enough to understand why; Wikipedia only needed the most current one. It is an easy model; works relatively well and it breaks in the corner cases. With the development of WikiCite there is a great and possibly easy opportunity to expand on the current work given the expanding collaboration with botanical partners like the Biodiversity Heritage Library.
Thanks,
      GerardM

Sunday, July 30, 2017

#Wikidata - Mrs Helen M. Duncan is not the only geologist

There are many ways of updating Wikidata. Individual statements for individual items are made. They are worthwhile but on the grand scale of things they have little impact. Another approach is to seek sets of data that can be updated all at the same time.

Mrs Duncan is, among others, relevant to the Smithsonian Institution. The approach of adding loads of data for many people has the advantage that an issue, like Mrs Duncan not being identified as a geologist, is fixed for many people at the same time.

To do this, I identified a category that implied the missing statement and I used PetScan to add all of the missing data in one go. Together with Mrs Duncan I made 1005 humans a geologist.

These are small numbers, they hardly register. But as it is, there are Wikidata administrators actively preventing edits because Wikipedia cannot cope with the volume of changes in its recent changes. 

There is no plan, no timetable for the underlying problem to be solved. Wikidata people are told not to make mass edits. It is however the only way to make a real difference and make Wikidata halfway usable.

There are two options:
  • improving Wikidata as fast as we can and in the best way possible - as a consequence changes at Wikidata will not all be visible in some Wikipedias
  • allow Wikidata to edit to the extent that Wikipedias can keep up with the volume of changes - as a consequence people will go away and new projects will not start
There is a prima facie case to be made for the edits to be seen in the Wikipedias. Its efficacy has not been studied and some say that the user interface sucks too much to be useful. Arguably keeping these changes is based on beliefs/assumptions and not on established facts. 

We should imho make all the edits we can make and when the Wikipedia recent changes are to be salvaged, give it the highest priority particularly at the Wikipedia end. It sucks that we cannot provide all changes to them but hey that's life.
Thanks,
      GerardM

Wednesday, July 26, 2017

#Wikidata - in #defence of Erika Herzog

On Facebook, Erika made a few comments that were not well received. A few really positive things did come out as a result but there is a need to defend Erika and her central argument. She asked if there had been a process of consulting the English Wikipedia community because the user interface of Wikidata is so poor. She said:
"... But I am pretty sure a lot of En Wikipedia editors are going to be sort of upset about this shift that requires them to actually edit Wikidata without a form input method (on WikiMarkup). Is there a form input on Visual Editor for this?"
On Facebook she is attacked for all the wrong reasons. A Wikimedia functionary asks: "How is this a Wikidata matter? English Wikipedia is where you want to discuss this." Erika's answer is spot on: "Actually no it's not. I'm tired of this response. It's not helpful or realistic. This is a Wikidata item in terms of buy-in and outreach to incorporate more Wikipedia editors. It's disingenuous to posit otherwise. This needs to be a discussion on both sides, and I think the onus is more on the Wikidata side as the usability and UX is poor at best."

One positive outcome of the Facebook thread is that it is mentioned that there is a method under development to edit Wikidata from Wikipedia templates. However welcome, it is going to introduce its own problems because the primacy of the data remains at Wikidata. The user interface of Wikidata is indeed awful. As one of the more prolific Wikidata editors I only use it for editing. For displaying the data I use Reasonator exclusively. Compare this with this for instance and you will see why.

The reason for this is the applicable priorities. The WMF has too many concurrent ambitions for Wikidata and the staff is overextended. When the question is if Wikidata is sufficiently user friendly for an average Wikipedian, the answer is no. At this time Wikidata cannot cope with all the changes committed to it as it is; the wise words of Johan Cruyff apply: every disadvantage has its advantage.
Thanks,
      GerardM

Sunday, July 23, 2017

#Wikidata - Franziska Michor and #notability

Because of Facebook I read something about Franziska Michor. What triggered me was that she received an award. Her occupation, biomathematician, does not even exist (yet) on Wikidata.

To understand what a biomathematician does, it is great to watch the TEDMED presentation by Mrs Michor. It gets me to the question of notability; I was amazed that Mrs Michor did not have a presence on Wikidata. I do not know if TEDMED is part of the TED project we had and I have no clue how to add this presentation.

With the ever increasing scope of Wikidata, the challenge becomes less one of introducing data and more one of maintaining data. This is particularly true when you look at Wikidata from a mathematical point of view. With Mrs Michor there are several datasets that gained notability and can do with some tender loving care: biomathematicians, TEDMED talks and the Vilcek Prize for Creative Promise.
Thanks,
     GerardM

Saturday, July 22, 2017

#Wikidata - Prix de Coincy and Raymond Benoist


The Prix de Coincy is an award conferred by the French Botanical Society. The first time it was awarded was in 1904 according to the French article but the first botanist who is known to have received it, got it in 1906. He was Edmond Gustave Camus, a red link in the French article, but he has articles in several Wikipedias.

Botany is one of those subjects that have appeal; people care about plants, how they are named and consequently many botanists have articles in multiple Wikipedias. This became obvious when all the red links and black links in the article were entered in Wikidata. Like Mr Camus most already existed and just had to be associated with this award.

There are a few items that are not that obvious; Raymond Benoist is one. The French article has it that he received the award but there is no source and at that the only source for the award is the French article. Another issue is with the 1949 award; they are likely three people, one is Louis Quentin, the others Henri and Madeleine Stehlé. Nothing wrong with being bold I suppose..
Thanks,
     GerardM


Sunday, July 16, 2017

#Wikidata Tool - The #Awarder

The Awarder is a tool I use every day to add people known to have received an award to Wikidata. Its use is straightforward:
  • find a list of award winners, a list that includes the person and the year it was conferred
  • copy the source text into the awarder
  • identify the wiki the data is from
  • identify the award by its Wikidata identifier.
  • open the results in "quick statements" for processing. 
Easy. When done properly the result is as good as the information from the Wikipedia it came from.

There are a few points. Some lists, like the one on the John Wesley Powell award, have the year on a line and the date is implied for the following text. The result is ten people identified. There are a few red links in there, for instance for "George M. Hornberger", and Awarder has identified him so that I can click on a button to find him in Wikidata. As I did not find him, I added him in Wikidata for later processing. Awarder does not identify organisations as award winners so I had to add the identifier for, for instance, the "California Department of Transportation". John Galetzka is the award winner for 2016. He is a "black link" so I identified him in the tool with brackets and as a result I could add him as well.
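To show what "the year on a line" means for a tool, here is a minimal sketch of that parsing step. It is not how Awarder itself works; it is my own illustration. It assumes the source text is wikitext, that a line holding only four digits is a year, and that winners are marked with double square brackets. Except for John Galetzka, the names in the sample are invented.

```python
import re

# A line that holds nothing but a four digit year.
YEAR_LINE = re.compile(r"^\s*(\d{4})\s*$")
# A wiki link such as [[Name]] or [[Name|shown text]]; group 1 is the target.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def parse_winners(source: str) -> list[tuple[int, str]]:
    """Return (year, name) pairs; a year line applies to all lines after it."""
    current_year = None
    winners = []
    for line in source.splitlines():
        year_match = YEAR_LINE.match(line)
        if year_match:
            current_year = int(year_match.group(1))
            continue
        if current_year is None:
            continue  # text before the first year has no date to attach
        for name in LINK.findall(line):
            winners.append((current_year, name.strip()))
    return winners


# Invented sample; only the 2016 entry comes from the text above.
sample = """\
2015
* [[First Example Winner]]
* [[Example Organisation|an organisation]]
2016
* [[John Galetzka]]
"""

for year, name in parse_winners(sample):
    print(year, name)
```

A "black link" has no brackets in the source, so this sketch would skip it. That is why adding the brackets by hand in the tool makes the name visible to it.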

For fifteen award winners it is now known that they won the award. Slowly but surely it adds to the relevance of these people in Wikidata and the missing award winners become easier to identify for the implied notability.
Thanks,
       GerardM

PS thank you Magnus for a great tool

Friday, July 14, 2017

#Wikidata VS #Wikipedia - the issue with input, output

I was told that I should not talk about quality because "on the basis of my work I did not give a good example". Basically I was told to stop what I am doing. As I have written a lot about quality and argued how we can achieve greater quality it is not funny nor is it appreciated but the guy has a point.

With 2,304,191 edits there must be a lot that is wrong in what I have done. No matter how careful I am, the percentage of errors that is to be expected means that with 6% there must be some 138,252 errors that I introduced. The problem is that depending on your outlook this is acceptable or it is not. When instead of me 100 people did the same work, the result would have been the same; together they would have introduced around 138,252 errors as well.
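The arithmetic is easy to check. The sketch below uses the two numbers from this post; the even split of the work over 100 editors, each with the same 6% error rate, is my assumption for the illustration.

```python
import math
from fractions import Fraction

TOTAL_EDITS = 2_304_191
ERROR_RATE = Fraction(6, 100)  # the 6% used in the text


def expected_errors(edits, rate=ERROR_RATE) -> Fraction:
    """Expected number of wrong edits for a given number of edits."""
    return Fraction(edits) * rate


total = expected_errors(TOTAL_EDITS)
print(float(total))      # 138251.46
print(math.ceil(total))  # 138252, the rounded-up figure quoted above

# Assumption: 100 editors share the work evenly at the same error rate.
per_editor = expected_errors(Fraction(TOTAL_EDITS, 100))
print(per_editor * 100 == total)  # True: the total does not change
```

The number of errors depends on the volume and the error rate, not on who does the work. Only a lower rate, through better input and better tooling, brings the total down.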

I totally agree that we need to bring our errors down. There are three steps where errors have their origin; input, process and output.
  • My input is based on the Wikipedias; their content all has its own issues. They all operate on their own little islands; there is no or little coordinated effort to make the quality of the information we provide a collective ambition.
  • My process is based on identifying what I want to work on; typically awards, often the enrichment of data around one person. For tools I mainly use what Magnus provides; they provide superior usability. Reasonator makes Wikidata statements intelligible, it provides superior disambiguation and automated descriptions. Awarder adds both the year and the person who received an award. It allows me to effectively cover a lot of ground. They are the tools I use most, others like PetScan are also invaluable.
  • There is too much output I generate and consequently I do not care for individual edits. I justify them all for the process, the routines I follow. I added "Claudia Wills" based on the information in the article of the eponymous award. Like other notable birdwatchers, Mrs Wills does not have her own article and I added her to complement the information on the award.
We share in the sum of knowledge and when the quality of what we provide is to improve, our movement has to become dedicated to the quality of all our information. The typical Wikipedian does mostly care about his or her own project and that is fine; we do not need all of them in an effort to improve our overall quality. The effort I propose can be hidden from view.

A Wikipedia article contains many links; they are blue, red or black. All the blue links are implicitly linked to Wikidata items. Many issues become evident when they can be compared with the links in articles in other Wikipedias or Wikidata. Some Wikis have additional links and they can be mapped to red links and black links. This prevents problems when articles are written with the name suggested in this link.

Once articles on a same subject in many Wikipedias are linked, all kinds of additional functionality become easier; one that is close to my heart is when a new award winner becomes known..
Thanks,
      GerardM

Saturday, July 08, 2017

#Wikimedia project - #PlantsAndPeople

#Wikidata is a great way to encourage collaboration and reporting for Wiki projects. The results of projects like the Black Lunch Table have been encouraging so far; reports for articles in multiple languages and gender ratios were possible because of the Wikidata link.

A new initiative is PlantsAndPeople. There have been editathons in the past and more are planned. It is about both people and plants so the kind of questions that may be asked will be quite interesting. For instance: how many taxa were described by the people in the project and how many people were honoured in taxon names?

At this moment the people who are the subject of editathons are added. This list will grow slowly but surely and only once it is done, it can replace the list in Wikipedia. It will take quite some time to get there because it makes sense to add additional data as well. This is the best way to quickly improve the quality of the data involved. So far quite a number of mycologists and ethnobotanists have been added. A question has been raised in Wikidata about people named in taxa and a picture that should be in Commons is waiting for someone else to transfer it.

When you are interested; join in the fun.
Thanks,
      GerardM

Wednesday, July 05, 2017

#Wikipedia - there once was a lady from #Estonia

Once upon a time there was a Wikipedian from Estonia. He decided to write about a compatriot, Kersti Kaljulaid. When your Estonian is as good as mine, it is not a name you remember or a person you are likely to have come across.

At the time this was the same for the English Wikipedians; she could not be notable because there were not enough sources in English.. So for all the good reasons the article was in danger. Our Estonian Wikipedian said: "wait a week". A week later Mrs Kaljulaid was the president of Estonia.

I have taken the liberty to add additional data in Wikidata. Mrs Kaljulaid received two awards and other award winners have been added. No sources for them in English either. To be brutally honest, incidents like this prove why English Wikipedia is only a subset of the sum of all knowledge. Because of this insistence on English sources, English Wikipedia cannot cover the sum of all knowledge. People who seek reputable information on foreign subjects will not find it.
Thanks,
       GerardM

Sunday, July 02, 2017

Comparing #Wikipedia using blue, red and black links

There are reasons to compare Wikipedia articles on the same subject in multiple languages. When you just want to read, you may find additional information in another language but as you can imagine, the content should be largely the same. Consequently, the links in an article should go to articles that are about the same topic.

One problem with "blue" links is homonymy. You write a subject the same way but they are not the same; John Doe is one example. Finding these issues, issues that are surprisingly common, can be done by a bot using the Wikidata identifiers for the linked articles.

When there is no article to link to, there is no implicit link to Wikidata. There are two options; we can fake a link by accepting the red or a "black" link as synonymous or we can link a red or a "black" link to Wikidata. The latter is precise and has additional benefits.

When all links are associated with Wikidata items, it is obvious what links in what language are missing or are additional. They are of interest because they may imply potential information to be added to articles or they may point to errors, even vandalism. Another benefit is that it helps establish a baseline for a NPOV or neutral point of view without a need to understand the language.
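Once every link is a Wikidata identifier, the comparison itself is simple set arithmetic. The sketch below is my own illustration, not an existing bot; the function name is made up and the identifiers in the sample are placeholders, not the items of any real article.

```python
def compare_links(links_by_language):
    """For each language, report the items that are linked elsewhere but not
    here ("missing") and the items that are linked only here ("additional")."""
    report = {}
    for language, items in links_by_language.items():
        elsewhere = set()
        for other_language, other_items in links_by_language.items():
            if other_language != language:
                elsewhere |= other_items
        report[language] = {
            "missing": elsewhere - items,
            "additional": items - elsewhere,
        }
    return report


# Placeholder identifiers for the links found in three language versions
# of the same article; blue, red and black links are all treated alike.
links = {
    "en": {"Q1", "Q2", "Q3"},
    "nl": {"Q1", "Q2"},
    "et": {"Q1", "Q2", "Q4"},
}

report = compare_links(links)
for language in sorted(report):
    print(language,
          "missing:", sorted(report[language]["missing"]),
          "additional:", sorted(report[language]["additional"]))
```

No knowledge of the language is needed for this; the labels never enter into it. Whether a "missing" link is a gap to fill or an "additional" link is an error still takes a person to decide.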
Thanks,
      GerardM

Saturday, July 01, 2017

#Wikipedia - Blue, red and black links

Lists in Wikipedia, like this list of award winners of the Tony Kent Strix award on the right exist as blue, red and "black" links. At the moment only an article in English exists about the award and based on past experiences it is likely that other award winners are known in other Wikipedias.

Based on the information in the article, it was easy enough to add the missing information in Wikidata for all the "black links". When you now compare the information in Wikidata with the Wikipedia article, it is feasible to link fixed text to a Wikidata item. This makes it feasible to trigger a warning once a blue link is possible based on new Wikidata information. In this way a link to Jack Mills is already likely.

When we can compare the information in an article with data in Wikidata, there is an additional way to compare the information and prevent errors and vandalism. Wikidata is after all superior in its use as a tool for disambiguation.
Thanks,
     GerardM