Sunday, November 30, 2014

#Wikimedia & diversity - Mary Hinkson

Mrs Hinkson died on November 26, 2014. She was for a long time a member of the Martha Graham Dance Company. She is considered to have been very influential; she was awarded the Martha Hill Lifetime Achievement Award. She studied at a university, she taught at a university and, according to the Wikipedia article, she influenced both ballet and dance.

Wikidata knows these things because I took the time to add them. In this way I can point to the failings of my approach to adding data.

When an article does not have relevant categories, typically I will not add associated information. Mrs Hinkson is highly notable and the only category that adds information for her is: "American female dancers". 

There are many things I will not state. Nationality gets you into conflict in too many ways, and so do race and religion. The consequence is that Wikidata is fairly uninformative about this. From a diversity point of view, it is not that great.

I think that when race, religion and nationality play a big role in an article, the Wikipedia categories may not be all that inclusive. Finding out whether this is true takes some research. I am looking forward to the results.

#Wikimedia - #Wikidata; a recurring subject

The last Foundation Metrics meeting is kinda interesting, particularly when you are interested in #Wikidata. In his part of the meeting Erik considers the implications of Wikidata and wonders what it takes to help it lift off.

Erik wants us to change the world. Now that is a big statement. It can be done. It takes big thinking, maybe even bigger thinking or maybe no thinking at all. In the vision Erik presented, it is all about data and leveraging the data for instance in info-boxes.

Have a closer look at Reasonator, at its statistics and at these specific statistics (they take a long time to load). What you find is probably the easiest and most effective thing to do: allowing people to add labels in their own language. Labels leverage all Wikidata statements for a language. With those labels you can disambiguate effectively in a search. Try it in Reasonator. Now change the language and notice that the automated descriptions are still there. Now do the same with search in Wikidata. See?

At Wikimania there was a heated discussion about the need for descriptions. The only half-baked argument to keep the current descriptions was that people outside expect them. Let's not be strangers.

Friday, November 28, 2014

#Wikimedia #Labs - my #stake is rare, bruised

I have a stake in Wikimedia Labs. I rely on it. I am not the only one. Wikimedia chapters rely on it; they need it for many of their activities. GLAM is one area vital to them that relies on Labs.

For the Wikimedia Foundation, Labs is second tier. They have a few people dedicated to Labs. Good people, well intentioned people but what they offer is not production quality. They cannot for several reasons. There are not enough resources for them to do what is needed.

Chapters are second class citizens as well. The fact that Labs is vital to achieving their aims has so far not made a noticeable difference. In my opinion it is not only the Wikimedia Foundation that can and should make a difference. It is the chapters themselves as well.

I urge the chapters to invest in Wikimedia Labs. It is BOTH the responsibility of the WMF and the chapters to provide adequate support. During business hours operational support should be available. Stakeholders in both WMF projects and chapter projects rely on adequate service.

Today is Black Friday in the USA. Yesterday was Thanksgiving. When all staff celebrate with their turkey we are left to fend for ourselves with even less.

Thursday, November 27, 2014

#Wikidata - #today, #tomorrow

In #Reasonator you can check out dates. The first people of today are known to have died. Who will die tomorrow is only known to God. All we have to do is wait and see.

When we are all done with Wikipedia, all the living people will have died. Hmmm, that is a long time coming. First we have to kill off the ones not known to be dead yet.

#Wikimedia - #empower the #chapters

It has been #budget time for the Wikimedia chapters. As it is centrally decided what chapters "get" and as the finances of the main organisation are not considered under equal terms, the chapters are secondary by definition.

To prove this, a few points:
  • The WMF director defined criteria for quality for the chapters
  • The chapters are barred from involvement in the annual WMF fundraising
  • The chapters rely on funding from the WMF AND the metrics of success do not exclude the cost of WMF related admin
  • The chapters cannot compete for the resources the WMF assumes as its own for new endeavours
  • The chapters are not represented at the office of the WMF
Many of these points have a long history and are sacred cows to some. My point is very much that there are many small things that can make the distinction less stark. It starts with an awareness that chapters support open culture and a community in a country. They would benefit from shared resources that can be made available after minor modifications of what is already there. Our movement is not only English Wikipedia and does not only have a USA or alternatively a world view.

Wednesday, November 26, 2014

#Wikipedia - Hey, #College Boy II

Remember? At this time, of all the 195085 notable people with an alma mater, 158649 are men and 29059 are women; for 7377 no gender is known.

They include all the boys and girls of *your* university. Take the University of Virginia for instance. When I first looked at it, there were only 142 alumni. The category knew about at least 815 more of them. They are being added as well, software permitting.

This query has all the UoV alumni. These are all the men and these are all the women. Maybe this is a good time to write Wikipedia articles about the female UoV alumni and to link those articles to Wikidata.
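The gender split quoted at the start of this post is easier to read as percentages. A minimal sketch in Python, using only the numbers mentioned above; the variable and function names are made up for illustration and are not part of any Wikidata tool.

```python
# Counts quoted in this post: people in Wikidata with an alma mater.
counts = {"men": 158649, "women": 29059, "unknown": 7377}

# The three groups should add up to the total mentioned in the post.
total = sum(counts.values())


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


shares = {label: share(n, total) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {total} ({pct}%)")
```

The same calculation can be repeated for a single university once its alumni numbers are known.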

Sunday, November 23, 2014

#Commons - the Como Cathedral

The Como Cathedral is a cathedral in Como, Italy. In it you will find works of art that are represented in Commons. This link to the works of art is established through an "institution template". It was easy to link the Como Cathedral by adding this: "| wikidata    = Q1101730" in the template.

At the moment there are 1149 templates waiting to be linked to Wikidata. With this link established, it is possible to either populate Wikidata with information from these templates or populate the templates with information from Wikidata.

It is a precursor for easily finding files in Commons that are linked to institutions. Many of them are GLAM partners of ours and it is yet another way of establishing how important they are to us.

#Wikimedia #chapters, the data

Guess what, Wikimedia chapters are linked to many other organisations. These organisations are known in Wikidata and now the chapters are known as well.

For many GLAM partners we have all kinds of statistics. We could link the partners to the chapters that they are connected to. It is the basis for information on the usefulness of the chapters.

#Wikimedia - the point of #collecting #data?

If #Wikidata is one thing, it is useful. It was useful from the start by including all the Wikipedia articles that are linked to articles in other languages. In the next phase statements were added and more and more articles were added that did not link to other articles. They were needed because they were a part in the expression of a statement. Then for all articles Wikidata items were created and still more items were created because they were needed in the expression of statements.

There is a point to linking the articles. It enables people to read about the same subject in other languages. There is a point to adding statements to items; it enables articles to be linked to whatever. This combination enables us to report on Wikipedia in ways not yet done.

If you want to know about the gender division: currently these are the men and the women in all our projects. Since June 2014, 90,850 more items became known to be women and 445,240 to be men. Interesting, but this information is not in a format that is "academic" or useful. Having this information in a bar chart with regular intervals gives more insight into what we have. Using old dumps for this is one solution. Breaking the information up per Wikipedia provides even more granular information.

Providing statistics in this way is good for several reasons:
  • it is public and verifiable information
  • it stimulates people to add statements about gender
  • it stimulates people to write about men and women
  • it makes it obvious that it is Wikidata where we know these things

Friday, November 21, 2014

#Wikimedia - first #standardisation, then #specialisation

The hardware and software used by the Wikimedia Foundation are increasingly standardised. The same software is used everywhere and the configuration is centrally maintained. Good news; it makes for a stable platform. A stable platform allows us to share in "the sum of all available knowledge".

With this process well under way, special attention can be given to special projects. It has probably escaped your attention that the WMF now has a "Services group". They are the engineers that support the standalone software components that often run on their own machines and have very specific jobs, such as "generate a PDF from this article".

Wonderful news. If it did not escape your attention, did you notice that Stas Malyshev is getting up to speed on the Wikidata Query Service[1], figuring out what we need to do to make it suitable for widespread deployment of WikiGrok[2]?

Effectively it means that Magnus's query tool will be used by an updated version of the Games [3]. Now is that not sweet; Wikidata data being USED to leverage our community to improve Wikidata even more.

Thursday, November 20, 2014

#Wikimedia & Project #Gutenberg - the sum of all knowledge

"To share in the sum of all knowledge" is the vision of the Wikimedia Foundation. The Swiss chapter does understand this really well. It has adopted Kiwix, an offline reader for content that is published in the ZIM format.

Project Gutenberg is a well established organisation dedicated to the digitisation of books. Its catalogue of 50,000 public domain books is now available to everybody, everywhere and offline as well.

Thanks to a hackathon, all books are now available in the ZIM format and you can search in all the books at the same time. The best news is that not only has this work been done for a first time, it is built in such a way that it can be easily repeated.

Future deployments may include all the books of Wikisource, books from other sources and even copyrighted works as well. The point of Kiwix is that it is an enabler; it allows for the dissemination of knowledge and to achieve THAT is what our aim is.

Congratulations to the Swiss Wikimedia chapter for providing the sustained support of this valuable project.

#Wikidata - C. Rudhraiya; #filmdirector from #India

Mr Rudhraiya studied at the Adyar Film Institute and he recently passed away. According to some, he brought fame to his alma mater. Mr Rudhraiya also studied at the St. Joseph's College, Tiruchirappalli.

The point is not so much that Mr Rudhraiya was a studied man, it is more that we know this about him. As more information like this is known about "living persons", they get a better representation in Wikidata.

At this time only two movies are known to be directed by Mr Rudhraiya. There must be many more. It is possible to know all the people he worked with by connecting him through his movies. With more data this information becomes more complete.

Wednesday, November 19, 2014

#Wikipedia - Nel Garritsen, a Dutch swimmer

Mrs Garritsen is one of only a few people who are known to have died and have an article in the Dutch Wikipedia. In that article it is currently not known that she died. We know it in Wikidata courtesy of the article in the English Wikipedia.

Every Wikipedia does things its own way. By not having categories for people who died in a given year, there is no way to know about the recent deaths known in the Dutch Wikipedia. It is also not possible to indicate to the Dutch Wikipedians what people are known to be dead in other sources.

Mechanisms like this help to ensure that proper information is available for "living people". Arguably, maintaining categories with the people who died in a given year is a valuable instrument in an implementation of "BLP".

Tuesday, November 18, 2014

#Wikidata - Carl Sanders is not the 74th "List of Governors of #Georgia".

It is said that the community is always right. It also has a short term memory and its consensus is not necessarily what you hope for.

Take Mr Sanders; he died recently and it was indicated that he was a "List of Governors of Georgia". It is an old argument that is the result of some bad practice at Wikipedia. The Wikipedia article includes mainly a list and consequently it is to be called a list. There is no article about the subject itself and hey "it must be a list in Wikidata as well".

It is simple to fix the situation for the governor of Georgia. All the articles are lists; there is no Wikipedia that has both a list article and an article about the subject, so I had the item identify the subject.

Using the category I added many of the "missing" governors; there were only 15 humans known to be governor of Georgia. I made all of them a politician and a US-American.

The community has every right to rehash old arguments. I just follow the old consensus and wait for the dust to settle yet again.

Sunday, November 16, 2014

#Wikimedia NL - my #Wikidata presentation - #WCN2014

The presentation I gave at the 2014 Dutch conference in Utrecht went well. Sadly, for whatever reason I found that it is not yet on Commons. That can be remedied.

When I present, the slides include the main points so when people doze off, they can always find what it was all about. This presentation is very much my view on Wikidata. I presented in Dutch and the slides are in English so that it can be easily re-used.

The points I made are:
  • Wikidata and its development are best understood thanks to the stats
  • Appreciating the information included is best done through the Reasonator
  • Wonderful tools exist that are sadly NOT part of plain vanilla Wikidata
  • Why and how I make so many edits ... the method in my madness
  • The Dutch Wikipedia COULD activate Wikidata search.. to share in the sum of all available knowledge
  • Much knowledge is not known to the Dutch Wikipedia
  • Wikidata already knows about much meta data on Commons thanks to the Creator templates

Saturday, November 15, 2014

#Wikidata - Jens Brugge; a judge from Norway

Mr Brugge, a high court judge from Norway, died. According to the article about him, his lineage is illustrious. Many generations in the Brugge family were quite notable. It can be seen in GeneaWiki2 and it can be shown inline or in a separate window from the Reasonator.

There is an increasing amount of genealogical information available in Wikidata. The value of all this data is not in having it, it is in using it. At this time 29,337 people are known to have a father and 13,336 people are known to have a mother. Obviously, these numbers will only increase and become more complete. Would it not be wonderful to share this information in Wikipedia articles as well?

Friday, November 14, 2014

#Wikidata - Thanks for the Book Award

It is very rewarding to read a good book and it is great when good books find their way to you. There are many literary awards known in Wikidata and the "Thanks for the Book Award" is one of many.

This award has Wikipedia articles on eight Wikipedias. It is a Finnish award and every year there is a new winner. This year it was Pauliina Rauhala for her book "Taivaslaulu".

Most of the Wikipedia articles have not been maintained for quite some time. They are not aware of Mrs Rauhala for instance.

To improve on the Wikipedia articles, all it takes is a mechanism to highlight when a new award is given in a year. The data can be found in Wikidata and, as you can see, in Reasonator we have the timelines showing the winners in order.

As we have the data, we can query for this year's awards. With hidden queries, we can exclude those articles that are known to have a winner. It is not hard; it is motivating to share in the sum of all available knowledge.
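The idea of excluding the articles that already know the winner can be sketched in a few lines of Python. This is only an illustration under assumptions: apart from this year's winner, the winners, the language codes and the function name are all made up, and the real data would come from Wikidata rather than from a hand-written dictionary.

```python
# Hypothetical data: award winners per year, as Wikidata might hold them.
# Only the 2014 winner is taken from this post; the others are placeholders.
winners_by_year = {
    2012: "Winner A",
    2013: "Winner B",
    2014: "Pauliina Rauhala",
}

# Hypothetical data: the winners each Wikipedia article currently mentions.
mentioned = {
    "fi": {"Winner A", "Winner B", "Pauliina Rauhala"},
    "en": {"Winner A", "Winner B"},
    "sv": {"Winner A"},
}


def articles_missing_winner(winners, mentioned_per_wiki, year):
    """List the Wikipedias whose article does not mention that year's winner."""
    winner = winners[year]
    return sorted(
        wiki
        for wiki, names in mentioned_per_wiki.items()
        if winner not in names
    )


outdated = articles_missing_winner(winners_by_year, mentioned, 2014)
print("Articles to update:", outdated)
```

What remains after the filter is the list of articles worth highlighting to the editors of those Wikipedias.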

Tuesday, November 11, 2014

#Wikidata - Annette Polly Williams

As the longest-serving woman in Wisconsin's Legislature, Mrs Williams deserves to be recognised. In her honour, 1752 members of that legislature will be recognised as such and they will be known as politicians.

Mrs Williams was an advocate of education for all kids. It is one of the things a Wikipedia article is good for. It is not obvious how to indicate this in Wikidata.

It is a bit strange to find that many American politicians do not have a picture to illustrate their articles. Given the absurd amounts of money involved, providing a few pictures of politicians would be a cheap gesture that would be appreciated.

Thursday, November 06, 2014

#Wikipedia - Now in #Maithili

It is a happy occasion when a new Wikipedia is created. Today we may welcome the Maithili Wikipedia. The website has been created and all the content that is currently still in the Incubator needs to be migrated.

I wish the Maithili community well; I hope they will share with us in the sum of all available knowledge.

Wednesday, November 05, 2014

#Commons - Adolphe Berty

Mr Berty was a French author, antiquarian and archeologist. There is no Wikimedia article about him and there is no Wikidata item for him. There is material on Commons he created. There are links to several external sources making Mr Berty notable enough for them as well.

Amir ran his bot so all the people with Wikipedia articles are now linked to Wikidata as well. Items for people like Mr Berty just need to be created. Really funny is the realisation that many Wikimedians have or should have their own Creator template. As a consequence they are notable for Wikidata. If not now, certainly when Commons is wikidatified.

It is fun thinking about the implications of the wikidatification.

#Wikidata is the #OpenData winner

Wikidata won first place in the category "Publisher" at the OpenData awards.

It is so well deserved to find both Magnus and Lydia share the limelight. I could not be more pleased.

Monday, November 03, 2014

#Wikipedia - Hey, #College Boy !!

You know the answer to this question: "What can Wikipedia do for you". A more interesting question is: "What can you do for Wikipedia".

So you are at this wonderful college where people who changed the world have been educated, right? Well, never mind what your college says, it is Wikipedia and Wikidata where you find public data about graduates that are considered notable enough.

You may find them in articles, in categories or in none of the above in combination with your college. Wikidata is another place where you find them as well.

Now the questions you might want answers to are:
  • how many graduates are notable enough for one or more Wikipedia articles
  • how many graduates do we know
  • what are those graduates also known for
  • what are the most linked to statements for your graduates
College boy, and girl, you are getting an education. This challenge seems trivial: how will you show what the Wikiverse knows about the people associated with your college?

Sunday, November 02, 2014

#Wikidata - #Wikipedia categories

Wikidata knows about Wikipedia categories. Currently there are items for 2,406,128 categories. Many of those items refer to categories in multiple Wikipedias. One random example is item Q8884100; it refers to categories in 10 Wikipedias. All of them categorize people who studied at the University of Notre Dame. Many of them know about people only known in that Wikipedia. In addition to this there may be articles that are not categorised or references to articles that are not in one of the 10 categories known at this time.
When multiple categories are "harvested" in Wikidata, Wikidata knows about more items than any of the individual categories. This enables the use of the data in new ways.
  • suggest categories based on the presence of statements in the Wikidata category
  • suggest statements when an article is included in a category
  • include red links in a category when a Wikipedia does not have articles.
Before such functionality becomes available, certain tipping points will need to be reached. For instance, enough categories need to be harvested in this way and these categories have to be identifiable for the information they include.

The category "University of Notre Dame alumni" indicates that it is a list of humans AND that for them the statement "alma mater" "University of Notre Dame" has to be made. For over 1450 categories that are about humans similar information exists. Every day more categories are added.
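Harvesting comes down to taking the union of what the individual categories know. A minimal sketch in Python, assuming the category members have already been mapped to items; the language codes, the item ids and the function names are placeholders for illustration, not real Wikidata data or an existing tool.

```python
# Hypothetical category members per Wikipedia, already mapped to items.
# The ids are placeholders, not real Wikidata items.
category_members = {
    "en": {"item-a", "item-b", "item-c"},
    "de": {"item-b", "item-d"},
    "nl": {"item-c", "item-e"},
}


def harvest(members_per_wiki):
    """Union of all items found in the linked categories."""
    harvested = set()
    for members in members_per_wiki.values():
        harvested |= members
    return harvested


def red_link_candidates(members_per_wiki):
    """Per Wikipedia, the harvested items its own category does not list."""
    everyone = harvest(members_per_wiki)
    return {
        wiki: sorted(everyone - members)
        for wiki, members in members_per_wiki.items()
    }


all_items = harvest(category_members)
candidates = red_link_candidates(category_members)

print(len(all_items), "items harvested")
for wiki, missing in sorted(candidates.items()):
    print(wiki, "does not list", missing)
```

An item missing from a category is only a candidate: the Wikipedia may have no article yet (a red link), or it may have an article that is simply not categorised.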

I really wonder what it takes to reach the tipping points that will bring more application of the Wikidata data to for instance Wikipedia.