A One-Year Retrospective of Wikidata Software Collaboration

Prologue

Imagine a world where everyone has access to the same knowledge in a structured, connected way, regardless of their background or language. This is the driving force behind the creation of Wikidata, a free and open knowledge base that can be read and edited by both humans and machines.

One of the projects under Wikidata that is still in its infancy is Wikidata Lexeme. Imagine a hyper-connected “dictionary” that can be understood by humans and by computers, such as large language models (also known as LLMs; GPT is one of them), at the same time. It has the potential to revolutionize the way we understand and visualize language.

I remember being really excited when I first started using Wikidata and discovered the Lexeme project. But I was also quickly frustrated by the complexity of the interface. It was difficult for me to find the information I was looking for, and I didn’t know how to use the more advanced features such as querying. I think this problem has probably been well known to the Wikidata development team for quite some time, but they need help from others to start solving it, especially at the scale Wikidata operates on now.

I will discuss the UX research and prototyping process for a new Wikidata tool that aims to make the Wikidata Lexeme project more accessible and user-friendly. I will also share some of the lessons learned along the way.

Wikidata Software Collaboration

Wikidata Software Collaboration is funded by the Arcadia Philanthropic Trust and initiated by Wikimedia Deutschland to bring forward the movement strategy initiatives in collaboration with Wikimedia Indonesia and Igbo Wikimedians User Group.

The project started in July 2022, and I joined in August of the same year. After 12+ months in the collaboration, we’ve done some amazing things: finding out what our community members are like and how they interact with and see Wikidata, and figuring out a solution that can be developed in a realistic timeframe, built iteratively with agile methodology, and is novel yet useful enough to capture the community’s interest in development, testing, and feedback loops.

Research

The initial goal of the collaboration was to find ways to help people contribute to, reuse, and improve the quality of data in Wikidata Lexeme. To do this, we first conducted user research to understand the needs and challenges of Wikidata editors in Indonesia.

The research was conducted using semi-structured interviews with 15 Wikidata editors in Indonesia who had joined the local Wikidata community or attended Wikidata events. They had a variety of experience levels, from beginners to experienced editors. The interviews were conducted remotely using Zoom.

The interviews were designed to understand the mental model of Wikidata editors, their user journey, the issues they face, and their perception of the project. We also asked them to perform short activities of lexeme search and contribution.

The research findings were distilled into an affinity diagram by grouping together similar ideas and concepts. These findings have implications for the design of the Wikidata tool that will be developed. The tool should be simple and easy to use, mobile-first, and should provide better support for users.

Key findings

“If developed well, the (Wikidata) Lexeme project will produce extraordinary results! The data can be used as interesting research materials.”

—A participant’s opinion about the future of Wikidata Lexeme

“Everything is done voluntarily by volunteers. Contribute, no matter how small it is. If there’s a mistake, someone will fix it, no matter how small the mistake is. Don’t be judgmental and don’t be emotional.”

—A participant’s opinion about core values of a wiki community

Wikidata & Wikidata Lexeme

  • There’s a steep learning curve to understand how to use Wikidata, especially lexeme search and contribution.
  • Mobile phone usage is common in Indonesia. Sadly, Wikidata Lexeme cannot be edited on mobile, except in desktop mode. Challenges include limited mobile data and spotty connections outside big cities.
  • Translation into local languages is very important. Jargon needs to be explained.
  • The most concerning data quality issues are duplicates and incomplete statements. This is because most editors don’t know what to add in the first place.
  • There are concerns about properties unique to local languages, for example non-Latin character input, register, and dialect.
  • The community noticed similarities to WordNet and other online lexicography-related websites. They also compared it to other Wikimedia projects, such as Wiktionary, Wikisource, and Wikipedia.

Community

  • Indonesian editors are surprisingly active in Wikidata relative to other Wikidata communities.
  • Indonesian editors like filling in missing information on topics that they’re interested in and familiar with.
  • Indonesian editors prefer to create items or lexemes from scratch and complete them, because they feel proud of their contributions.
  • There are linguistics-oriented and tech-oriented people in the community.
  • They communicate through WhatsApp and Telegram. They get announcements from Wikimedia Indonesia and local community channels on Instagram and Twitter/X. Short, visually interesting content is preferred.
  • The community wants to be heard, but they don’t know how to give feedback and push for change.
  • There’s little knowledge transfer between experienced community members and newcomers. This is related to poor retention of new editors.
  • There’s no consensus on what to contribute to Wikidata Lexeme for local languages in Indonesia.

Additional discussions with Wikimedia Deutschland

The research was discussed with Wikimedia Deutschland’s UX team. The main topics that were discussed were lowering the barrier of entry and the importance of mobile usage.

  • The fact that there are lots of tutorials and training sessions for newcomers suggests that the current Wikidata interface is not user-friendly enough for them.
  • Even linguistics-oriented people complained that the lexical jargon in use is “too technical”. Choosing a term involves a trade-off between accuracy and precision.
  • Much of the current incentivization of individual contributions may be caused by the current wiki culture. In this case, there is only so much we as UX designers can do.
  • We need to combat the idea that no one edits from phones by gathering data about Wikidata users who actually use phones to access and edit Wikidata.

Demographics and persona

Demographics

To sum up, here are some insights that we can draw from the demographics:

  • Indonesian Wikidata editors tend to skew younger.
  • Most of them live in more developed parts of Indonesia such as Java, Bali, Sumatra, and Kalimantan.
  • All of them use laptops, and the majority also edit on mobile devices.
  • Most of them are involved in other Wikimedia projects, particularly Wikipedia, Wikisource, and Wikimedia Commons.

Persona

The Indonesian Wikidata community can be divided into five categories regardless of editor tenure.

  • Andi, The Data Utilizer: Tech-oriented, academic users of Wikidata for research and product R&D.
  • Sandi, The Teaching Linguist: Teachers and educators, linguistics-oriented people who want to learn more deeply about languages.
  • Hendra, The Affable Maintainer: Community members who become the community’s cornerstone because of technical and people skills.
  • Shinta, The People Inviter: Community members who bring new members on board and are skilled in social media and communication.
  • Joni, The Competitive Editor: Resourceful members who self-improve and continuously contribute out of passion.

Ideation

The next step was to come up with ideas for a new Wikidata tool. We started by looking at other tools that the community seems to like, such as ISA Tool and Lingua Libre. ISA Tool is a mobile-first website to help users connect pictures uploaded to Wikimedia Commons with the right Wikidata statements. Lingua Libre is a website that allows users to submit pronunciation of words to Wikimedia Commons in multiple languages.

We then used FigJam to conduct an ideation workshop. The ideation process began with creating a moodboard, followed by brainstorming a list of words and phrases related to the moodboard, and then generating a word cloud.

The next step was brainwriting. This is a group brainstorming technique where participants write down their ideas on sticky notes and then share them with the group. The sticky notes were then organized into categories and themes.

The final step was a simple feasibility analysis mapping. We evaluated each idea based on factors such as innovativeness, scope, resources required, and time constraints.

The ideation process produced three ideas:

  • A mobile-first contribution tool that would make it easier for people to contribute to Wikidata Lexeme.
  • A gadget that would recommend lexemes to add or edit to Wikidata.
  • A learning app based on flash cards that would help people learn languages/lexicography.

After discussing those ideas with Wikimedia Deutschland, we concluded that the mobile-first contribution tool was the most feasible idea. The gadget to recommend lexemes was too complicated on the backend side, and the learning app was too broad in scope.

In conclusion, the ideation process generated a number of ideas for a new Wikidata tool and allowed us to select the most feasible and promising one.

Prototyping

The next step was to create an initial prototype of the new Wikidata tool. The prototype was created using Figma. It included the following features:

  • A simplified, mobile-first interface inspired by stacks of playing cards, for making simple contributions on the go.
  • A limited scope of contributions that will expand over time (for example, the antonym of a lexeme).
  • Curated topics by the community (planned).
  • Randomized daily contributions (planned).
  • Support for multiple languages (planned).

After the prototype was shared with Wikimedia Deutschland, the following considerations were discussed:

  • Gamification can be a double-edged sword. We need to carefully consider the kind of motivation we want our users to have. If we simply add elements of gamification to the tool, that can give users the wrong incentives to edit (for example, caring only about the quantity of contribution instead of healthy collaboration).
  • The tool should be accessible to users who don’t use touchscreens or who have motor disabilities. This means considering alternate ways to access things without gestures.
  • The tool should be responsive so that it can be used on both mobile and desktop devices.
  • Start by developing the core features of the tool, then gather feedback and add additional features along the way.

Next steps

There are some things that should be done to continue this project:

  • Develop a more detailed minimum viable product (MVP) of the mobile-first contribution tool. This will allow us to test the basic functionality of the tool and get feedback from users.
  • Conduct usability testing with a wider range of participants to get feedback on the design and make improvements. This will help ensure the tool’s ease of use for a variety of users.
  • Develop, launch, and iterate on the tool using agile methodology, collecting feedback from the community along the way. This will allow us to continuously improve the tool based on user needs.
  • The community should be heavily involved in the tool development process. This will help us maximize impact, be held accountable, and be transparent and open.

Conclusion and personal wishes

Perfect is the enemy of good. Good enough products are enough.

I learned a lot during my time working on this project. I learned how to work with people from different cultures and backgrounds, came to understand the essence of agile methodology, and heard the stories behind Wikidata development. I also learned how to connect with different kinds of communities and partners, and how to brainstorm, ideate, and iterate in new ways. I am grateful for the opportunity to work on this project and I am excited to see what the future holds.

I believe that the project can be run as a community effort, can be sustainable, and can inspire more people to make well-researched, well-designed products even if the idea sounds simple on paper. Working on a community effort can be challenging, but it can also be incredibly rewarding. When you work with a group of people who are passionate about the same thing, you can achieve great things.

Language is a system for communicating. In other words, it’s a tool to achieve shared understanding. I learned that effective communication is about more than just using the right words; it is also about using the right context and having empathy for the people we’re communicating with.

I would like to invite anyone who is interested in getting involved in the project to read more about this collaboration. I would also love to hear your thoughts on the prototype.

Photo of 7 people associated with the Wikidata Software Collaboration.

Thank you for reading until the end, and here’s to the next 12 months and more for this project 🥂!