By: AJDA, Oct 5, 2018
In the past couple of weeks we have been working hard on introducing better language support for the Text add-on. Until recently, Orange supported only a limited number of languages, mostly English and some bigger languages, such as Spanish, German, Arabic, Russian… Language support was most evident in the stopword lists, normalization and POS tagging. Stopwords come from the NLTK library, so we can only offer whatever is available there.
By: AJDA, Jun 19, 2017
In data mining, preprocessing is key. And in text mining, it is the key and the door. In other words, it’s the most vital step in the analysis. So what does preprocessing do? Let’s have a look at an example. Place the Corpus widget from the Text add-on on the canvas. Open it and load Grimm-tales-selected. As always, first take a quick glance at the data in Corpus Viewer.
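To make the idea of preprocessing concrete, here is a minimal sketch of three typical steps (lowercasing, tokenization and stopword filtering) in plain Python. It is not Orange's implementation: the tiny `STOPWORDS` set and the example sentence are made up for illustration, whereas Orange takes its stopword lists from NLTK.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
# Orange itself uses the stopword lists shipped with NLTK.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "was", "he", "she", "it", "her", "his"}


def preprocess(text):
    """Lowercase the text, split it into word tokens and drop stopwords."""
    lowered = text.lower()                             # transformation
    tokens = re.findall(r"\w+", lowered)               # tokenization on word characters
    return [t for t in tokens if t not in STOPWORDS]   # stopword filtering


# A made-up, fairy-tale-like sentence (not taken from Grimm-tales-selected).
doc = "The king had a daughter, and the daughter was the fairest of them all."
tokens = preprocess(doc)
print(tokens)
print(Counter(tokens).most_common(1))  # the most frequent remaining word
```

After filtering, the most frequent token is a content word ("daughter") instead of "the", which is exactly why stopword removal matters before counting words.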
By: AJDA, Jan 13, 2017
We’ve said it numerous times and we’re going to say it again. Data preparation is crucial for any data analysis. If your data is messy, there’s no way you can make sense of it, let alone a computer. Computers are great at handling large, even enormous data sets, speedy computing and recognizing patterns. But they fail miserably if you give them the wrong input. Also, some classification methods work better with binary values, others with continuous ones, so it is important to know how to treat your data properly.
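Two common ways to adapt data to a method are turning a categorical variable into binary indicator columns (one-hot encoding) and turning a continuous variable into a binary one with a threshold. The sketch below shows both in plain Python; it is an illustration, not Orange's implementation, and the example values and the threshold of 40 are made up.

```python
def one_hot(values):
    """Turn a list of categorical values into binary indicator columns."""
    categories = sorted(set(values))  # fixed, reproducible column order
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def binarize(values, threshold):
    """Map a continuous variable to 0/1 by comparing each value with a threshold."""
    return [1 if v > threshold else 0 for v in values]


# Made-up example data.
colors = ["red", "green", "red", "blue"]
cols, rows = one_hot(colors)
print(cols)  # ['blue', 'green', 'red']
print(rows)

ages = [23, 41, 35, 60]
print(binarize(ages, 40))
```

Which transformation is appropriate depends on the learner: methods that expect numbers need the indicator columns, while methods that work on binary features need continuous values split at some (assumed) cut-off.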
By: AJDA, Nov 30, 2016
Being a political scientist, I had never even heard of data mining before I joined Biolab. And naturally, as with all good things, data mining started to grow on me. Give me some data, connect a bunch of widgets and see the magic happen! But hold on! There are still many social scientists out there who haven’t yet heard about the wonderful world of data mining, text mining and machine learning.
By: AJDA, Sep 23, 2016
Orange3-Text has just recently been polished, updated and enhanced! Our GSoC student Alexey has helped us greatly to achieve another milestone in Orange development and release the latest 0.2.0 version of our text mining add-on. The new release, which is already available on PyPI, includes the Wikipedia and SimHash widgets and an overhaul of Bag of Words, Topic Modeling and Corpus Viewer. The Wikipedia widget retrieves sources from the Wikipedia API and can handle multiple queries.
By: AJDA, Jul 5, 2016
Google Summer of Code is progressing nicely and some major improvements are already live! Our students have been working hard and today we’re thanking Alexey for his work on the Text Mining add-on. His two major tasks before the midterms were to introduce a Twitter widget and to overhaul Preprocess Text. The Twitter widget was designed to be a part of our summer school program and it worked beautifully. We’ve introduced youngsters to the world of data mining through social networks, and one of the most exciting things was to see whether we can predict the author from the tweet content.