Orange Workflows

Semantic Word Map

We can find clusters of semantically related words with either hierarchical clustering or a t-SNE visualization. Here, we show a workflow that loads the documents, extracts frequent words, embeds them in a vector space, and explores word clusters.

Tags: Text Mining

Keyword Extraction from a Set of Text Documents

The Extract Keywords widget can characterize a set of text documents. In this workflow, we load the documents from the server, preprocess them, embed them in a vector space, and display a semantic document map in the t-SNE widget. On this map, we can select a set of similar documents and then characterize them through keyword extraction. Extract Keywords supports several extraction techniques, including TF-IDF and deep network-based characterization.

Tags: Text Mining
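
To make the TF-IDF idea behind the keyword extraction workflow concrete, here is a minimal sketch in plain Python. It is not the Extract Keywords widget's implementation: the function name tfidf_keywords, the whitespace tokenization, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_keywords(documents, selected, top_n=3):
    """Rank words in the selected documents by their mean TF-IDF score.

    documents -- list of strings (the whole corpus)
    selected  -- indices of the documents we want to characterize
    """
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents each word occurs in.
    df = Counter(word for tokens in tokenized for word in set(tokens))

    scores = Counter()
    for i in selected:
        tf = Counter(tokenized[i])
        for word, count in tf.items():
            idf = math.log(n_docs / df[word])
            # Average the per-document TF-IDF over the selected documents.
            scores[word] += (count / len(tokenized[i])) * idf / len(selected)
    return [word for word, _ in scores.most_common(top_n)]


# Made-up toy corpus; documents 0 and 1 play the role of the map selection.
docs = [
    "the road traffic jam blocked the road",
    "the traffic police closed the road",
    "the chef cooked the pasta",
    "the pasta sauce recipe",
]
keywords = tfidf_keywords(docs, selected=[0, 1], top_n=2)
print(keywords)
```

In this toy corpus, "road" ranks first for the selected documents, while "the" scores zero because it occurs in every document.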

Keyword-Based Text Document Scoring

We can score text documents against a list of keywords, for example, to find documents that include the keywords or are semantically related to them. This workflow uses the Word List widget to compose a list of keywords and the Score Documents widget to score the documents. The scores are visualized in the t-SNE document map.

Tags: Text Mining
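
One simple way to score documents against a keyword list is the fraction of a document's tokens that are keywords. The sketch below illustrates only that idea; the function keyword_score and the example texts are hypothetical, and the Score Documents widget may compute its scores differently (the semantic-similarity scoring mentioned above is not covered here).

```python
def keyword_score(document, keywords):
    """Return the fraction of tokens in `document` that appear in `keywords`."""
    tokens = document.lower().split()
    if not tokens:
        return 0.0
    keyword_set = {k.lower() for k in keywords}
    hits = sum(1 for token in tokens if token in keyword_set)
    return hits / len(tokens)


# Made-up documents and keyword list, for illustration only.
docs = [
    "heavy traffic on the main road",
    "new pasta recipe",
    "road works cause delays ahead",
]
word_list = ["traffic", "road"]

scores = [keyword_score(doc, word_list) for doc in docs]
# Document indices ordered from the highest to the lowest score.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(scores, ranked)
```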

Corpus and Word Maps

This workflow shows how to extract the most common words from the documents and observe clusters of semantically similar words with Hierarchical Clustering. We select a group of words (related to traffic and roads) and use the selection to score documents with the Score Documents widget. The scores are visualized in a document map with the Self-Organizing Maps widget.

Tags: Text Mining

Document Map Annotation

Document maps can be enhanced with keyword annotations. This workflow embeds documents in a vector space, computes a t-SNE document map, and annotates it. The Annotator widget identifies clusters on the map and annotates each with keywords that represent it.

Tags: Text Mining

Ontology Generation from Keywords

We can automatically build an ontology from a set of words. In the workflow, we select a group of documents with similar content. From the selected documents, we extract keywords and generate a new ontology from a subset of the keywords with the Ontology widget.

Tags: Text Mining

Survival Curve Estimation

One of the primary objectives of survival analysis is to estimate the survival probability from the observed survival times of different patients. The workflow plots the Kaplan-Meier estimate of the survival curve for the population investigated by the German Breast Cancer Study Group. The Kaplan-Meier plot is interactive; we select the longest-surviving patients and use Box Plot to analyze the features that best characterize them.
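
The Kaplan-Meier estimate multiplies, at each observed event time, the fraction of at-risk patients who survive that time. A minimal sketch in plain Python follows; the function name kaplan_meier and the five-patient data set are made up for illustration and do not come from the breast cancer data or from Orange's own code.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- observed follow-up time for each patient
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Survival drops only at times with at least one observed event.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical toy data: five patients, two of them censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

For this toy data the curve steps down at times 2, 3, and 5. The censored patient at time 8 does not produce a step but still counts toward the risk set until then.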


Exploring Survival Features

In the workflow, we show how to find and analyze variables related to survival. We start with variables ranked by univariate Cox regression analysis and select a feature of interest. The Distribution widget shows the feature's distribution and lets us interactively select a group of patients by its values. We then compare the survival of this group to that of all other patients in the Kaplan-Meier Plot widget.


Cohort Construction and Validation

Stratifying patients into low- and high-risk groups is a common task in survival analysis, used to identify clinical and biological factors that contribute to survival. One approach to stratification is to compute risk scores with a Cox regression model. By combining Orange widgets, we can split the data into training and validation sets, interactively build risk score models on the training data, and observe the difference in the cohorts' survival on the training and validation samples side by side. Read more on how Apply Domain enables this kind of workflow.
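
The key point of validating a stratification is that the cut-off is derived from the training data only and then reused unchanged on the validation data. The sketch below shows this with a median split. The median rule, the function names, and the risk scores are all assumptions for illustration; in the workflow the scores would come from a Cox regression model, and the widgets may use a different cut-off.

```python
from statistics import median


def fit_threshold(train_scores):
    """Derive the cut-off from training risk scores only (here: the median)."""
    return median(train_scores)


def stratify(scores, threshold):
    """Label each patient as high risk or low risk using a fixed threshold."""
    return ["high" if score > threshold else "low" for score in scores]


# Hypothetical risk scores, made up for this example.
train_scores = [0.2, 1.5, -0.3, 0.9, 0.4]
valid_scores = [1.1, -0.8, 0.4]

threshold = fit_threshold(train_scores)
train_groups = stratify(train_scores, threshold)
# The validation set reuses the training threshold; it is never refitted.
valid_groups = stratify(valid_scores, threshold)
print(threshold, train_groups, valid_groups)
```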


Cross Validation for Survival Models

Orange's built-in methods for testing and scoring predictive models now support survival models such as Cox regression. Here, we use cross-validation to estimate the concordance index of a Cox regression model trained on data restricted to selected features.
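
The concordance index measures how often, among comparable pairs of patients, the model assigns the higher risk to the patient whose event occurred earlier. Below is a minimal sketch of Harrell's version of the index in plain Python, evaluated on a single made-up data set. The function name and the data are assumptions; in cross-validation the index would be computed on each held-out fold.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # higher risk, earlier event
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risk_scores)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. In this toy example, four of the five comparable pairs are ordered correctly.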