
Thanks for doing this. I wanted to do the same thing a few months back. I looked into a lot of dictionary APIs, but as you mention in the repo, they suck at connecting different parts of speech. It's funny how simple this sounds and how difficult it is to actually do. Back then, I gave up and went with a lemmatizer. Will definitely use this.
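In case it helps anyone hitting the same wall, here's a toy sketch of why a lemmatizer alone doesn't cut it (all the tables below are hand-built example data, not from any real dictionary API): a lemmatizer collapses inflections within one part of speech, but the cross-POS links ("decision" → "decide") need a separate derivational table, which is exactly what most dictionary APIs don't expose.

```python
# What a lemmatizer gives you: inflection -> lemma (same POS).
INFLECTIONS = {
    "decisions": "decision",
    "decided": "decide",
    "deciding": "decide",
}

# What dictionary APIs rarely give you: derivational links across POS.
DERIVATIONS = {
    "decision": ("decide", "NOUN->VERB"),
    "decisive": ("decide", "ADJ->VERB"),
}

def base_verb(word):
    """Map a surface form to its underlying verb, if we know one."""
    lemma = INFLECTIONS.get(word, word)
    return DERIVATIONS.get(lemma, (lemma, None))[0]
```

So `base_verb("decisions")` and `base_verb("decided")` both land on "decide", which is the cross-POS connection the lemmatizer alone can't make.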


I have been using elementary OS for the past year and a half, and honestly speaking, I am in love with it. I am a long-time Linux user, and whenever I get a chance, I tell people about all the great things about Linux and why it's better than Windows and macOS. But there was always one aspect of Linux I felt a bit uneasy about, and that was the design/UI/UX/accessibility of many popular distributions. I liked Ubuntu the most in this regard, but after they moved to Unity, I had to ditch it. Then I found elementary, and I have never looked back. Elementary combines the freedom and transparency of open source with beautiful design. I have never seen an OS that scores so high on so many different factors.

There's a lot of discussion here about the stock apps in elementary. However, since I am an experienced Linux user, I don't really care about the stock apps. The Linux app ecosystem is extremely diverse, and over the years, I have found my best solution for each task: for example, Clementine for music, Atom for text editing, etc. So these apps get installed immediately after every installation of elementary, and then I never look at the stock apps again.


Agreed about the stock apps. I actually think they are being too ambitious. They want certain interface guidelines that operate across all apps (things like autosaving state so apps can be closed and reopened seamlessly), and that's why they're so keen on developing a full suite of their own programs. But independently developing so many apps all at the same time is a massive, massive task. In my opinion they would be better off focusing for the moment on areas where there are gaps, for instance making Shotwell / Pantheon Photos or Geary / Pantheon Mail really excellent. It's making the perfect the enemy of the good to redirect their effort to reimplementing their own music app, their own text editor, etc., when there are good alternatives.

I agree with the general sentiment, too. Overall it's an excellent project.


Really, the best strategy would be to take existing Linux desktop staples and send patches that allow them to be reskinned enough to get the consistent look and feel they want.


The presumption there is that the UI differences are at the skin level, and not tied to behavior, application architecture, toolkit decisions, etc.

The other approach they could take is the Mono/.net approach, which is to have some core domain logic, and then use that as a back-end for multiple independent UIs. A well-designed back-end allows the UIs to be relatively lightweight and hopefully maintainable.
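A minimal sketch of that split, with a made-up `PlayerCore` standing in for the shared domain logic (the names are illustrative, not from any real elementary or Mono project): the back-end owns all state and behavior, and each UI is a thin view over it.

```python
class PlayerCore:
    """Back-end: playlist state and logic, no UI assumptions."""
    def __init__(self):
        self.tracks, self.pos = [], 0

    def add(self, track):
        self.tracks.append(track)

    def next(self):
        # Advance and wrap around; the UIs never touch self.pos directly.
        self.pos = (self.pos + 1) % len(self.tracks)
        return self.tracks[self.pos]

def cli_ui(core):
    # One lightweight front-end...
    return f"> now playing: {core.tracks[core.pos]}"

def gtk_like_ui(core):
    # ...and another, sharing the same core instead of duplicating logic.
    return {"label": core.tracks[core.pos]}
```

The point is that fixing a bug in `next()` fixes it for every front-end at once, which is the maintainability win; the cost, as noted above, is that the split itself is extra architecture.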

The trouble is, some of what makes "existing Linux desktop staples" is non-trivial UI work. And sometimes adding a back-end front-end split can add a lot of complexity.

In the end, I think the reason we see such division on the question of custom app VS custom skin VS reusable back-end is that there are serious trade-offs in each direction.


Glad that you have your list of best options for every task, but as you say, it took you several years to figure out. It would be great to have some sane defaults for the most common everyday tasks, without the need for long research and learning, so one can concentrate on what they really care about (like an IDE or image editor) rather than being stuck choosing a media player that, err... plays the media.


With all due respect, when have the stock apps ever been good enough on any OS? I spent 20 years using Windows, and the first thing I did upon getting any new desktop/laptop was immediately go and grab a bunch of apps that I needed. They weren't necessarily the "best" apps, either; my old Windows 7 laptop is still running Winamp as my music player because I've used it since 1998 and it still does what I need it to do.

One's preference for programs is really a matter of personal taste. You can try to make the stock programs "suck less", but I doubt you're ever going to really eliminate the need for third-party programs. Distros like KaOS (KDE with all-Qt apps) have tried, with limited success. The best thing any distro can do, IMO, is to have an extensive App Center and use a flexible packaging format that encourages more apps to support that format.


Still using Winamp as well.

At work, people get awed all the time - "wow, is that Winamp?" :D

And I keep thinking, what the hell are people using these days?


Spotify, Google Music, Apple Music, Pandora?

Before, I had to worry about space and moving my music around. Now it's on every device. It has all my playlists and all the music I've listened to, plus I discover new stuff, friends can send me songs I might like, and I can even download for offline if I'm worried about not having internet, etc.

It's been a much better solution than playing everything in Winamp, which I used forever, or iTunes.


I was about to answer VLC, and then I read your answer and realized I'm lagging 10 years behind.


VLC is one of the best open source programs ever written. I use it on mobile + notebook for video and music (320 kbps encoded files on a 128 GB SD card). YouTube and many other apps are just not OK in quality.


> and whenever I get a chance, I tell people about all the great things about Linux and why it's better than Windows and the MacOS.

People like this bore me to death. Let's be frank: you're just biased, and you're probably wrong, despite all your good intentions. Which OS you prefer is a matter of taste and choice, and there's no "better" or "worse" unless you're so narrow-minded as to measure an operating system's worth only by the features that happen to put your chosen OS ahead. These days, if I were a dedicated gamer, I'd probably say Windows. In fact, after the Surface Studio presentation, it's tempting as a creative platform.

I used one form or another of Linux on my main desktops and laptops from 2002 until 2009. In 2009 I bought a MacBook, and since then I've switched to OS X (or macOS) as my main OS of choice. I still install GNU coreutils on macOS, and still keep a separate desktop at home with Linux on it, appropriately named `lab` in my home network.

So, in 2016 my main laptop is a MacBook Air running macOS, on which I do most of my work, and my desktop runs various flavours of Linux (Arch, which I keep the most updated, but also Alpine, Ubuntu, Fedora, etc., easily accessible from GRUB). Don't lecture us on which OS is better; we've made a decision and it doesn't have to be the same as yours.


> What OS you prefer is a matter of taste and of choice, and there's no "better" or "worse" unless you're so narrow minded as to only measure an operating system's worth by the features that happen to put your chosen OS ahead.

> Don't lecture us on what OS is better, we've made a decision and it doesn't have to be the same as yours.

I would agree with that, but you don't have to be so harsh. He might actually convince people with arguments for why Linux is "better" (e.g. privacy), rather than "lecturing" them.


I don't think you should be telling him what to do and what not to do, unless he is lecturing you.


Interesting article but that PDF is 39 MB and took an awful lot of time to load on my browser.


Key changes:

* Faster indexing

* Ingest node

* New scripting language (called Painless)

* New data structures from Lucene

* Instant Aggregation

* TF/IDF -> BM25 to calculate relevance

* Fail faster philosophy

* a low level Java HTTP/REST client
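For anyone curious about the TF/IDF → BM25 switch, here's a rough Python sketch of the BM25 formula (Lucene-style IDF; k1 and b are the usual defaults). This isn't Elasticsearch's exact implementation, just an illustration of why scores change: term frequency saturates instead of growing linearly, and longer-than-average documents get penalized.

```python
import math

def bm25_score(tf, doc_len, avg_doc_len, n_docs, df, k1=1.2, b=0.75):
    """Score one query term against one document with BM25."""
    # Lucene-style IDF: rarer terms (low document frequency) score higher.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Term frequency saturates via k1, and is normalized by document
    # length relative to the corpus average via b.
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / norm
```

With the defaults, ten occurrences of a term score more than one occurrence, but nowhere near ten times more, which is the saturation behavior classic TF/IDF lacks.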


What I love about spaCy is their dependency parsing visualization tool [0]. It's so much better than what Stanford offers.

Other than that, I find spaCy's philosophy of "one (best) way of doing everything" a bit stifling. I don't think there is a "best" parser or "best" named entity recognizer. A certain parser may perform very well in one domain (for example, TweeboParser [1] performs well on tweets) and very badly in another. This is true for almost everything in NLP, and NLTK embraces this diversity quite well. This is why NLTK is my go-to tool when I want to do something cutting-edge in NLP.

[0] https://demos.explosion.ai/displacy/ [1] https://github.com/ikekonglp/TweeboParser


I definitely agree that the same weights won't be optimal for different domains. If you need to parse tweets, you should have a tweet-trained model. The tweet model probably shouldn't be thinking about Jane Austen novels. We want to open a model store where you can buy language- and domain-specific models.

I think 99% of the time there's one best algorithm, and even one best implementation of it. It's the weights, and sometimes the features, that need to vary.

Finally — I love displaCy too. Ines does great work :). Have you seen that we open-sourced this recently? It's now very easy to run locally, and connect up to the model you're developing. You can use this with any other parser, too. https://explosion.ai/blog/displacy-js-nlp-visualizer
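For what it's worth, later versions of spaCy also bundle a Python-side displaCy, so assuming you have spaCy installed you can render a parse without loading any model by passing the words and arcs "manually" (the parse below is hand-written, not produced by a parser):

```python
from spacy import displacy

# Hand-written dependency parse in displaCy's manual format.
parse = {
    "words": [
        {"text": "displaCy", "tag": "NNP"},
        {"text": "renders", "tag": "VBZ"},
        {"text": "parses", "tag": "NNS"},
    ],
    "arcs": [
        {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
        {"start": 1, "end": 2, "label": "dobj", "dir": "right"},
    ],
}

# Returns SVG markup as a string when run outside a notebook.
svg = displacy.render(parse, style="dep", manual=True)
```

This is handy for visualizing the output of any other parser, too: just convert its output into the words/arcs dict above.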


I am so glad that you guys open sourced displaCy. I would love to give it a spin on my system. Kudos for all the great work you are doing!


I love the spaCy visualization tool too. Do you know about http://corenlp.run/, though?

So I do a lot of work with Twitter data - to the point where I have (many) custom Word2Vec models for Twitter data.

TweeboParser is good, and NLTK has a basic Twitter tokenizer built in. But I still often end up using spaCy for stuff anyway. For example, I was building a custom distance metric to explore tweet clusters, and the spaCy word vectors were fine to get that working.

It's true that I dropped down to using Gensim or Spark's Word2Vec model for some more complex models though.
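In case it's useful, the kind of custom distance metric I mean looks roughly like this: average the word vectors of a tweet, then take cosine distance between tweets. The tiny hand-made vectors here are stand-ins for real spaCy or Word2Vec embeddings.

```python
import numpy as np

# Hypothetical toy embeddings; in practice these come from
# spaCy's vocab or a trained Word2Vec model.
VECTORS = {
    "cat": np.array([1.0, 0.0, 0.1]),
    "dog": np.array([0.9, 0.1, 0.0]),
    "stock": np.array([0.0, 1.0, 0.2]),
}

def tweet_vector(tokens):
    """Average the word vectors of a tweet's tokens (OOV tokens skipped)."""
    vecs = [VECTORS[t] for t in tokens if t in VECTORS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine_distance(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 1.0  # maximal distance for empty/zero vectors
    return 1.0 - float(a @ b) / denom
```

With this, "cat" tweets end up close to "dog" tweets and far from "stock" tweets, which is all a clustering pass needs.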


Yup, I know about that, but displaCy is just so much more beautiful.

Also, while NLTK's basic Twitter tokenizer is okay, I find that ARK's tokenizer [0] is much better. Similarly, for POS tagging of tweets, I use the GATE POS tagger [1]. They have a Stanford model, and I can hook it up with NLTK using the StanfordTagger class. In fact, this is the kind of integration I am missing in spaCy.

[0] https://github.com/myleott/ark-twokenize-py [1] https://gate.ac.uk/wiki/twitter-postagger.html
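(For reference, NLTK's built-in one is `nltk.tokenize.TweetTokenizer`; it's regex-based, so it needs no model downloads. A quick sketch with an invented example tweet, assuming NLTK is installed:)

```python
from nltk.tokenize import TweetTokenizer

# strip_handles drops @-mentions; reduce_len shortens character
# repetitions like "sooooo" to at most three characters.
tok = TweetTokenizer(strip_handles=True, reduce_len=True)
tokens = tok.tokenize("@user omg this is sooooo cool!!! #nlp :-)")
```

Hashtags and emoticons survive as single tokens, which is already better than a standard word tokenizer, even if ARK's handles more of Twitter's weirdness.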

