Hacker News | piroux's comments

To define something as declarative or imperative, it is important to compare the definition model to the execution model.

So I would rather say that Ansible is much less declarative than Terraform, because Ansible tasks (the different steps of an Ansible Playbook) are executed sequentially.

The tasks of Ansible are its statements, so yes, we could say that each Ansible task is declarative. And still, a requirement for that would be for the task to use a module/role which is idempotent, right? Further evidence: Ansible natively offers loops, blocks, and conditionals to control the execution flow across its tasks.
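To illustrate the idempotence requirement, here is a toy sketch in plain Python (not Ansible itself; the function and state are made up for illustration): applying the same desired state twice changes nothing after the first application.

```python
# Toy illustration of idempotence: re-applying the same desired state
# is a no-op once the state has been reached.
def ensure_line(lines, wanted):
    """Return lines with `wanted` present exactly once (idempotent)."""
    return lines if wanted in lines else lines + [wanted]

state = ["foo"]
once = ensure_line(state, "bar")
twice = ensure_line(once, "bar")
print(once == twice)  # True: applying the state a second time changes nothing
```

An Ansible module with this property can be described declaratively; a task wrapping a raw shell command usually cannot.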

(This is not a criticism of Ansible. I am happy to use it as is, as a high-level scripting mechanism.)


Interested


Except that the Library of Alexandria never actually burned! That is a very good ol' myth ;)

- https://www.firstthings.com/web-exclusives/2010/06/the-perni...

- https://www.ancientworldmagazine.com/articles/making-myth-li...

- https://history.stackexchange.com/questions/677/what-knowled...

But anyway, no one should delete human literature, be it inadvertently or through lack of effort.


From Wikipedia: "Scholars have interpreted Cassius Dio's wording to indicate that the fire did not actually destroy the entire Library itself, but rather only a warehouse located near the docks being used by the Library to house scrolls"

If anything this would make the analogy even more apt, since only part of Yahoo is being destroyed. :)

Regardless, it's mostly used as a metaphor for the destruction of knowledge at this point.


Too often historical events turn out to be perfectly true, but claimed to be myths due to dizzying semantic distinctions.

Just looking at the third link, the most upvoted answer agrees that humanity suffered a significant loss of important information. And the 'myth' is just an asinine distinction regarding whether the loss was literally due to fire, or whether the information was lost through some other cause. I think declaring it a myth in a conversation like this misses the point (it certainly isn't a distinction relevant to the original comparison made here to Yahoo Groups) and just serves to confuse people.


It's quite clear the library is no longer here. How exactly it was lost does matter, as its destruction has been used to paint various groups as anti-intellectual barbarians since ancient times. Eliminating the story as a weapon to attack others would do humanity some good.


It has been used that way, but not here. Here, it's a disorienting non-sequitur that makes it sound like the information was never really lost.


These articles seem more concerned with detailing how important it is that it wasn't Christians. Makes sense for an organization centered around "religion and public life", I guess. Quite the angle.


It's quite important that it wasn't Christians. A large part of the public understanding of history is based on a belief that progress through the early Middle Ages was held back primarily by Christian repression of free thought. There are people who very seriously believe that we'd be flying between stars by now if Christianity had never become predominant.

You don't have to be a Christian apologist to think that it's important for people to understand history correctly.


Do people generally think it was Christians? Without looking it up, I would have said "barbarians", which may not rule out Christians but doesn't specify them either.


I think the majority of people have never thought about it one way or the other (and would probably think similarly to you), but there is a substantial group of people who do. While it's by no means predominant, you come across the idea with fair regularity on atheist discussion boards.


Whoa. I guess what they say is true - say a lie often enough, and it becomes the truth.


Wait, the library wasn't lost to that fire, but its contents were slowly lost to the passage of time and to people not caring or not having access to copy them? That makes the analogy way better, but the "burning" part is sadly wrong.


Yes, that is exactly what I wanted to convey by "lack of effort".

2000 years ago, as a civilization, even though we failed to care enough for the works stored in the Library, their loss would not have happened if access had not been so limited: wider access would have helped their dissemination and the making of copies.

Today, as a civilization, if we fail to implement the right process to back up in time what matters to us, we will repeat the same errors as our ancestors.

I guess many historians today would prefer to see those non-existent backups of the Alexandria Library rather than those of Yahoo Groups, but who knows what is more important after all ;)


The main difference is that back then, "backup" meant copying everything by hand, whereas now it means one simple copy-paste. Considering the size and price of modern hard drives, and the relatively small size of old archives, any one individual can back up a huge amount of data (and even offer/share it as a download link, torrent seed, etc).

Their whole Library would probably fit on even the smallest SD card available now.
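For a rough sense of scale, here is a back-of-envelope estimate; every figure below is an assumption (high-end ancient estimates of the collection, and a scroll approximated as a short book of plain UTF-8 text), not historical data.

```python
# Back-of-envelope estimate; all figures are rough assumptions.
scrolls = 400_000                 # high-end ancient estimate of the collection
bytes_per_scroll = 100 * 1024     # ~a short book as plain UTF-8 text
total_bytes = scrolls * bytes_per_scroll
print(f"~{total_bytes / 1024**3:.0f} GiB")  # ~38 GiB
```

So roughly a few tens of gigabytes: not quite the smallest SD card sold today, but comfortably within one ordinary consumer card.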



I would be interested to know

- if those projects can be easily used with pandas,

- and if some of their features are already in pandas?


Anyone?


I had the same feeling about LLVM until a few weeks ago, when I started tinkering with it during a school project. It gave me time to implement a simple array that can be sliced and copied, via the functions I created:

https://gist.github.com/piroux/a856aa31525ca23238be

It can be run directly with lli:

$ lli basics_array.ll

Note: I called it DynArray because I wanted it to grow when its nominal capacity is reached, but I never took the time to implement this feature...

For instance, the prototype of the function adding an element could be:

define void @DynArrayI__add(%DynArrayI* %dynarray, i64 %elt)

You might need to add a capacity field to the DynArrayI type and update the other functions to take it into account.

But in my opinion it is doable, even for a beginner, because the code is written entirely in LLVM IR.

So feel free to try!


Here is a first one: what are the best ways to detect changes in HTML sources with Scrapy, which would otherwise lead to missing data in automated systems that need to be fed?


Well, missing data can arise from problems at several different levels:

1) site changes caused the items that were scraped to be incomplete (missing fields) -- for this, one approach is to use an Item Validation Pipeline in Scrapy, perhaps using a JSON schema or something similar, logging errors or rejecting an item if it doesn't pass the validation.
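As a sketch of that first approach, here is a plain-Python stand-in for such a validation pipeline (the field names are made up, and `ValueError` stands in for Scrapy's `DropItem` so the snippet is self-contained):

```python
# Minimal stand-in for a Scrapy item-validation pipeline.
# In a real project you would raise scrapy.exceptions.DropItem
# and the required fields would match your own item schema.
REQUIRED_FIELDS = {"title", "price", "url"}

class ValidationPipeline:
    def process_item(self, item, spider=None):
        # Fields that are absent or empty fail the validation.
        missing = REQUIRED_FIELDS - {k for k, v in dict(item).items() if v}
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")  # stands in for DropItem
        return item

pipe = ValidationPipeline()
pipe.process_item({"title": "Widget", "price": "9.99", "url": "https://example.com/w"})
```

A JSON-schema library would replace the hand-rolled check in practice, but the shape of the pipeline is the same: validate each item, log or drop on failure.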

2) site changes caused the scraping of the items itself to fail: one solution is to store the sources and monitor the spider errors -- when there are errors, you can rescrape from the stored sources (it can get a bit expensive to store sources for big crawls). Scrapy doesn't have a complete solution for this out-of-the-box; you have to build your own. You could use the HTTP cache mechanism and build a custom cache policy: http://doc.scrapy.org/en/latest/topics/downloader-middleware...

3) site changed the navigation structure, and the pages to be scraped were never reached: this is the worst one. It's similar to the previous one, but it's the one you want to detect earlier -- saving the sources doesn't help much, since the failure happens early in the crawl, so you want to be monitoring for it.

One good practice is to split the crawl in two: one spider does the navigation and pushes the links of the pages to be scraped into a queue or something similar, and another spider reads the URLs from that queue and just scrapes the data.
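A minimal sketch of that split, using an in-process queue (a production setup would more likely use Redis or a message broker; all names and URLs here are illustrative):

```python
from queue import Queue

url_queue = Queue()

def navigation_spider(seed_links):
    """First spider: walks the navigation and pushes item-page URLs."""
    for link in seed_links:
        url_queue.put(link)

def item_spider():
    """Second spider: pops URLs and scrapes the data pages."""
    scraped = []
    while not url_queue.empty():
        scraped.append({"url": url_queue.get()})
    return scraped

navigation_spider(["https://example.com/item/1", "https://example.com/item/2"])
print(len(item_spider()))  # 2
```

The point of the split is operational: when the navigation side breaks, the queue stops filling, which is an early and easy-to-monitor signal, independent of the item spider.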


Hey, not sure if I understood what you mean. Did you mean:

1) detect pages that had changed since the last crawl, to avoid recrawling pages that hadn't changed?

2) detect pages that have changed their structure, breaking the spider that crawls them?


1) detect pages that had changed since the last crawl, to avoid recrawling pages that hadn't changed?

You could use the deltafetch[1] middleware. It ignores requests to pages with items extracted in previous crawls.

2) detect pages that have changed their structure, breaking the spider that crawls them?

This is a tough one, since most of the spiders are heavily based on the HTML structure. You could use Spidermon [2] to monitor your spiders. It's available as an addon in the Scrapy Cloud platform [3], and there are plans to open source it in the near future. Also, dealing automatically with pages that change their structure is in the roadmap for Portia [4].

[1] https://github.com/scrapinghub/scrapylib/blob/master/scrapyl...

[2] http://doc.scrapinghub.com/addons.html?highlight=monitoring#...

[3] http://scrapinghub.com/scrapy-cloud/

[4] http://scrapinghub.com/portia/


> 1) detect pages that had changed since the last crawl, to avoid recrawling pages that hadn't changed?

Usually web clients use https://en.wikipedia.org/wiki/HTTP_ETag for this, afaik. If a web app/server lacks that feature, you could compute your own hash and check it yourself, instead of handling the condition at the network layer.
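A sketch of the do-it-yourself fallback (the page bodies are made up; note that hashing the raw body will flag pages whose only change is dynamic noise like timestamps, which you may need to strip first):

```python
import hashlib

def page_fingerprint(body: bytes) -> str:
    """Stable fingerprint of a page body, for change detection between crawls."""
    return hashlib.sha256(body).hexdigest()

old = page_fingerprint(b"<html>old content</html>")
new = page_fingerprint(b"<html>new content</html>")
print(old != new)  # True: the page changed, so rescrape it
```

Store the fingerprint alongside each URL after a crawl, and on the next crawl skip any page whose fingerprint is unchanged.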


As someone who does a fair amount of scraping at his job, I'd like to hear what you have to say regarding both questions :)

