Documentary about the Internet Archive
By Jonathan Minard, John Behrens, Alexander Porter, and Fearghal O’dea
Internet Archive from Deepspeed media on Vimeo.
There will be a memorial for Aaron at the Internet Archive on Thursday, January 24.
The flag at the Internet Archive is flying at half-mast in memory of Aaron.

Tymm shared a bash function that makes working with virtualenvs easier, and I've been using it instead of virtualenvwrapper. It expects your virtualenvs to live in ~/pyenvs. Typing `pye` lists them, and typing `pye envname` activates one of them:

```bash
# Enable a python virtualenv
pye() {
    if [[ -z "${1}" ]]; then
        # No argument: list the virtualenvs found in ~/pyenvs
        echo -e "\x1b[01;34mAvailable virtualenvs:\x1b[00m"
        (cd ~/pyenvs && for i in *; do echo -e "\x1b[01;36m ${i} \x1b[00m"; done)
    else
        # Source the named env's activate script in the current shell
        . ~/pyenvs/"${1}"/bin/activate
    fi
}
```
Project Gutenberg's Distributed Proofreaders has developed a font that helps you find OCR mistakes. What a great idea!
Here is how to use Python and lxml to parse UTF-8 encoded web pages that contain non-ASCII characters. It would be nice if lxml.html.parse(url) picked up the encoding from the Content-Type HTTP header, but it doesn't, so you have to tell lxml which encoding to use:

```python
>>> import lxml.etree
>>> url = 'http://hi.wikipedia.org/wiki/मुखपृष्ठ'  # utf-8 encoded bytes
>>> url
'http://hi.wikipedia.org/wiki/\xe0\xa4\xae\xe0\xa5\x81\xe0\xa4\x96\xe0\xa4\xaa\xe0\xa5\x83\xe0\xa4\xb7\xe0\xa5\x8d\xe0\xa4\xa0'
>>> utf8_html_parser = lxml.etree.HTMLParser(encoding='utf-8')
>>> page = lxml.etree.parse(url, parser=utf8_html_parser)
>>> print page.find('head/title').text
विकिपीडिया
>>> page.find('head/title').text
u'\u0935\u093f\u0915\u093f\u092a\u0940\u0921\u093f\u092f\u093e'
```
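The session above is Python 2 and hardcodes `encoding='utf-8'`. As a rough sketch of what it would look like to use the Content-Type header instead, the Python 3 code below pulls the charset parameter out of a Content-Type value and decodes the page bytes with it. To stay dependency-free it swaps lxml for the standard library's `html.parser` to extract the title. The helper names `charset_from_content_type`, `TitleParser`, and `page_title` are hypothetical, and falling back to UTF-8 when the header names no charset is an assumption.

```python
from email.message import Message
from html.parser import HTMLParser


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or `default` (assumed utf-8)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the charset and returns `default` if it is absent
    return msg.get_content_charset(default)


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element (stdlib stand-in for lxml)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append rather than assign
        if self._in_title:
            self.title += data


def page_title(body, content_type):
    """Decode raw response bytes with the header's charset, then return the page title."""
    charset = charset_from_content_type(content_type)
    parser = TitleParser()
    parser.feed(body.decode(charset))
    parser.close()
    return parser.title
```

With `urllib.request`, the headers object returned by `urlopen(...).headers` is itself an `email.message.Message` subclass, so `get_content_charset()` can be called on it directly, and the resulting charset could be handed to lxml as `lxml.etree.HTMLParser(encoding=charset)` in place of the hardcoded `'utf-8'`. This sketch ignores any `<meta charset>` declaration inside the page itself.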
http://www.spur.org/publications/library/article/carnegie-libraries-san-francisco
“After 244 years, the Encyclopaedia Britannica is going out of print.” http://mediadecoder.blogs.nytimes.com/2012/03/13/after-244-years-encyclopaedia-britannica-stops-the-presses/