Scraping (1)


Removing non-ascii characters from text in Python

I was handling some text scraped using Scrapy, and the text had non-ASCII Unicode characters like \u003e. If I did this, it didn’t work:

Here response.text is the string that contains the Unicode text (Scrapy returns the response body as a Unicode string). The html_text still had non-ASCII Unicode characters like \u003e. This worked:
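A minimal sketch of one common approach, which may differ from the exact code used in this post: encode the string to ASCII while ignoring anything that does not fit, then decode it back. The function names and the sample string are hypothetical, chosen only for illustration. The second helper assumes the text contains literal escape sequences (a backslash followed by u003e) rather than real non-ASCII characters.

```python
def strip_non_ascii(text: str) -> str:
    # Hypothetical helper: characters outside the ASCII range are
    # silently dropped by errors="ignore".
    return text.encode("ascii", errors="ignore").decode("ascii")


def decode_literal_escapes(text: str) -> str:
    # Hypothetical helper: assumes the text holds literal backslash-u
    # sequences. The unicode_escape codec also interprets other backslash
    # sequences (for example a backslash followed by n), so use it with
    # care on arbitrary input.
    return text.encode("ascii", errors="ignore").decode("unicode_escape")


# Invented sample: a real non-ASCII character plus a literal escape sequence.
sample = "caf\u00e9 \\u003e tea"

print(strip_non_ascii(sample))
print(decode_literal_escapes(sample))
```

With Scrapy, the same helpers could be applied to response.text inside a spider callback. Which of the two is needed depends on whether the page contains real non-ASCII characters or escaped sequences.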

Note that […]