Scraping (1)


Removing non-ascii characters from text in Python

I was handling some text scraped using Scrapy and the text had non-ascii unicode charcters like \u003e. If I did this, it didn’t work: html_text = response.text.encode(‘ascii’, errors=’ignore’).decode() Here response.text is the string that contains unicode text (scrapy returns strings encoded in unicode). The html_text still had non ascii unicode characters like \u003e This worked: […]