I was handling some text scraped using Scrapy, and the text had non-ASCII Unicode characters and escape sequences like \u003e in it.
If I did this, it didn’t work:
html_text = response.text.encode('ascii', errors='ignore').decode()
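response.text is the string that contains the Unicode text (Scrapy returns the response body as a decoded str).
The html_text still had escape sequences like \u003e in it. Each one is made of plain ASCII characters (a backslash, a u and four hex digits), so errors='ignore' has nothing to drop.
What worked was this: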
html_text = response.text.encode('ascii', errors='ignore').decode('unicode-escape')
Notice the 'unicode-escape' part in decode. That made the difference in getting rid of sequences like
\u003e: each one is replaced with the character it stands for (\u003e becomes >).
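To see the difference side by side, here is a minimal sketch. The sample string raw is made up and only stands in for response.text, and decode_u_escapes is a hypothetical helper of my own, not part of Scrapy, for the case where you want to keep real non-ASCII characters instead of dropping them.

```python
import re

# Made-up sample standing in for response.text: one real non-ASCII
# character (é) and one literal backslash-u escape sequence.
raw = "café a \\u003e b"

# First attempt: only the real non-ASCII character is dropped;
# the escape sequence is plain ASCII and survives untouched.
first = raw.encode("ascii", errors="ignore").decode()
print(first)   # caf a \u003e b

# Second attempt: 'unicode-escape' also interprets the escape sequence.
second = raw.encode("ascii", errors="ignore").decode("unicode-escape")
print(second)  # caf a > b


def decode_u_escapes(text):
    """Hypothetical helper: decode only \\uXXXX sequences, keep everything else."""
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


third = decode_u_escapes(raw)
print(third)   # café a > b
```

One thing to keep in mind: 'unicode-escape' interprets every backslash escape it finds, so text with stray backslashes may come out changed or raise a UnicodeDecodeError. The regex version only touches \uXXXX sequences.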