Removing non-ASCII characters from text in Python

I was handling some text scraped using Scrapy, and the text had non-ASCII Unicode characters like \u003e.
If I did this, it didn’t work:

Here response.text is the string that contains the Unicode text (Scrapy returns the response body as a Unicode string).
The html_text still had non-ASCII Unicode characters like \u003e.
This worked:

Note the 'unicode-escape' part in decode. That made the difference in getting rid of characters like \u003e and replacing them with a space.
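To illustrate what a decode with 'unicode-escape' does, here is a minimal sketch. It is not necessarily the exact code I used: it assumes Python 3, it assumes the scraped string contains literal backslash sequences such as \u003e, and the helper names decode_unicode_escapes and strip_non_ascii are made up for this example. Note that in this sketch \u003e decodes to the character >, so turning it into a space would need a separate replace step.

```python
def decode_unicode_escapes(text: str) -> str:
    """Turn literal sequences like \\u003e into the characters they name."""
    # str has no .decode() in Python 3, so go through bytes first.
    # 'backslashreplace' rewrites any real non-ASCII character as an escape,
    # so it survives the round trip instead of raising an error.
    raw = text.encode("ascii", "backslashreplace")
    # Caveat: every backslash sequence in the text is interpreted here,
    # including ones that were not meant as escapes (e.g. Windows paths).
    return raw.decode("unicode-escape")


def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", "ignore").decode("ascii")


# The doubled backslashes make these literal escape sequences in the string.
sample = "a \\u003e b caf\\u00e9"

decoded = decode_unicode_escapes(sample)  # escapes become real characters
ascii_only = strip_non_ascii(decoded)     # non-ASCII characters are dropped

print(decoded)
print(ascii_only)
```

With Scrapy, the same helpers could be applied to response.text in place of sample.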
