Android – How to get plain HTML using evaluateJavascript from Webview? JSOUP not able to parse the result HTML

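evaluateJavascript passes the script result to the callback as a JSON-encoded value, so a returned string arrives wrapped in double quotes with its special characters escaped. Use JsonReader to decode it: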

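import android.util.JsonReader;
import android.util.JsonToken;
import android.webkit.ValueCallback;
import java.io.IOException;
import java.io.StringReader;

webView.evaluateJavascript(
        "(function() { return document.getElementsByTagName('html')[0].outerHTML; })();",
        new ValueCallback<String>() {
            @Override
            public void onReceiveValue(final String value) {
                // The value is JSON-encoded; JsonReader strips the quotes and decodes the escapes.
                // try-with-resources closes the reader (JsonReader implements Closeable).
                try (JsonReader reader = new JsonReader(new StringReader(value))) {
                    reader.setLenient(true);
                    if (reader.peek() == JsonToken.STRING) {
                        String domStr = reader.nextString();
                        if (domStr != null) {
                            // handleResponseSuccessByBody is the answerer's own method,
                            // e.g. one that passes domStr to Jsoup.parse(...)
                            handleResponseSuccessByBody(domStr);
                        }
                    }
                } catch (IOException e) {
                    // handle exception
                }
            }
        });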
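For reference, the decoding that JsonReader performs on this value can be sketched in plain Java so it runs on a desktop JVM without android.util. This is a hand-written substitute, not JsonReader's actual implementation; the class name JsonStringDecoder is hypothetical, and the sketch assumes the value is a well-formed JSON string literal (anything else, such as the literal null returned when the script yields nothing, gives back null).

```java
public class JsonStringDecoder {

    // Decodes a JSON string literal such as the value handed to onReceiveValue.
    // Returns null when the input is not wrapped in double quotes (e.g. "null").
    // Assumes well-formed input; a truncated escape may throw an exception.
    public static String decode(String value) {
        if (value == null || value.length() < 2
                || value.charAt(0) != '"' || value.charAt(value.length() - 1) != '"') {
            return null;
        }
        StringBuilder out = new StringBuilder(value.length());
        // Walk the characters between the surrounding quotes.
        for (int i = 1; i < value.length() - 1; i++) {
            char c = value.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char next = value.charAt(++i);
            switch (next) {
                case '"':  out.append('"');  break;
                case '\\': out.append('\\'); break;
                case '/':  out.append('/');  break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'u':
                    // Four hex digits follow the 'u'.
                    out.append((char) Integer.parseInt(value.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:
                    throw new IllegalArgumentException("Bad escape: \\" + next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed sample shaped like an evaluateJavascript result.
        String raw = "\"\\u003Chtml>\\u003Cbody class=\\\"x\\\">Hi\\nthere\\u003C/body>\\u003C/html>\"";
        System.out.println(decode(raw));
    }
}
```

Because it handles every JSON escape (\", \\, \n and so on), not only the \uXXXX sequences, the decoded string can be passed directly to Jsoup.parse.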

To remove the escaped Unicode characters (\uXXXX sequences), use this function:

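import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static StringBuffer removeUTFCharacters(String data) {
    // Matches a literal backslash, then 'u', then four hex digits; the hex part is captured.
    // The backslashes are doubled twice: once for the Java string literal, once for the regex.
    Pattern p = Pattern.compile("\\\\u(\\p{XDigit}{4})");
    Matcher m = p.matcher(data);
    StringBuffer buf = new StringBuffer(data.length());
    while (m.find()) {
        // Turn the captured hex code into the character it stands for.
        String ch = String.valueOf((char) Integer.parseInt(m.group(1), 16));
        m.appendReplacement(buf, Matcher.quoteReplacement(ch));
    }
    m.appendTail(buf);
    return buf;
}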
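A quick way to see exactly what this method does and does not clean up is to run it on a sample value. The sketch below copies the method (with the regex escaped so it compiles) into a hypothetical RemoveUtfDemo class and feeds it an assumed input shaped like an onReceiveValue result. Only the \uXXXX escapes are decoded; the surrounding double quotes and the escaped \" inside the attribute remain.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemoveUtfDemo {

    // Same logic as removeUTFCharacters: decodes backslash-u hex escapes only.
    public static StringBuffer removeUTFCharacters(String data) {
        Pattern p = Pattern.compile("\\\\u(\\p{XDigit}{4})");
        Matcher m = p.matcher(data);
        StringBuffer buf = new StringBuffer(data.length());
        while (m.find()) {
            String ch = String.valueOf((char) Integer.parseInt(m.group(1), 16));
            m.appendReplacement(buf, Matcher.quoteReplacement(ch));
        }
        m.appendTail(buf);
        return buf;
    }

    public static void main(String[] args) {
        // Assumed sample input, JSON-encoded the way evaluateJavascript returns strings.
        String raw = "\"\\u003Cdiv id=\\\"a\\\">x\\u003C/div>\"";
        System.out.println(removeUTFCharacters(raw).toString());
    }
}
```

This prints "<div id=\"a\">x</div>" including the outer quotes and the backslashes before the attribute quotes, so the string still needs those removed before it is valid HTML.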

Then call it inside onReceiveValue(String html) like this:

@Override
public void onReceiveValue(String html) {
    String result = removeUTFCharacters(html).toString();
}

You will obtain a string with clean HTML.

Bye,
Alex

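Try this:

v = StringEscapeUtils.unescapeJavaScript(v.substring(1, v.length() - 1));

The substring call strips the surrounding double quotes, and unescapeJavaScript decodes the remaining escapes. unescapeJavaScript is from Apache Commons Lang 2.x (org.apache.commons.lang.StringEscapeUtils); in Commons Lang 3 the equivalent is StringEscapeUtils.unescapeEcmaScript.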

So much string processing just for an Android WebView, why…
The removeUTFCharacters method provided in the previous answer is not clean enough. There is still stuff like \" left in the output.
