Foreword
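I searched online for ways to deal with emoji, and most of them fall into two approaches:
- Change the database encoding to utf8mb4
- Filter the emoji out of the string
Neither is very elegant:
- Changing the database encoding inevitably affects other databases
- Filtering out emoji is relatively inefficient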
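Both workarounds exist for the same reason. An emoji such as U+1F600 lies outside the Basic Multilingual Plane, so Java stores it as two chars (a surrogate pair) and UTF-8 encodes it in 4 bytes, while MySQL's utf8 charset (utf8mb3) stores at most 3 bytes per character. The minimal JDK-only sketch below just measures this; the class name EmojiByteCheck and the sample characters are illustrative choices, not part of the original article.

```java
import java.nio.charset.StandardCharsets;

// Illustrative check of why an emoji does not fit MySQL's 3-byte "utf8" (utf8mb3) charset.
final class EmojiByteCheck {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600)); // U+1F600 grinning face
        String han = new String(Character.toChars(0x4E2D));     // U+4E2D, a common CJK character

        System.out.println(emoji.length());                          // 2: a surrogate pair in UTF-16
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1: a single code point
        System.out.println(utf8Length(emoji));                       // 4 bytes: needs utf8mb4
        System.out.println(utf8Length(han));                         // 3 bytes: fits in utf8
    }
}
```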
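Handling emoji
I recommend the org.apache.commons.lang3.StringEscapeUtils utility class: with just two lines of code you can escape special characters and emoji before storing them, and unescape them when reading them back. (Since Commons Lang 3.6 this class is deprecated in favor of org.apache.commons.text.StringEscapeUtils from Apache Commons Text, which provides largely the same methods.)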
Escaping before storage
- StringEscapeUtils.escapeXXX(content)
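It offers several escaping methods; choose the one that matches your data format. Be aware that only escapeJava, escapeEcmaScript and escapeJson rewrite emoji as pure-ASCII \uXXXX escape sequences. escapeCsv, escapeHtml3, escapeHtml4 and the escapeXml variants leave emoji unchanged, so on their own they do not make the text storable in a utf8 column: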
- public static final String escapeCsv(final String input);
- public static final String escapeEcmaScript(final String input);
- public static final String escapeHtml3(final String input);
- public static final String escapeHtml4(final String input);
- public static final String escapeJava(final String input);
- public static final String escapeJson(final String input);
- public static final String escapeXml(final String input);
- public static String escapeXml10(final String input);
- public static String escapeXml11(final String input);
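To make concrete what escapeJava does to an emoji, here is a simplified JDK-only sketch. The class AsciiEscapeSketch and its method escapeNonAscii are hypothetical and not part of Commons Lang: the sketch only reproduces the idea of turning every character outside printable ASCII into a backslash-u escape of its UTF-16 code unit, and doubling backslashes so the result can be reversed. The real escapeJava additionally handles quotes and control characters such as newline in its own way.

```java
// Hypothetical, simplified stand-in for StringEscapeUtils.escapeJava. It only shows how
// non-ASCII characters become pure-ASCII escapes; it is NOT the library implementation.
final class AsciiEscapeSketch {

    static String escapeNonAscii(String input) {
        StringBuilder sb = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i); // one UTF-16 unit: an emoji arrives as two surrogate chars
            if (c == '\\') {
                sb.append("\\\\");                            // escape the escape character itself
            } else if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c)); // backslash, 'u', four uppercase hex digits
            } else {
                sb.append(c);                                 // printable ASCII is kept as-is
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "hi " + new String(Character.toChars(0x1F600));
        System.out.println(escapeNonAscii(text));
    }
}
```

For "hi " followed by U+1F600 the sketch prints hi \uD83D\uDE00: the emoji becomes two escapes, one per surrogate, and the whole result is plain ASCII, so it fits into a utf8 column. As far as I know, this surrogate-pair form is also what escapeJava emits for emoji.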
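Unescaping on read
After reading the value back, unescape it with the method that matches the format you escaped with; this restores the original emoji for display on the front end: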
- public static final String unescapeCsv(final String input);
- public static final String unescapeEcmaScript(final String input);
- public static final String unescapeHtml3(final String input);
- public static final String unescapeHtml4(final String input);
- public static final String unescapeJava(final String input);
- public static final String unescapeJson(final String input);
- public static final String unescapeXml(final String input);
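Reading the value back reverses the process. Below is an equally simplified, hypothetical JDK-only counterpart to unescapeJava (AsciiUnescapeSketch.unescapeNonAscii is an illustrative name, not a library method). It only understands a doubled backslash and a backslash-u escape with four hex digits; each escape yields one UTF-16 char, so two consecutive surrogate escapes rebuild the original emoji.

```java
// Hypothetical, simplified stand-in for StringEscapeUtils.unescapeJava. It handles only two
// forms: a doubled backslash, and backslash + 'u' + four hex digits. Not the library code.
final class AsciiUnescapeSketch {

    private static final String HEX = "0123456789abcdefABCDEF";

    static String unescapeNonAscii(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                if (next == '\\') {                 // doubled backslash -> one backslash
                    sb.append('\\');
                    i += 2;
                    continue;
                }
                if (next == 'u' && i + 6 <= input.length()) {
                    String hex = input.substring(i + 2, i + 6);
                    if (hex.chars().allMatch(ch -> HEX.indexOf(ch) >= 0)) {
                        // One escape -> one UTF-16 char; a high + low surrogate pair
                        // appended back to back forms the original emoji again.
                        sb.append((char) Integer.parseInt(hex, 16));
                        i += 6;
                        continue;
                    }
                }
            }
            sb.append(c);                           // anything else is copied unchanged
            i++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String stored = "hi \\uD83D\\uDE00";        // what the database column would hold
        String restored = unescapeNonAscii(stored);
        System.out.println(restored.codePointAt(3) == 0x1F600); // true
    }
}
```

The real unescapeJava understands more escape forms (for example octal escapes and backslash-n); this sketch is only meant to show the round trip back to the emoji.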
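Finally, here is a more involved utility I wrote by hand. It either replaces emoji with a fixed string, or URL-encodes them inside [[...]] markers so they can be stored in a utf8 database and restored later: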
package utils;

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.lang3.StringUtils;

public class EmojiUtils {

    // Code points U+10000..U+10FFFF (4 bytes in UTF-8: most emoji, but also rare CJK
    // characters and other symbols) plus unpaired surrogates U+D800..U+DFFF.
    private static final Pattern FOUR_BYTE_CHARS =
            Pattern.compile("([\\x{10000}-\\x{10ffff}\\ud800-\\udfff])");

    // The [[...]] markers written by emojiConvert.
    private static final Pattern CONVERTED = Pattern.compile("\\[\\[(.*?)\\]\\]");

    /**
     * Replaces emoji (4-byte characters) with the given string.
     *
     * @param source  the original string
     * @param slipStr the string each emoji is replaced with
     * @return the filtered string
     */
    public static String filterEmoji(String source, String slipStr) {
        if (StringUtils.isNotBlank(source)) {
            // quoteReplacement keeps '$' or '\' in slipStr from being read as group references
            return FOUR_BYTE_CHARS.matcher(source).replaceAll(Matcher.quoteReplacement(slipStr));
        }
        return source;
    }

    /**
     * Converts emoji in the string into a form that can be saved in a utf8-charset database
     * (an emoji takes 4 bytes and would otherwise need utf8mb4): each one is URL-encoded
     * and wrapped in [[...]].
     *
     * @param str the string to convert
     * @return the converted string
     * @throws UnsupportedEncodingException never in practice, since UTF-8 is always supported
     */
    public static String emojiConvert(String str) throws UnsupportedEncodingException {
        Matcher matcher = FOUR_BYTE_CHARS.matcher(str);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String encoded = URLEncoder.encode(matcher.group(1), "UTF-8");
            matcher.appendReplacement(sb, Matcher.quoteReplacement("[[" + encoded + "]]"));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    /**
     * Restores a string saved in a utf8 database that contains converted emoji.
     * Note: any other [[...]] text in the input is URL-decoded as well, and decoding
     * throws IllegalArgumentException if that text is not valid URL encoding.
     *
     * @param str the converted string
     * @return the string as it was before conversion
     * @throws UnsupportedEncodingException never in practice, since UTF-8 is always supported
     */
    public static String emojiRecovery2(String str) throws UnsupportedEncodingException {
        Matcher matcher = CONVERTED.matcher(str);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String decoded = URLDecoder.decode(matcher.group(1), "UTF-8");
            matcher.appendReplacement(sb, Matcher.quoteReplacement(decoded));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }
}
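filterEmoji relies on a regular-expression character class. If you would rather avoid regular expressions and Commons Lang, the same check can be written with code points. This is an alternative technique, not the author's code, and SupplementaryFilter.replaceSupplementary is a hypothetical name. It replaces every code point from U+10000 to U+10FFFF (the 4-byte UTF-8 range that holds most emoji) and any unpaired surrogate, which is the set the character class is meant to match. Unlike filterEmoji it also scans blank but non-empty strings; those contain no such characters, so the result is the same.

```java
// Regex-free sketch of the filterEmoji idea using only the JDK (hypothetical helper).
final class SupplementaryFilter {

    static String replaceSupplementary(String source, String replacement) {
        if (source == null || source.isEmpty()) {
            return source;
        }
        StringBuilder sb = new StringBuilder(source.length());
        source.codePoints().forEach(cp -> {
            // Code points U+10000..U+10FFFF (4 bytes in UTF-8) or an unpaired surrogate.
            boolean fourByteOrUnpaired = Character.isSupplementaryCodePoint(cp)
                    || (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE);
            if (fourByteOrUnpaired) {
                sb.append(replacement);
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "ok" + new String(Character.toChars(0x1F600)) + "!";
        System.out.println(replaceSupplementary(text, "[emoji]")); // prints: ok[emoji]!
    }
}
```

Like the regex version, this also replaces non-emoji supplementary characters, such as rare CJK ideographs, and leaves BMP symbols such as U+2764 (heavy black heart) untouched; those take only 3 bytes in UTF-8 and fit a utf8 column anyway.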