检测字符串是否是utf8编码

给一个字符串,怎么判断它是什么编码呢?php有一个函数:mb_detect_encoding。不过这个东西需要有mb_string库,不是到处都能用的。今天看手册,在评论里找到了这个东西。
function is_utf8($string) {
   // From http://w3.org/International/questions/qa-forms-utf-8.html
   return preg_match('%^(?:
         [x09x0Ax0Dx20-x7E]            # ASCII
       | [xC2-xDF][x80-xBF]            # non-overlong 2-byte
       |  xE0[xA0-xBF][x80-xBF]        # excluding overlongs
       | [xE1-xECxEExEF][x80-xBF]{2}  # straight 3-byte
       |  xED[x80-x9F][x80-xBF]        # excluding surrogates
       |  xF0[x90-xBF][x80-xBF]{2}    # planes 1-3
       | [xF1-xF3][x80-xBF]{3}          # planes 4-15
       |  xF4[x80-x8F][x80-xBF]{2}    # plane 16
   )*$%xs', $string);  
}
准确率基本和
mb_detect_encoding一样,要对一起对,要错一起错。
编码检测不可能100%准确,这个东西已经可以基本满足要求了。

3 thoughts on “检测字符串是否是utf8编码

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.