How to Truncate Chinese Strings in PHP

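The simplest way to take part of a string in PHP is substr(). But substr() counts bytes, so it is only safe for single-byte (English) text; applied to Chinese it can cut a character in half and produce garbled output. mb_substr() is often suggested instead. It counts characters rather than bytes and handles mixed Chinese and English text correctly, but only when the mbstring extension is installed and the correct encoding is passed; with the wrong encoding the result is still garbled. The functions below do the job without mbstring.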
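The difference between counting bytes and counting characters can be seen in a small sketch. It assumes UTF-8 text, and the variable names are only illustrative; the mb_substr() call is guarded because mbstring may not be installed.

```php
<?php
// Assumes UTF-8 text. The bytes are written out explicitly so the
// encoding of this source file does not matter: "abc" + two Chinese characters.
$text = "abc\xE4\xB8\xAD\xE6\x96\x87";

// substr() counts bytes: 4 bytes is "abc" plus only the first byte
// of the three-byte Chinese character that follows.
$byBytes = substr($text, 0, 4);

// preg_match() with the /u modifier fails on malformed UTF-8,
// so this is false for the cut-off string.
$bytesAreValid = preg_match('//u', $byBytes) === 1;

// mb_substr() counts characters once it is told the encoding.
// It needs the mbstring extension, hence the guard.
$byChars = function_exists('mb_substr')
    ? mb_substr($text, 0, 4, 'UTF-8')   // "abc" plus one whole Chinese character
    : null;
```

Passing the encoding explicitly avoids depending on the server's default internal encoding.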

The following function truncates a GB2312-encoded Chinese string:

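<?php
// Truncate a GB2312-encoded string without splitting a double-byte character.
// $start and $len are byte offsets; a character that straddles the end
// of the range is kept whole.
function mysubstr($str, $start, $len) {
    $tmpstr = "";
    $end = min($start + $len, strlen($str));
    for ($i = 0; $i < $end; $i++) {
        // A lead byte above 0xa0 starts a two-byte GB2312 character.
        $width = (ord($str[$i]) > 0xa0) ? 2 : 1;
        // Scan from the beginning to stay aligned on character boundaries,
        // but only collect characters at or after $start.
        if ($i >= $start) {
            $tmpstr .= substr($str, $i, $width);
        }
        $i += $width - 1;
    }
    return $tmpstr;
}
?>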

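A truncation function for Chinese text that supports both UTF-8 and GB2312.

To support multiple languages, strings in the database may be stored as UTF-8. Web code often needs to take part of such a string, and the UTF-8 truncation function below was written to do so without producing garbled output.

A UTF-8 character occupies 1 to 4 bytes, and the first byte tells how many. The description below assumes at most 3 bytes, which covers all commonly used Chinese characters.

If the first byte is 224 (0xE0) or greater, it and the following 2 bytes form one UTF-8 character. If the first byte is at least 192 (0xC0) but less than 224, it and the following 1 byte form one character. Otherwise the first byte is itself a single character: an English letter, a digit, or an ASCII punctuation mark. (A first byte of 240 (0xF0) or greater begins a 4-byte character, which falls outside the 3-byte assumption.)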
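The lead-byte rule above can be sketched as a small helper. The names utf8_char_len() and utf8_split() are hypothetical, and the sketch assumes well-formed UTF-8 input; it also covers the 4-byte case for completeness.

```php
<?php
// Hypothetical helper: byte length of the UTF-8 character that
// begins with the given lead byte. Assumes well-formed input.
function utf8_char_len(string $byte): int {
    $n = ord($byte);
    if ($n >= 0xF0) return 4; // 240 and up: 4-byte character
    if ($n >= 0xE0) return 3; // 224..239: 3-byte character (most Chinese text)
    if ($n >= 0xC0) return 2; // 192..223: 2-byte character
    return 1;                 // ASCII
}

// Hypothetical helper: split a UTF-8 string into characters by
// reading each lead byte and jumping ahead by the reported length.
function utf8_split(string $s): array {
    $chars = [];
    for ($i = 0, $n = strlen($s); $i < $n; ) {
        $len = utf8_char_len($s[$i]);
        $chars[] = substr($s, $i, $len);
        $i += $len;
    }
    return $chars;
}
```

With the characters in an array, truncation reduces to array_slice() followed by implode().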

The code is as follows:

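<?php
// Truncation function for Chinese text that supports both UTF-8 and GB2312.

/*
cut_str(string, length to keep, start offset, encoding);
The encoding defaults to UTF-8.
The start offset defaults to 0.
For UTF-8, length and offset are counted in characters.
For GB2312, one unit is two bytes: one Chinese character or two ASCII characters.
*/

function cut_str($string, $sublen, $start = 0, $code = 'UTF-8')
{
    if ($code == 'UTF-8')
    {
        // One alternative per UTF-8 sequence length (1 to 4 bytes).
        $pa = "/[\x01-\x7f]|[\xc2-\xdf][\x80-\xbf]|\xe0[\xa0-\xbf][\x80-\xbf]|[\xe1-\xef][\x80-\xbf][\x80-\xbf]|\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]|[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf]/";
        preg_match_all($pa, $string, $t_string);

        // Append "..." only when something was actually cut off.
        if (count($t_string[0]) - $start > $sublen) return join('', array_slice($t_string[0], $start, $sublen))."...";
        return join('', array_slice($t_string[0], $start, $sublen));
    }
    else
    {
        // Convert the character counts to byte offsets.
        $start = $start*2;
        $sublen = $sublen*2;
        $strlen = strlen($string);
        $tmpstr = '';

        for ($i=0; $i<$strlen; $i++)
        {
            if ($i>=$start && $i<($start+$sublen))
            {
                if (ord(substr($string, $i, 1))>129)
                {
                    // Lead byte of a double-byte character: copy both bytes.
                    $tmpstr.= substr($string, $i, 2);
                }
                else
                {
                    $tmpstr.= substr($string, $i, 1);
                }
            }
            // Skip the second byte of a double-byte character.
            if (ord(substr($string, $i, 1))>129) $i++;
        }
        if (strlen($tmpstr)<$strlen ) $tmpstr.= "...";
        return $tmpstr;
    }
}

// This call assumes the source file is saved as GB2312.
// For a UTF-8 source file, omit the last argument.
$str = "abcd需要截取的字符串";
echo cut_str($str, 8, 0, 'gb2312');
?>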

The following function does the same job with a single regular expression:

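function utf8Substr($str, $from, $len)
{
    // Skip up to $from characters, capture up to $len characters, discard the rest.
    // A character is one ASCII byte, or a lead byte followed by continuation bytes.
    return preg_replace('#^(?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$from.'}'.
                       '((?:[\x00-\x7F]|[\xC0-\xFF][\x80-\xBF]+){0,'.$len.'}).*#s',
                       '$1',$str);
}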

This function truncates UTF-8 strings only.

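Notes on the program:

1. The len parameter is measured in Chinese characters: one unit of len equals two English characters, so that truncated strings come out at roughly the same visual width.

2. If the magic parameter is set to false, Chinese and English characters are treated alike and the absolute number of characters is counted.

3. It is particularly suited to strings that have been encoded with htmlspecialchars().

4. It correctly handles character entities in GB2312 text.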
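The width rule in notes 1 and 2 can be sketched on its own. This is not the author's function: width_cut() is a hypothetical helper that assumes UTF-8 input rather than GB2312 and ignores entities, so that only the counting rule is visible.

```php
<?php
// Hypothetical helper illustrating the width rule only.
// Assumes UTF-8 input; no entity handling.
function width_cut(string $s, int $len, bool $magic = true): string {
    // Split into characters; preg_split() returns false on malformed UTF-8.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // With $magic, the budget is in half-width units (one Chinese
    // character costs 2); without it, every character costs 1.
    $budget = $magic ? $len * 2 : $len;

    $out = '';
    foreach ($chars as $ch) {
        // Treat any multibyte character as full-width (a simplification).
        $cost = ($magic && strlen($ch) > 1) ? 2 : 1;
        if ($cost > $budget) {
            break; // the next character would not fit
        }
        $out .= $ch;
        $budget -= $cost;
    }
    return $out;
}
```

Treating every multibyte character as full-width is a simplification; accented Latin letters, for example, are multibyte in UTF-8 but display at half width.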

Program code:

function FSubstr($title, $start, $len = "", $magic = true)
{
    /**
     * powered by Smartpig
     * mailto:d.einstein@263.net
     */
    $length = 0;
    if ($len == "") $len = strlen($title);

    // If $start falls in the middle of a double-byte character, step back one byte.
    if ($start > 0)
    {
        $cnum = 0;
        for ($i = 0; $i < $start; $i++)
        {
            if (ord(substr($title, $i, 1)) >= 128) $cnum++;
        }
        if ($cnum % 2 != 0) $start--;
    }

    if (strlen($title) <= $len) return substr($title, $start, $len);

    $alen = 0;    // half-width units: ASCII characters and named entities
    $blen = 0;    // full-width units: double-byte characters and numeric entities
    $realnum = 0; // characters counted regardless of width
    $entities = array("&lt;", "&gt;", "&amp;", "&quot;", "&#039;");

    for ($i = $start; $i < strlen($title); $i++)
    {
        $ctype = 0; // 1 when the current character is full-width
        $cstep = 0; // byte length of the current character
        $cur = substr($title, $i, 1);

        if ($cur == "&")
        {
            // Entities produced by htmlspecialchars() count as one half-width character.
            foreach ($entities as $entity)
            {
                if (substr($title, $i, strlen($entity)) == $entity)
                {
                    $cstep = strlen($entity);
                    break;
                }
            }
            if ($cstep > 0)
            {
                if ($magic) $alen++;
            }
            else if (preg_match("/^&#(\d+);/", substr($title, $i, 8), $match))
            {
                // Any other numeric entity counts as one full-width character.
                $cstep = strlen($match[0]);
                if ($magic) { $blen++; $ctype = 1; }
            }
            else
            {
                // A bare "&" that starts no entity is an ordinary character.
                $cstep = 1;
                if ($magic) $alen++;
            }
        }
        else if (ord($cur) >= 128)
        {
            // Double-byte (GB2312) character.
            $cstep = 2;
            if ($magic) { $blen++; $ctype = 1; }
        }
        else
        {
            $cstep = 1;
            if ($magic) $alen++;
        }

        $length += $cstep;
        $i += $cstep - 1;
        $realnum++;

        if ($magic)
        {
            $width = $blen * 2 + $alen;
            if ($width == $len * 2) break;
            if ($width == $len * 2 + 1)
            {
                // A full-width character overshot the limit by half a unit: drop it.
                if ($ctype == 1) $length -= $cstep;
                break;
            }
        }
        else if ($realnum == $len)
        {
            break;
        }
    }

    return substr($title, $start, $length);
}
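The entity handling in the function above comes down to treating each entity as a single unit that must never be cut in half. A compact sketch of that idea follows; it assumes UTF-8 input rather than GB2312, and entity_tokens() and entity_cut() are hypothetical names, not part of the original program.

```php
<?php
// Hypothetical helper: split an escaped string into tokens, where a
// token is a whole entity or a single UTF-8 character.
function entity_tokens(string $s): array {
    // The entity alternative comes first so that "&" is only matched
    // on its own when it does not begin a recognised entity.
    preg_match_all('/&(?:lt|gt|amp|quot|#\d+);|./su', $s, $m);
    return $m[0];
}

// Hypothetical helper: keep the first $count tokens, so an entity is
// either kept whole or dropped whole.
function entity_cut(string $s, int $count): string {
    return implode('', array_slice(entity_tokens($s), 0, $count));
}

// Usage: escape first, then truncate the escaped string safely.
$escaped = htmlspecialchars('a<b>c');
$short = entity_cut($escaped, 3); // "a", the escaped "<", and "b"
```

Truncating after escaping matters because a plain substr() on the escaped string could stop in the middle of an entity and leave a broken fragment in the page.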

(Editor: admin)
