While reading a site's HTML source programmatically, some pages came back incomplete, even though they displayed fine in a browser.
1. So it was not the remote server's problem.
2. Opening the response in Fiddler, it was also incomplete, and garbled; but after setting the Transformer to "No Compression" it displayed fine. So the full data was being received — the breakage happened in later processing.
3. Debugging in C#, the string read from the response was truncated; pasting it into Notepad++ showed \0\0 right at the cut point. That was it: \0 marks the end of a string.
4. The handler had automatic decompression enabled:
```csharp
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip |
                                 DecompressionMethods.None;
```
The handling method is as follows:
```csharp
try
{
    string strUrl = "http://www.xxx.com";
    string strMsg;
    CookieContainer cc = new CookieContainer();
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(strUrl);
    request.Method = "GET";
    request.CookieContainer = cc;
    request.KeepAlive = true;
    request.ContentType = "text/html";
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36";
    request.Headers.Add("x-requested-with:XMLHttpRequest");
    request.Headers.Add(HttpRequestHeader.AcceptLanguage, "zh-CN,zh;q=0.8,en;q=0.6,nl;q=0.4,zh-TW;q=0.2");
    request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
    request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip |
                                     DecompressionMethods.None;
    // AutomaticDecompression already makes the framework send
    // "Accept-Encoding: gzip, deflate", so adding it by hand is redundant:
    // request.Headers.Add("Accept-Encoding", "gzip, deflate");
    if (request.Method == "POST")
    {
        request.ContentType = "application/x-www-form-urlencoded";
    }
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    // Use the encoding that matches the site, e.g. Encoding.GetEncoding("gb2312")
    using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        strMsg = reader.ReadToEnd();
        // \0 is NUL, the null character, which C-style tooling treats as a
        // string terminator; strip it so the text no longer appears truncated.
        strMsg = strMsg.Replace("\0", "");
    }
}
catch (WebException ex)
{
    // Don't swallow errors silently; at least report what failed.
    Console.WriteLine(ex.Message);
}
```
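The `\0` fix above is worth seeing in isolation. Below is a minimal, self-contained sketch (the sample string is hypothetical, not the site's actual data) showing that a .NET `string` happily holds embedded NUL characters — it only *looks* truncated in tools that treat `\0` as a terminator — and that `Replace("\0", "")` removes them:

```csharp
using System;

class NulStripDemo
{
    static void Main()
    {
        // Simulated decompressed response with embedded NUL padding.
        string raw = "before\0\0after";

        // The string itself is intact; .NET tracks length explicitly.
        Console.WriteLine(raw.Length);   // 13

        // Stripping every NUL restores text that debuggers and C-style
        // viewers would otherwise cut off at the first \0.
        string clean = raw.Replace("\0", "");
        Console.WriteLine(clean);        // beforeafter
        Console.WriteLine(clean.Length); // 11
    }
}
```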