Shell日志分析常用命令快速入门

Author： admin
发布时间：November 24, 2017
3163 views
No comments
2248 words
Categories： Shell

很多地方分享了日志分析的shell脚本，但是基本没说每个命令符的具体含义，学习成本还是很高，在这里总结下，方便大家快速入门。

1、在Windows下的用户要使用shell命令符的话请先安装cygwin，安装方法自行Google （搜技术问题请使用google）

2、下面粗略介绍下SEO日志分析常用的命令符用法，需要详细了解每个命令符请使用Google。

less 文件名查看文件内容按“q” 退出
cat 文件名打开文件，可以多次打开几个文件 | cat 1.log 2.log |cat *.cat
grep -参数文件名
- -i 不区分大小写
- -v 显示不符合条件的所有行
- -c 显示符合条件的所有行数（符合条件的数量）
egrep 属于grep的升级版，在正则这一块的支持更完善，使用正则的时候建议使用egrep
head -2 文件名显示2行
head -100 文件名 | tail -10 >>a.log 提取文件第91-100行数据
wc -参数文件名统计文本大小，字符多少，行数
- -c 统计文本字节数
- -m 统计文本字符数
- -l 统计文本有多少行
sort - 参数文件名对文件进行排序
- -n 对文件按照数字排序
- -r 反向排序
uniq -参数文件名对文件去重，去重前需要使用排序sort
- -c 显示数据重复的次数
split -参数文件名对文件进行切割
- -100 （每100行切割成一个文件）
- -C 25m/b/k (每25兆/字节/K 分割成一个文件)
| 管道，把上一条命令的结果传输给下一条命令
“>” 和“>> ” 重定向写入文件中 “>”相当于“w”清空并写入 “>>”相当于“a” 追加进文件
awk -F '分割符' Pattern ｛action｝文件名使用指定的字符对每一行数据进行分段，默认是空格（网站日志就是空格分开）
- -F后面跟的是分隔符
- pattern 就是action执行的条件，这里可以使用正则表达式
- $n 即时第几段数据 $0表示整行数据
- NF表示当前记录的字段数
- $NF 表示最后一个字段
- BEGIN和END，这两者都可用于pattern中，提供BEGIN和END的作用是给程序赋予初始状态和在程序结束之后执行一些扫尾的工作
bash shell.sh 运行shell.sh脚本
dos2unix xxoo.sh 将“\r\n”转换成“\n” Windows——>linux （由于Windows和Linux下的换行符不同，所以我们在Windows下面下的代码需要使用dos2unix 转换成Linux下的换行符，否则运行shell脚本会报错）
unix2dos xxoo.sh 将“\n”转换成“\r\n” linux——>Windows
rm xx.txt 删除xx.txt文件

3、一些简单的命令符介绍到这里，需要了解shell，建议大家查看相关的书籍.

下面我们开始使用shell分析日志

日志格式如下：less log.log

Shell分析日志常命令

以下为常用日志分析命令

1、切割百度的抓取数据（将文件切割出来对专门的爬虫数据进行处理能提高效率）

cat log.log |grep -i ‘baiduspider’ > baidu.log

2、网站状态码个数查询

awk ‘{print $9}’ baidu.log | sort | uniq -c | sort -nr

3、百度总抓取量

wc -l baidu.log

4、百度不重复抓取量

awk ‘{print $7}’ baidu.log | sort | uniq | wc -l

5、百度平均每次抓取的数据大小（结果是KB）

awk ‘{print $10}’ baidu.log|awk ‘BEGIN{a=0}{a+=$1}END{ print a/NR/1024}’

6、首页抓取量

awk ‘$7~/.com\/$/’ baidu.log | wc -l

7、某目录抓取量

grep ‘/news/’ baidu.log | wc -l

8、抓取最多的10个页面

awk ‘{print $7}’ baidu.log | sort | uniq -c | sort -nr | head -10

9、找出抓取的404页面

awk ‘$9~ /^404$/ {print $7}’ baidu.log | sort | uniq | sort -nr

10、找出抓取了多少js文件和文件抓取的次数

awk ‘$7~ /.js$/ {print $7}’ baidu.log | sort | uniq -c | sort -nr

Last modification：November 8, 2019

如果觉得我的文章对你有用，请随意赞赏

Shell日志分析常用命令快速入门

admin • 2017 年 11 月 24 日

很多地方分享了日志分析的shell脚本，但是基本没说每个命令符的具体含义，学习成本还是很高，在这里总结下，方便大家快速入门。1、在Windows下的用户要使用shell命令符的话请先安装cygwin，安装方法自行Google （搜技术问题请使用google）2、下面粗略介绍下SEO日志分析常用的命令符用法，需要详细了解每个命令符请使用Google。<ul style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 14px; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px; padding: 0px;">
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">less  文件名       查看文件内容  按“q” 退出<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">cat  文件名      打开文件，可以多次打开几个文件 |     cat 1.log 2.log   |cat *.cat<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">grep -参数  文件名 
<ul class="litype_1" style="word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 14px; padding: 0px;" type="1">
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-i 不区分大小写</li>
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-v 显示不符合条件的所有行</li>
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-c  显示符合条件的所有行数（符合条件的数量）<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
</ul></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">egrep 属于grep的升级版，在正则这一块的支持更完善，使用正则的时候建议使用egrep<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">head -2  文件名 显示2行<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">head -100  文件名  | tail -10 &gt;&gt;a.log   提取文件第91-100行数据<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">wc -参数   文件名      统计文本大小，字符多少，行数 
<ul class="litype_1" style="word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 14px; padding: 0px;" type="1">
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-c 统计文本字节数</li>
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-m 统计文本字符数</li>
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-l 统计文本有多少行<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
</ul></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">sort  - 参数 文件名      对文件进行排序 
<ul class="litype_1" style="word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 14px; padding: 0px;" type="1">
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-n 对文件按照数字排序</li>
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-r 反向排序<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
</ul></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">uniq -参数     文件名    对文件去重，去重前需要使用排序sort 
<ul class="litype_1" style="word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 14px; padding: 0px;" type="1">
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-c  显示数据重复的次数<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
</ul></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">split  -参数  文件名       对文件进行切割 
<ul class="litype_1" style="word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 14px; padding: 0px;" type="1">
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-100   （每100行切割成一个文件）</li>
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-C    25m/b/k   (每25兆/字节/K 分割成一个文件)<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
</ul></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">|    管道，把上一条命令的结果传输给下一条命令<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">“&gt;” 和“&gt;&gt; ” 重定向写入文件中 “&gt;”相当于“w”清空并写入   “&gt;&gt;”相当于“a” 追加进文件<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">awk -F  '分割符'  Pattern ｛action｝  文件名     使用指定的字符对每一行数据进行分段，默认是空格（网站日志就是空格分开） 
<ul class="litype_1" style="word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 14px; padding: 0px;" type="1">
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">-F后面跟的是分隔符</li>
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">pattern 就是action执行的条件，这里可以使用正则表达式</li>
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">$n 即时第几段数据  $0表示整行数据</li>
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">NF表示当前记录的字段数</li>
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">$NF 表示最后一个字段</li>
 	<li style="list-style-type: decimal; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">BEGIN和END，这两者都可用于pattern中，提供BEGIN和END的作用是给程序赋予初始状态和在程序结束之后执行一些扫尾的工作<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
</ul></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">bash shell.sh   运行shell.sh脚本<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">dos2unix   xxoo.sh 将“\r\n”转换成“\n”   Windows——&gt;linux （由于Windows和Linux下的换行符不同，所以我们在Windows下面下的代码需要使用dos2unix 转换成Linux下的换行符，否则运行shell脚本会报错）<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">unix2dos    xxoo.sh 将“\n”转换成“\r\n”  linux——&gt;Windows<br style="word-wrap: break-word; white-space: normal; word-spacing: 0px; text-transform: none; color: #696969; font: 14px/25px 'Helvetica Neue', Helvetica, Arial, sans-serif; widows: 1; letter-spacing: normal; text-indent: 0px; -webkit-text-stroke-width: 0px;" /></li>
 	<li style="list-style-type: disc; word-wrap: break-word; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; margin: 0px 0px 0px 2em; padding: 0px;">rm xx.txt  删除xx.txt文件</li>
</ul>3、一些简单的命令符介绍到这里，需要了解shell，建议大家查看相关的书籍.&nbsp;<h4 id="-shell-">下面我们开始使用shell分析日志</h4>日志格式如下：less log.log<img class="alignnone size-full wp-image-163" src="http://alfred.wang/usr/uploads/2017/11/3.jpg" alt="Shell分析日志常命令" width="573" height="193" style="">以下为常用日志分析命令1、切割百度的抓取数据（将文件切割出来对专门的爬虫数据进行处理能提高效率）<pre class="pure-highlightjs"><code class="null">cat log.log |grep -i ‘baiduspider’ &gt; baidu.log</code></pre>2、网站状态码个数查询<pre class="pure-highlightjs"><code class="null">awk ‘{print $9}’ baidu.log | sort | uniq -c | sort -nr</code></pre>3、百度总抓取量<pre class="pure-highlightjs"><code class="null">wc -l baidu.log</code></pre>4、百度不重复抓取量<pre class="pure-highlightjs"><code class="null">awk ‘{print $7}’ baidu.log | sort | uniq | wc -l</code></pre>5、百度平均每次抓取的数据大小（结果是KB）<pre class="pure-highlightjs"><code class="null">awk ‘{print $10}’ baidu.log|awk ‘BEGIN{a=0}{a+=$1}END{ print a/NR/1024}’</code></pre>6、首页抓取量<pre class="pure-highlightjs"><code class="null">awk ‘$7~/.com\/$/’ baidu.log | wc -l</code></pre>7、某目录抓取量<pre class="pure-highlightjs"><code class="null">grep ‘/news/’ baidu.log | wc -l</code></pre>8、抓取最多的10个页面<pre class="pure-highlightjs"><code class="null">awk ‘{print $7}’ baidu.log | sort | uniq -c | sort -nr | head -10</code></pre>9、找出抓取的404页面<pre class="pure-highlightjs"><code class="null">awk ‘$9~ /^404$/ {print $7}’ baidu.log | sort | uniq | sort -nr</code></pre>10、找出抓取了多少js文件和文件抓取的次数<pre class="pure-highlightjs"><code class="null">awk ‘$7~ /.js$/ {print $7}’ baidu.log | sort | uniq -c | sort -nr</code></pre>

Shell日志分析常用命令快速入门

下面我们开始使用shell分析日志

Leave a Comment Cancel reply
使用cookie技术保留您的个人信息以便您下次快速评论，继续评论表示您已同意该条款

【百度站长平台】网页标题作弊详解

【百度站长平台】百度搜索Mobile Friendly（移动友好度）标准V1.0

【百度站长平台】最新百度移动搜索落地页体验白皮书 - 广告篇2.0

【百度站长平台】移动搜索落地页体验转码&滤镜解读

谷歌“猫头鹰”算法更新

符合百度SEO要求的网站内容建设指南3：用户维护

【大拿分享】新站如何被百度快速收录

《百度移动搜索建站优化白皮书》：网站优化部分

FaceBook账号广告功能受限的原因

【Python】Prettytable安装，排序，制作漂亮的表格

Shell日志分析常用命令快速入门

下面我们开始使用shell分析日志

Leave a Comment Cancel reply 使用cookie技术保留您的个人信息以便您下次快速评论，继续评论表示您已同意该条款

Shell日志分析常用命令快速入门

Leave a Comment Cancel reply
使用cookie技术保留您的个人信息以便您下次快速评论，继续评论表示您已同意该条款