Collecting and packaging logs with a shell script

Posted on 2016-12-26

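Collecting production logs is a routine task, usually handled by the operations team. The script below collects the logs, packages them, and uploads them to S3 for backup storage: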

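#!/bin/bash
# bash, not sh: the script relies on the function keyword, arrays, (( )) arithmetic and ${var:offset:length}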
function send_sms()
{
    . /data/se_scripts/conf/setting_new.conf
    phone=$phone_number
    # Strip carriage returns and newlines (sed works line by line, so 's/\n//g' never matches)
    phone=$(printf '%s' "$phone" | tr -d '\r\n')
    sh /data/se_scripts/conf/send_msg_new.sh ${phone} 701 "upload $1 failed" &
}
function exit_if_timeout()
{
    SLEEP_SECONDS=$1
    CMD=$2
    #set -x
    $CMD &
    pidApp=$!
    ( sleep $SLEEP_SECONDS ; kill $pidApp ) &
    killerPid=$!
    wait $pidApp
    status=$?
    (kill -0 $killerPid && kill $killerPid) || true
    #set +x
    return $status
}
function exec_cmd_with_retry() 
{
    cmd=$1
    cur_retry=$2
    while [ $cur_retry -gt 0 ]; do
        $cmd
        ret=$?
        if [ $ret -eq 0 ]; then
            return
        fi
        ((cur_retry -= 1))
        sleep 10
    done
  
    send_sms "Failed to exec ${cmd} IP:${local_ip}"
    exit 1;
}
function upload_s3path()
{
    retry_cnt=$1
    src_path=$2
    file=$3
    arr=(${file//./ })
    day=${file:0:8}
    hour=${file:8:2}
    minute=${file:10:2}
    exit_if_timeout 600 "aws s3 cp --profile $s3_profile $src_path/$file $s3_path/$day/$hour/$minute/$file.$local_ip.log" 
    ret=$?
    if [ $ret -eq 0 ] ; then
        rm -rf $src_path/$file
    fi
    if [ $ret -eq 0 ] && [ "${file}" != "${ln_file}" ]; then
        send_sms "$src_path/$file uploads3 retry ok ${local_ip}"
    elif [ $ret -ne 0 ] && [ $retry_cnt -eq 0 ]; then
        send_sms "$src_path/$file uploads3 failed ${local_ip}"
    fi
}
function upload_files()
{
    for f in `ls $ln_path`
    do
        upload_s3path $1 $ln_path $f
    done
}
function upload_empty()
{
    for f in `ls $lock_touch`
    do
        upload_s3path $1 $lock_touch $f
    done
}
function upload_logs()
{
    cur_retry=5
    while [ $cur_retry -gt 0 ]; do
        upload_empty $cur_retry
        upload_files $cur_retry
        ((cur_retry -= 1))
        sleep 10
    done 
}
# A machine that is down certainly has no raw logs, so the raw logs are not checked
function check_empty()
{
    for f in `ls $lock_touch`
    do
        # count the entries in $ln_path whose name contains $f ('grep -v grep' is only useful on ps output)
        check=$(ls $ln_path | grep -c -- "$f")
        if [ $check -ne 0 ]; then
            rm -rf $lock_touch/$f
        fi
    done
}
function check_pid()
{
    if [ ! -d ${lock_pid} ];then
        mkdir -p $lock_pid
        echo $$ > $lock_pid/PID
        return
    fi
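    running=$(cat "$lock_pid/PID" 2>/dev/null)
    # kill -0 sends no signal; it only tests whether the PID is alive.
    # Grepping ps output for the number can also match unrelated PIDs or arguments.
    if [ -n "$running" ] && kill -0 "$running" 2>/dev/null; then
        exit
    fi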
    echo $$ > $lock_pid/PID
}
#args
log_path=$1
log_file=$2
s3_path=$3
ln_path=$4
ln_file=$6
s3_profile=$7
lock_pid=$5/pid
lock_touch=$5/empty
local_ip=`cat /var/tmp/ifconfigme`
#soft link
exec_cmd_with_retry "ln -s $log_path/$log_file $ln_path/$ln_file" 5
check_pid
check_empty
upload_logs
rm -rf $lock_pid

This is basically enough for everyday use; you only need to supply your own values for the variables!
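The timeout and retry helpers in the script can also be combined into one function. The following is only a sketch, assuming GNU coreutils `timeout` is installed; `run_with_retry` and its argument order are hypothetical and not part of the original script.

```bash
#!/bin/bash
# Sketch: run a command with a time limit and retry it on failure.
# Assumes GNU coreutils `timeout`; run_with_retry is a hypothetical name.
# Note: timeout can only run external commands, not shell functions.
run_with_retry() {
    local limit=$1 retries=$2 pause=$3
    shift 3
    local attempt
    for ((attempt = 1; attempt <= retries; attempt++)); do
        # "$@" keeps every argument intact, even ones containing spaces
        if timeout "$limit" "$@"; then
            return 0
        fi
        # pause between attempts, but not after the last one
        if ((attempt < retries)); then
            sleep "$pause"
        fi
    done
    return 1
}
```

An upload could then be written as `run_with_retry 600 5 10 aws s3 cp --profile "$s3_profile" "$src" "$dst"`, where `$src` and `$dst` stand for the source file and the S3 target. Passing the command as separate arguments avoids the word-splitting problems of storing it in a single string.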

Merry Christmas

Posted on 2016-12-25

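Merry Christmas! To me, and to my wife-to-be. Today is also my birthday, so happy birthday to me!!

Today Lanlan and I had a grilled fish at Xianglaju, the restaurant at the gate of our compound. It tasted good, though it was a little pricey. Afterwards we went to the nearby Walmart. Lanlan had forgotten my birthday, but I did not hold it against her; as long as she is by my side, keeping me company, that is enough for me. Still, she really wanted to buy me a cake, saying that a birthday without a cake would not do. I said I do not like cake and wanted to drop the idea, but she insisted in earnest, so I said a mini one would be fine. At Walmart we picked a 4-inch cake for 10 yuan, a real bargain. Lanlan asked whether noodles would be all right for dinner, and I knew she was wishing me health and a long life: she hopes I will outlive her, because she is afraid of losing me and being lonely on her own. I agreed, of course. We bought shrimp, noodles, and tomatoes; we already had eggs at home, so we skipped those. She was a bit tired when we got home, so she rested for a while. Then we started on the cake, and Lanlan told me to make a wish. I asked what for, but in the end I made one, because I think this wish belongs not only to me but to her as well: that 2017 brings a different me and a different life!

We have quarreled many times, worked through our differences many times, and been sad many times, but none of that made us truly give up on each other; instead it made us cherish each other more! Love is so wonderful: because of love we have the courage to conquer ourselves and life; because of love we do not let ourselves become irresponsible; because of love even Beijing's smog cannot stop us from seeing each other! It has been a long time since I wrote anything lyrical. I still remember myself writing lyric poems in the school literary society, so absorbed that I could not pull myself out! Sadly I did not keep it up. These days I mostly read technical books and have forgotten how to express myself!

I hope that in 2017 I will have courage and wisdom, become smarter, get better at mastering myself, step out into life, live for love, and pursue more happiness for that lovely person who finds fault with me and keeps me company! hello world! hello, 2017!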

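On production issues

Posted on 2016-12-22

Production issues are one of the best ways for us to really grow, so the experience needs to be accumulated continuously, especially the series of problems triggered by large data volumes, both human-caused and system-caused: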

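1. Problems caused by individual actions. A colleague wanted to change the type of a column and altered a table with several million rows. While the change ran, SQL statements were blocked, which in turn blocked requests, so the average request latency of the whole system rose: a butterfly effect. With a larger table the impact would last longer, to the point of making the whole system unavailable.
Summary:
Large production tables all use the InnoDB storage engine, so such changes are costly and may lock the table. Run these operations as rarely as possible, and test beforehand to estimate how long the impact will last.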

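2. Problems caused by the application scenario. In a live-streaming product, when a broadcast ends, a "broadcast ended" message is sent to the viewers. These messages are all sent at the same moment, which produces and consumes a large amount of data at once. As a result, messages pile up on the server, clients cannot tell whether the broadcast has ended, and they stay in the chat room.
Summary:
Spread the messages out over a fixed window, for example 1 minute, so that clients can leave the chat room within a bounded time and the user experience does not suffer.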
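One way to spread the sends over a window is to derive a delay from each user ID. This is only a sketch: `spread_delay` and the user IDs are hypothetical, and a real system would apply the delay in its message queue or push service rather than in a shell script.

```bash
#!/bin/bash
# Sketch: derive a per-user delay inside a fixed window so that the
# "broadcast ended" messages are not all sent in the same second.
# spread_delay and the user IDs below are hypothetical.
spread_delay() {
    local user_id=$1 window=${2:-60}
    local crc
    # cksum prints "<crc> <byte count>"; keep only the CRC
    crc=$(printf '%s' "$user_id" | cksum | cut -d' ' -f1)
    echo $((crc % window))
}

for uid in 1001 1002 1003; do
    echo "user $uid: send after $(spread_delay "$uid" 60)s"
done
```

A hash is used instead of a random number so that the same user always lands in the same slot, which makes the behaviour reproducible when debugging.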

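3. Redis error: read error on connection

PHP Fatal error: Uncaught exception 'RedisException' with message 'read error on connection' in ....

This error appears occasionally. Some colleagues fixed it by changing PHP's socket timeout in php.ini:

default_socket_timeout = 60

or by adding the following to the script:

ini_set('default_socket_timeout', -1); // never time out

That resolves the error, but it is not the best fix, and the php.ini change only takes effect after the php-fpm process is restarted.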

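4. Problems caused by service configuration. A new ELB was set up with CloudFront CDN support. For one region, the domain xxx.xxx.net was resolved to the CloudFront address, so requests went through CloudFront to the backend ELB. The service then became unreachable; an investigation with the operations team showed that a misconfiguration of the CloudFront service was the cause.
Summary:
Change production configuration with care, and manage the configuration files deployed to each environment (production, development, testing, integration).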

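5. Design problems: how to deal with big keys in Redis. Long-tail data such as friend relationships sometimes causes trouble for both writes and reads. Big keys are usually read page by page, but if a key stores too much data, say 10 million entries, then data migration and data recovery both become problematic. The amount of data stored under a single key should therefore not be too large.
Summary:
For a big key, hash the ID in the application layer to generate keys that share a common prefix, so that the right key can be located automatically.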
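The key-splitting idea can be sketched as follows. The function name `shard_key`, the `friends:42` prefix, and the shard count of 16 are all assumptions made for illustration.

```bash
#!/bin/bash
# Sketch: split one big key into a fixed number of smaller keys.
# shard_key, the "friends:42" prefix and the shard count are hypothetical.
shard_key() {
    local prefix=$1 id=$2 shards=${3:-16}
    local crc
    # cksum prints "<crc> <byte count>"; keep only the CRC
    crc=$(printf '%s' "$id" | cksum | cut -d' ' -f1)
    # The same ID always maps to the same shard under the shared prefix
    echo "${prefix}:$((crc % shards))"
}

# Which of the 16 sub-keys holds friend 1000123 of user 42?
shard_key "friends:42" 1000123 16
```

Both writers and readers compute the key the same way, so no lookup table is needed; the trade-off is that changing the shard count later requires moving data.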

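6. High concurrency and large data volumes forcing changes to the original design. As the user base grows and UV rises, the request volume under the original design increases and so does the load on the system. Some policies then have to be added on top of the original design, and of course these need support from the product side. Take messaging as an example: if users of any level can send private messages, the message volume surges when a major internet celebrity goes live, and with many simultaneous broadcasts it becomes very large. A policy can reduce this pressure, for example sending private messages to followers only when the broadcaster has reached a certain level, which greatly reduces the number of messages generated. This can of course also be supported on the technical side.

7. Reducing pressure on the storage system. Some data is consumed by auxiliary services such as algorithm services, which are non-real-time or near-real-time. Such data does not need to be kept in memory; it can be stored in Kafka. The algorithm service connects to Kafka, fetches the data, and writes its results to the front-end servers or to memory for the front-end services to use.

8. Production logs. One kind is data logs, which feed the analytics platform; the other is business logs, which are used to locate problems. Business logs must make it possible to pinpoint exactly what went wrong when troubleshooting users' production issues.

9. Stateless versus stateful design. Stateless design covers cases such as static files, or cases where the required static resource data can be derived from information that is already available. When weighing a stateful design, a stateless one, or a combination of the two, consider the effect on system performance and how large it is. How much storage would a stateful design consume? Is the data hot? Does it have to be stored and maintained in several states over several steps? If the data volume is small and the data is hot, a stateful design is usually the better choice: it does not consume many resources, and both performance and implementation logic remain manageable.

10. System architecture design. Use HA to monitor liveness between nodes, improving the system's fault tolerance and reliability.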

11. Redis concurrency control: SETNX can be used in place of SET.
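A minimal sketch of using set-if-not-exists as a lock is shown below. It uses `SET key value NX EX seconds`, the single-command form of SETNX plus an expiry (available since Redis 2.6.12), rather than plain SETNX. The function name `acquire_lock`, the key name, and the `REDIS_CLI` override variable are assumptions; the sketch also assumes `redis-cli` is installed and can reach a server.

```bash
#!/bin/bash
# Sketch: take a short-lived lock so that only one worker runs a job.
# acquire_lock, the key name and REDIS_CLI are hypothetical; assumes
# redis-cli can reach a Redis server (2.6.12 or later for SET ... NX EX).
acquire_lock() {
    local key=$1 owner=$2 ttl=${3:-30}
    local reply
    # NX: set only if the key does not exist yet.
    # EX: expire the key so a crashed owner cannot hold the lock forever.
    reply=$("${REDIS_CLI:-redis-cli}" SET "$key" "$owner" NX EX "$ttl")
    # Redis answers OK when the key was set and a nil reply otherwise
    [ "$reply" = "OK" ]
}

# Typical use:
#   if acquire_lock "lock:send_msg" "$$" 30; then run_the_job; fi
```

Setting the value and the expiry in one command avoids the gap between a separate SETNX and EXPIRE, during which a crash would leave the lock in place permanently.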

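Go basics

Posted on 2016-12-20

I am new to this language. Programmers like it for its high-performance concurrent programming. I have not yet worked out how to use it well to improve the performance of our existing products; I am simply learning, to find out which features of a new language I can put to use. This post records that learning process!

The Go philosophy

Less is more: the better you understand, the more concise you can be! "Less" and "more" here refer mainly to Go compared with C and C++. Here is what Go did: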

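Regular syntax (no symbol table needed to parse)
Garbage collection (only)
No header files
Explicit dependencies
No circular dependencies
Numeric constants are just numbers (untyped)
int and int32 are distinct types
Letter case sets visibility
Methods for any type (no classes)
No subtype inheritance (no subclasses)
Package-level initialization and a well-defined order of initialization
Files in the same package are compiled together
Package-level globals can be declared in any order
No arithmetic conversions (constants help)
Interfaces are implemented implicitly (no "implements" declaration)
Embedding of structs (no type promotion or subclasses)
Methods are declared like functions (no special location)
Methods are just functions
Interfaces are just methods (no data)
Methods match by name only (not by type)
No constructors or destructors
Post-increment and post-decrement are statements, not expressions
No pre-increment or pre-decrement
Assignment is a statement, not an expression
Evaluation order is defined for assignment and function calls (no "sequence point")
No pointer arithmetic
Memory is always zero-initialized
Taking the address of a local variable is legal
No "this" pointer in methods
Segmented stacks
No const or other type annotations
No templates
No exceptions
Built-in string, slice, and map
Array bounds checking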

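Doug McIlroy, the eventual inventor of Unix pipes, wrote in 1964 (!):

We should have some ways of coupling programs like garden hose: screw in another segment when it becomes necessary to massage data in another way. This is the way of I/O also.

If C++ and Java are about type hierarchies and the taxonomy of types, Go is about composition: a language of (functional) composition and (call-level) coupling. I wrote this down only to make it easier, and quicker, to understand what Go is.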

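Go language features

1. Automatic garbage collection
2. Richer built-in types
3. Multiple return values from functions
4. Error handling
5. Anonymous functions and closures
6. Types and interfaces
7. Concurrent programming
8. Reflection
9. Language interoperability

... to be added later ...

Go: first steps

... to be added later ...

Open-source Go projects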

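1. cache2go
https://github.com/muesli/cache2go
A fairly simple caching library with very little code, suitable for beginners; you can learn about locks, goroutines, and so on from it.

2. groupcache
https://github.com/golang/groupcache
By the same author as memcached; a caching library written in Go and intended as a replacement for memcached in many cases.

3. nsq
https://github.com/bitly/nsq
A message distribution platform; reading the code teaches a lot about distributed systems, load balancing, and related programming.

4. docker
https://github.com/docker/docker
Once you have become an expert, you can study its implementation.

5. A collection of Go development tools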
https://github.com/golang/go/wiki/Projects

10 Linux commands

Posted on 2016-12-11

First, the 10 commands:
uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top

The rest of this post explains what each of these 10 commands does and how to use it.

uptime

chenjingxiu:~ kivmi$ uptime
00:23上午 up 6 days 1:53, 3 users, load average: 1.96, 1.70, 1.70

uptime – show how long system has been running
DESCRIPTION
The uptime utility displays the current time, the length of time the system has been up, the number of users, and the load average of the system over the
last 1, 5, and 15 minutes.
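This command shows how long the system has been running and how loaded it is. The output fields are, in order:
current time
time since boot
number of logged-in users
system load averages over the last 1, 5, and 15 minutes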
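For scripting, the three load averages can be extracted from the uptime line. The helper name `load_averages` is hypothetical, and the sample line is modelled on the output above.

```bash
#!/bin/bash
# Sketch: pull the three load averages out of an uptime line.
# load_averages is a hypothetical helper name.
load_averages() {
    # Drop everything up to "load average:" (or "load averages:" on
    # macOS/BSD), then remove the commas between the numbers
    sed -E 's/.*load averages?: *//; s/,//g'
}

echo "00:23 up 6 days 1:53, 3 users, load average: 1.96, 1.70, 1.70" | load_averages
# prints: 1.96 1.70 1.70
```

On a live system the same filter can be fed directly with `uptime | load_averages`. Comparing the first number with the CPU count gives a rough idea of whether the machine is saturated.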

dmesg | tail

chenjingxiu:~ kivmi$ sudo  dmesg | tail
ARPT: 243830.149501: DequeueTime: 0x1906f048 LastTxTime: 0x1ff947d1 PHYTxErr:   0x0000 RxAckRSSI: 0x0000 RxAckSQ: 0x0000
ARPT: 243830.149527: Raw[0]    1 Valid
ARPT: 243830.149536: [2]    0 IM
ARPT: 243830.149541: [3]    1 PM
ARPT: 243830.149545: [7-4]  0 Suppr
ARPT: 243830.149550: [14:8] 1 Ncons
ARPT: 243830.149555: [15]   0 Acked
ARPT: 243830.149561: txpktpend AC_BK 0 AC_BE 0 AC_VI 0 AC_VO 1 BCMC 0 ATIM 0
SmartBattery: finished polling type 4
UserEventAgent is not entitledUserEventAgent is not entitledloginwindow is not entitledloginwindow is not entitledUserEventAgent is not entitledUserEventAgent is not entitledloginwindow is not entitledloginwindow is not entitledUserEventAgent is not entitledUserEventAgent is not entitledloginwindow is not entitledloginwindow is not entitledSmartBattery: finished polling type 4

dmesg – display the system message buffer
DESCRIPTION
Dmesg displays the contents of the system message buffer. This command needs to be run as root.
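This command displays the Linux kernel ring buffer, from which you can obtain a large amount of system information covering multiple run levels, such as the architecture, CPU, attached hardware, and RAM. When the computer boots, the kernel (the core of the operating system) is loaded into memory and prints many messages while loading. These show the kernel detecting hardware devices, including disks such as sda and hda, USB devices, and so on.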

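vmstat 1
The general form is vmstat delay count. For example, vmstat 2 1 samples the server state at a 2-second interval (delay) and reports once (count).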

[chenjingxiu@op ~]$ vmstat 1  3
procs    -----------memory----------  ---swap-- -----io---- --system-- -----cpu-----
r  b    swpd   free   buff  cache     si   so    bi    bo   in   cs us sy id wa st
0  0      0 5564792 408260 20643368    0    0     0    27    1    1  1  2 97  0  0
0  0      0 5562248 408260 20643372    0    0     0     0 2609 3087  1  2 98  0  0
6  0      0 5530268 408260 20643380    0    0     0     0 4354 4287  4  6 90  0  0

NAME
vmstat - Report virtual memory statistics

SYNOPSIS
vmstat [-a] [-n] [-t] [-S unit] [delay [ count ]]
vmstat [-s] [-n] [-S unit]
vmstat [-m] [-n] [delay [ count ]]
vmstat [-d] [-n] [delay [ count ]]
vmstat [-p disk partition] [-n] [delay [ count ]]
vmstat [-f]
vmstat [-V]

DESCRIPTION
vmstat reports information about processes, memory, paging, block IO, traps, and cpu activity. The first report produced gives averages since the last reboot. Additional reports give information on a sampling period of length delay. The process and memory reports are instantaneous in either case.

DESCRIPTION FOR VM MODE

Procs
r: The number of processes waiting for run time. (processes in the run queue waiting for the CPU)
b: The number of processes in uninterruptible sleep. (processes waiting for I/O)

Memory
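swpd: the amount of virtual memory used. (amount of memory moved to swap; a large swpd is not a performance problem as long as si and so stay at 0)
free: the amount of idle memory.
buff: the amount of memory used as buffers. (typically caches reads and writes to block devices)
cache: the amount of memory used as cache. (page cache size, generally used as the file system cache; a large cache means many files are being cached, and if bi is small at the same time the file system is working efficiently)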
inact: the amount of inactive memory. (-a option)
active: the amount of active memory. (-a option)

Swap
si: Amount of memory swapped in from disk (/s). (pages swapped from disk into memory)
so: Amount of memory swapped to disk (/s). (pages swapped from memory out to disk)

IO
bi: Blocks received from a block device (blocks/s). (blocks read from disk into memory)
bo: Blocks sent to a block device (blocks/s). (blocks written from memory to disk)

System
in: The number of interrupts per second, including the clock. (device interrupts per second)
cs: The number of context switches per second. (context switches per second; as a rule of thumb, cs should normally stay below the I/O packet transfer rate)

CPU
These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time) (share of CPU time used by user processes)
sy: Time spent running kernel code. (system time) (share of CPU time used by the kernel)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time. (share of time the CPU is idle)
wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.

FIELD DESCRIPTION FOR DISK MODE

Reads
total: Total reads completed successfully
merged: grouped reads (resulting in one I/O)
sectors: Sectors read successfully
ms: milliseconds spent reading

Writes
total: Total writes completed successfully
merged: grouped writes (resulting in one I/O)
sectors: Sectors written successfully
ms: milliseconds spent writing

IO
cur: I/O in progress
s: seconds spent for I/O

FIELD DESCRIPTION FOR DISK PARTITION MODE

reads: Total number of reads issued to this partition
read sectors: Total read sectors for partition
writes : Total number of writes issued to this partition
requested writes: Total number of write requests made for partition

FIELD DESCRIPTION FOR SLAB MODE

cache: Cache name
num: Number of currently active objects
total: Total number of available objects
size: Size of each object
pages: Number of pages with at least one active object
totpages: Total number of allocated pages
pslab: Number of pages per slab

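If r is frequently greater than 4 and id is frequently below 40, the CPU is heavily loaded. If si and so stay non-zero for long periods, physical memory is too small.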
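The r and id rule of thumb can be checked automatically. This sketch assumes the 17-column vmstat layout shown in the sample above (r in column 1, id in column 15); `check_cpu_pressure` is a hypothetical name, and the second input row is made up to trigger the check.

```bash
#!/bin/bash
# Sketch: flag vmstat samples where the run queue is long and the CPU is busy.
# Assumes the 17-column layout shown above (r is column 1, id is column 15);
# check_cpu_pressure and the thresholds are illustrative.
check_cpu_pressure() {
    awk -v max_r=4 -v min_id=40 '
        # data rows start with a number; the two header rows do not
        $1 ~ /^[0-9]+$/ && NF >= 15 {
            if ($1 > max_r && $15 < min_id)
                printf "busy: r=%d id=%d\n", $1, $15
        }'
}

# First row is taken from the sample above, second row is synthetic
printf '%s\n' \
    " 0  0      0 5564792 408260 20643368    0    0     0    27    1    1  1  2 97  0  0" \
    " 9  0      0 5530268 408260 20643380    0    0     0     0 4354 4287 50 20 30  0  0" \
    | check_cpu_pressure
# prints: busy: r=9 id=30
```

On a live system this would be used as `vmstat 1 5 | check_cpu_pressure`. Keep in mind that the first vmstat row reports averages since boot, so a single flagged row there says little about the current load.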

mpstat -P ALL 1

[chenjingxiu@op ~]$ mpstat -P ALL 1
Linux 2.6.32-279.19.1.el6.centos.plus.x86_64 (op.liebaopay.com)         12/11/2016      _x86_64_        (8 CPU)
06:49:57 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
06:49:58 AM  all    0.50    0.00    1.00    0.50    0.00    0.00    0.00    0.00   97.99
06:49:58 AM    0    1.01    0.00    1.01    1.01    0.00    0.00    0.00    0.00   96.97
06:49:58 AM    1    1.00    0.00    1.00    3.00    0.00    0.00    0.00    0.00   95.00
06:49:58 AM    2    0.00    0.00    2.02    0.00    0.00    0.00    0.00    0.00   97.98
06:49:58 AM    3    0.99    0.00    0.99    0.00    0.00    0.00    0.00    0.00   98.02
06:49:58 AM    4    0.99    0.00    0.99    0.00    0.00    0.00    0.00    0.00   98.02
06:49:58 AM    5    0.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   99.00
06:49:58 AM    6    0.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   99.00
06:49:58 AM    7    0.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   99.00
06:49:58 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
06:49:59 AM  all    0.50    0.00    1.51    0.00    0.00    0.00    0.00    0.00   97.99
06:49:59 AM    0    2.97    0.00    5.94    0.00    0.00    0.00    0.00    0.00   91.09
06:49:59 AM    1    0.00    0.00    1.02    0.00    0.00    0.00    0.00    0.00   98.98
06:49:59 AM    2    0.99    0.00    1.98    0.00    0.00    0.00    0.00    0.00   97.03
06:49:59 AM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
06:49:59 AM    4    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
06:49:59 AM    5    0.00    0.00    1.01    0.00    0.00    0.00    0.00    0.00   98.99
06:49:59 AM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
06:49:59 AM    7    0.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   99.00

NAME
mpstat - Report processors related statistics.

DESCRIPTION
The mpstat command writes to standard output activities for each available processor, processor 0 being the first one. Global average activities among all processors are also reported. The mpstat command can be used both on SMP and UP machines, but in the latter, only global average activities will be printed. If no activity has been selected, then the default report is the CPU utilization report.
The interval parameter specifies the amount of time in seconds between each report. A value of 0 (or no parameters at all) indicates that processors statistics are to be reported for the time since system startup (boot). The count parameter can be specified in conjunction with the interval parameter if this one is not set to zero. The value of count determines the number of reports generated at interval seconds apart. If the interval parameter is specified without the count parameter, the mpstat command generates reports continuously.

OPTIONS
-A This option is equivalent to specifying -I ALL -u -P ALL
-P { cpu [,…] | ON | ALL }
Indicate the processor number for which statistics are to be reported. cpu is the processor number. Note that processor 0 is the first processor. The ON keyword indicates that statistics are to be reported for every online processor, whereas the ALL keyword indicates that statistics are to be reported for all processors.
-u Report CPU utilization. The following values are displayed:

CPU Processor number. The keyword all indicates that statistics are calculated as averages among all processors.
%usr Show the percentage of CPU utilization that occurred while executing at the user level (application).
%nice Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.
%sys Show the percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this does not include time spent servicing hardware and software interrupts.
%iowait Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
%irq Show the percentage of time spent by the CPU or CPUs to service hardware interrupts.
%soft Show the percentage of time spent by the CPU or CPUs to service software interrupts.
%steal Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
%guest Show the percentage of time spent by the CPU or CPUs to run a virtual processor.
%idle Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

pidstat 1

[chenjingxiu@op ~]$ pidstat 1
Linux 2.6.32-279.19.1.el6.centos.plus.x86_64 (op.liebaopay.com)         12/11/2016      _x86_64_        (8 CPU)
07:01:34 AM       PID    %usr %system  %guest    %CPU   CPU  Command
07:01:35 AM      3294    0.00    0.84    0.00    0.84     3  python2.7
07:01:35 AM     14733    0.84    0.00    0.00    0.84     0  sshd
07:01:35 AM     15959    4.20   11.76    0.00   15.97     0  pidstat
07:01:35 AM     30156    0.00    0.84    0.00    0.84     1  tmux
07:01:35 AM       PID    %usr %system  %guest    %CPU   CPU  Command
07:01:36 AM        36    0.00    1.00    0.00    1.00     1  events/1
07:01:36 AM     14733    0.00    1.00    0.00    1.00     0  sshd
07:01:36 AM     15959    5.00   13.00    0.00   18.00     0  pidstat
07:01:36 AM     29962    1.00    0.00    0.00    1.00     0  python

NAME
pidstat - Report statistics for Linux tasks.

DESCRIPTION
The pidstat command is used for monitoring individual tasks currently being managed by the Linux kernel. It writes to standard output activities for every task selected with option -p or for every task managed by the Linux kernel if option -p ALL has been used. Not selecting any tasks is equivalent to specifying -p ALL but only active tasks (tasks with non-zero statistics values) will appear in the report.

The pidstat command can also be used for monitoring the child processes of selected tasks. Read about option -T below.

The interval parameter specifies the amount of time in seconds between each report. A value of 0 (or no parameters at all) indicates that tasks statistics are to be reported for the time since system startup (boot). The count parameter can be specified in conjunction with the interval parameter if this one is not set to zero. The value of count determines the number of reports generated at interval seconds apart. If the interval parameter is specified without the count parameter, the pidstat command generates reports continuously.

You can select information about specific task activities using flags. Not specifying any flags selects only CPU activity.

EXAMPLES

pidstat 2 5
Display five reports of CPU statistics for every active task in the system at two second intervals.

pidstat -r -p 1643 2 5
Display five reports of page faults and memory statistics for PID 1643 at two second intervals.

pidstat -T CHILD -r 2 5
Display five reports of page faults statistics at two second intervals for the child processes of all tasks in the system. Only child processes with non-zero statistics values are displayed.

iostat -xz 1
iostat is mainly used to monitor the I/O load of the system's devices.

chenjingxiu:~ kivmi$ iostat -d -K 1 10
disk0               disk2               disk3 
KB/t  tps  MB/s     KB/t  tps  MB/s     KB/t  tps  MB/s 
52.04    6  0.31    14.14    0  0.00    14.03    0  0.00 
4.00    9  0.04     0.00    0  0.00     0.00    0  0.00 
6.67    3  0.02     0.00    0  0.00     0.00    0  0.00 
4.00    1  0.00     0.00    0  0.00     0.00    0  0.00 
394.00    2  0.77     0.00    0  0.00     0.00    0  0.00 
[chenjingxiu@op ~]$ iostat -xz 1
Linux 2.6.32-279.19.1.el6.centos.plus.x86_64 (op.liebaopay.com)         12/11/2016      _x86_64_        (8 CPU)
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
1.08    0.00    1.92    0.13    0.01   96.86
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.09    0.03    7.72     0.89    62.49     8.18     0.05    6.19   1.23   0.95
xvdi              0.00     9.52    0.03    6.74     2.13   130.09    19.52     0.09   12.73   0.51   0.34
xvdb              0.00     0.00    0.01    0.12     0.12     9.44    73.96     0.04  304.96   2.07   0.03
xvdf              0.00     0.01    0.06    3.31     3.25   229.37    69.16     0.03    8.61   0.17   0.06
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.38    0.00    0.75    0.25    0.13   98.50
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00    1.00    0.00    16.00     0.00    16.00     0.02   17.00  17.00   1.70

NAME
iostat - Report Central Processing Unit (CPU) statistics and input/output statistics for devices, partitions and network filesystems (NFS).

DESCRIPTION
The iostat command is used for monitoring system input/output device loading by observing the time the devices are active in relation to their average transfer rates. The iostat command generates reports that can be used to change system configuration to better balance the input/output load between physical disks.

The first report generated by the iostat command provides statistics concerning the time since the system was booted. Each subsequent report covers the time since the previous report. All statistics are reported each time the iostat command is run. The report consists of a CPU header row followed by a row of CPU statistics. On multiprocessor systems, CPU statistics are calculated system-wide as averages among all processors. A device header row is displayed followed by a line of statistics for each device that is configured. When option -n is used, an NFS header row is displayed followed by a line of statistics for each network filesystem that is mounted.

The interval parameter specifies the amount of time in seconds between each report. The first report contains statistics for the time since system startup (boot). Each subsequent report contains statistics collected during the interval since the previous report. The count parameter can be specified in conjunction with the interval parameter. If the count parameter is specified, the value of count determines the number of reports generated at interval seconds apart. If the interval parameter is specified without the count parameter, the iostat command generates reports continuously.

REPORTS
The iostat command generates three types of reports, the CPU Utilization report, the Device Utilization report and the Network Filesystem report.

CPU Utilization Report
The first report generated by the iostat command is the CPU Utilization Report. For multiprocessor systems, the CPU values are global averages among all processors. The report has the following format:
%user
Show the percentage of CPU utilization that occurred while executing at the user level (application).

%nice
Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.

%system
Show the percentage of CPU utilization that occurred while executing at the system level (kernel).

%iowait
Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.

%steal
Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.

%idle
Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

Device Utilization Report
The second report generated by the iostat command is the Device Utilization Report. The device report provides statistics on a per physical device or partition basis. Block devices for which statistics are to be displayed may be entered on the command line. Partitions may also be entered on the command line providing that option -x is not used. If no device nor partition is entered, then statistics are displayed for every device used by the system, and providing that the kernel maintains statistics for it. If the ALL keyword is given on the command line, then statistics are displayed for every device defined by the system, including those that have never been used. The report may show the following fields, depending on the flags used.

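tps: transfers per second; one transfer is one I/O request (several logical requests can be merged into a single I/O request)
kB_read/s: amount of data read from the device per second
kB_wrtn/s: amount of data written to the device per second
kB_read: total amount of data read
kB_wrtn: total amount of data written
rrqm/s: read requests merged per second for this device
wrqm/s: write requests merged per second for this device
rsec/s: sectors read per second
wsec/s: sectors written per second
rkB/s: kilobytes read from the device per second
wkB/s: kilobytes written to the device per second
avgrq-sz: average size of the requests, in sectors
avgqu-sz: average length of the request queue
await: average time (in milliseconds) for an I/O request to be served, i.e. the I/O response time. await is normally greater than svctm; the smaller the gap, the less time requests spend queued, while a large gap means long queueing and points to a problem. await is generally below 5 ms
svctm: average service time (in milliseconds) per device I/O operation. If svctm is close to await, there is almost no I/O waiting and the disk performs well; if await is much higher than svctm, the I/O queue is too long and applications running on the system will slow down
%util: time spent handling I/O during the sampling period divided by the length of the period, indicating how busy the device is; a value near 100% means the device is running close to saturation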

Network Filesystem report
The Network Filesystem (NFS) report provides statistics for each mounted network filesystem. The report shows the following fields

OPTIONS
-c Display the CPU utilization report.
-d Display the device utilization report. (shows disk usage)
-K Force kilobytes as the unit for block devices (this is the macOS flag used in the first example above; sysstat on Linux uses -k)
-x Display extended statistics. This option works with post 2.5 kernels since it needs /proc/diskstats file or a mounted sysfs to get the statistics. This option may also work with older kernels (e.g. 2.4) only if extended statistics are available in /proc/partitions (the kernel needs to be patched for that). (shows extended I/O statistics)
-n Display the network filesystem (NFS) report. This option works only with kernel 2.6.17 and later.
-p [ { device [,…] | ALL } ]
The -p option displays statistics for block devices and all their partitions that are used by the system. If a device name is entered on the command line, then statistics for it and all its partitions are displayed. Last, the ALL keyword indicates that statistics have to be displayed for all the block devices and partitions defined by the system, including those that have never been used. Note that this option works only with post 2.5 kernels.

EXAMPLES

iostat
Display a single history since boot report for all CPU and Devices.
iostat -d 2
Display a continuous device report at two second intervals.
iostat -d 2 6
Display six reports at two second intervals for all devices.
iostat -x hda hdb 2 6
Display six reports of extended statistics at two second intervals for devices hda and hdb.
iostat -p sda 2 6
Display six reports at two second intervals for device sda and all its partitions (sda1, etc.)

Focus on await and %util: check whether await exceeds 5 ms and whether %util exceeds 80%.
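The two thresholds can be applied to iostat output with a small filter. This is a sketch under assumptions: `check_io` is a hypothetical name, and it relies on the report having columns literally named `await` and `%util`, as in the sample output above; sysstat versions that split await into separate read and write columns would need a different lookup.

```bash
#!/bin/bash
# Sketch: list devices whose await or %util exceeds the thresholds above.
# Column positions are looked up from the "Device" header row, so this
# assumes a report that has columns named "await" and "%util".
# check_io and the thresholds are illustrative.
check_io() {
    awk -v max_await=5 -v max_util=80 '
        /^Device/ {
            # remember where the two columns are in this report
            for (i = 1; i <= NF; i++) {
                if ($i == "await") a = i
                if ($i == "%util") u = i
            }
            next
        }
        # skip blank lines, the banner line and the avg-cpu block
        a && u && NF >= u && $a ~ /^[0-9.]+$/ {
            if ($a + 0 > max_await || $u + 0 > max_util)
                printf "%s await=%s util=%s\n", $1, $a, $u
        }'
}

# Typical use: iostat -xz 1 5 | check_io
```

As with vmstat, the first iostat report covers the time since boot, so the later reports are the ones that reflect the current state of the disks.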

free -m

[chenjingxiu@op ~]$ free -m 
total       used       free     shared    buffers     cached
Mem:         30100      24655       5445          0        398      20166
-/+ buffers/cache:       4089      26010
Swap:            0          0          0

NAME
free - Display amount of free and used memory in the system
DESCRIPTION
free displays the total amount of free and used physical and swap memory in the system, as well as the buffers used by the kernel. The shared memory column should be ignored; it is obsolete.
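The "-/+ buffers/cache" row can be reproduced from the Mem: row, which shows where its free figure comes from. This sketch assumes the older free -m layout shown above (total, used, free, shared, buffers, cached); newer procps versions print an "available" column instead, so the column positions are tied to that older format. `reclaimable_mb` is a hypothetical name.

```bash
#!/bin/bash
# Sketch: estimate the memory applications can still use from the older
# free -m layout (columns: total used free shared buffers cached).
# reclaimable_mb is a hypothetical name; the column positions are an
# assumption tied to that older layout.
reclaimable_mb() {
    # free + buffers + cached, all in MB
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}

echo "Mem:         30100      24655       5445          0        398      20166" | reclaimable_mb
# prints: 26009
```

The result differs by 1 MB from the 26010 in the sample output because free rounds each column separately. On a system with that layout, `free -m | reclaimable_mb` gives the live figure.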

sar -n DEV 1

[chenjingxiu@op ~]$ sar -n DEV 1
Linux 2.6.32-279.19.1.el6.centos.plus.x86_64 (op.liebaopay.com)         12/11/2016      _x86_64_        (8 CPU)
07:26:36 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
07:26:37 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:26:37 AM      eth0    150.51    152.53     29.42     21.03      0.00      0.00      0.00
07:26:37 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
07:26:38 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
07:26:38 AM      eth0    107.00    103.00     24.87     14.36      0.00      0.00      0.00

NAME
sar - Collect, report, or save system activity information.

DESCRIPTION
The sar command writes to standard output the contents of selected cumulative activity counters in the operating system. The accounting system, based on the values in the count and interval parameters, writes information the specified number of times spaced at the specified intervals in seconds. If the interval parameter is set to zero, the sar command displays the average statistics for the time since the system was started. If the interval parameter is specified without the count parameter, then reports are generated continuously. The collected data can also be saved in the file specified by the -o filename flag, in addition to being displayed onto the screen. If filename is omitted, sar uses the standard system activity daily data file, the /var/log/sa/sadd file, where the dd parameter indicates the current day. By default all the data available from the kernel are saved in the data file.

The sar command extracts and writes to standard output records previously saved in a file. This file can be either the one specified by the -f flag or, by default, the standard system activity daily data file.

Without the -P flag, the sar command reports system-wide (global among all processors) statistics, which are calculated as averages for values expressed as percentages, and as sums otherwise. If the -P flag is given, the sar command reports activity which relates to the specified processor or processors. If -P ALL is given, the sar command reports statistics for each individual processor and global statistics among all processors.

You can select information about specific system activities using flags. Not specifying any flags selects only CPU activity. Specifying the -A flag is equivalent to specifying -bBdqrRSvwWy -I SUM -I XALL -n ALL -u ALL -P ALL.

The default version of the sar command (CPU utilization report) might be one of the first facilities the user runs to begin system activity investigation,because it monitors major system resources. If CPU utilization is near 100 percent (user + nice + system), the workload sampled is CPU-bound.If multiple samples and multiple reports are desired, it is convenient to specify an output file for the sar command. Run the sar command as a background process. The syntax for this is:

sar -o datafile interval count >/dev/null 2>&1 &

All data is captured in binary form and saved to a file (datafile).  The data can then be selectively displayed with the sar command using the  -f  option.Set the interval and count parameters to select count records at interval second intervals. If the count parameter is not set, all the records saved in the file will be selected.  Collection of data in this manner is useful to characterize system usage over a period of time and determine peak usage hours.

The following fields are from the sar -d (block device activity) report:

tps
Indicate the number of transfers per second that were issued to the device. Multiple logical requests can be combined into a single I/O request to the device. A transfer is of indeterminate size.

rd_sec/s
Number of sectors read from the device. The size of a sector is 512 bytes.

wr_sec/s
Number of sectors written to the device. The size of a sector is 512 bytes.

avgrq-sz
The average size (in sectors) of the requests that were issued to the device.

avgqu-sz
The average queue length of the requests that were issued to the device.

await
The average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue
and the time spent servicing them.

svctm
The average service time (in milliseconds) for I/O requests that were issued to the device.

%util
Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs
when this value is close to 100%.

sar -n TCP,ETCP 1

[chenjingxiu@op ~]$  sar -n TCP,ETCP 1
Linux 2.6.32-279.19.1.el6.centos.plus.x86_64 (op.liebaopay.com)         12/11/2016      _x86_64_        (8 CPU)
07:38:52 AM  active/s passive/s    iseg/s    oseg/s
07:38:53 AM      0.00      0.00    122.22    114.14
07:38:52 AM  atmptf/s  estres/s retrans/s isegerr/s   orsts/s
07:38:53 AM      0.00      0.00      0.00      0.00      0.00

top

top - 07:40:00 up 47 days, 23:57, 122 users,  load average: 2.01, 2.03, 2.06
Tasks: 2185 total,   1 running, 2174 sleeping,   8 stopped,   2 zombie
Cpu(s):  0.9%us,  2.2%sy,  0.0%ni, 96.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  30822952k total, 25263368k used,  5559584k free,   408320k buffers
Swap:        0k total,        0k used,        0k free, 20652508k cached
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                
11028 chenjing  20   0 18732 2928  972 R  3.2  0.0   0:00.48 top                                                                                                                     
3294 ansible   20   0  635m  49m 2928 S  1.3  0.2 525:49.41 python2.7

NAME
top - display Linux tasks

a: PID – Process Id
The task’s unique process ID, which periodically wraps, though never restarting at zero.

b: PPID – Parent Process Pid
The process ID of a task’s parent.

c: RUSER – Real User Name
The real user name of the task’s owner.

d: UID – User Id
The effective user ID of the task’s owner.

e: USER – User Name
The effective user name of the task’s owner.

f: GROUP – Group Name
The effective group name of the task’s owner.

g: TTY – Controlling Tty
The name of the controlling terminal. This is usually the device (serial port, pty, etc.) from which the process was started, and which it uses for input or output. However, a task need not be associated with a terminal, in which case you’ll see ’?’ displayed.

h: PR – Priority
The priority of the task.

i: NI – Nice value
The nice value of the task. A negative nice value means higher priority, whereas a positive nice value means lower priority. Zero in this field simply means priority will not be adjusted in determining a task’s dispatchability.

j: P – Last used CPU (SMP)
A number representing the last used processor. In a true SMP environment this will likely change frequently since the kernel intentionally uses weak affinity. Also, the very act of running top may break this weak affinity and cause more processes to change CPUs more often (because of the extra demand for cpu time).

k: %CPU – CPU usage
The task’s share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time. In a true SMP environment, if ’Irixmode’ is Off, top will operate in ’Solaris mode’ where a task’s cpu usage will be divided by the total number of CPUs. You toggle ’Irix/Solaris’ modes with the ’I’ interactive command.

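us: percentage of CPU time spent in user space
sy: percentage of CPU time spent in kernel space
ni: percentage of CPU time spent on user-space processes whose priority has been changed (niced)
id: percentage of time the CPU is idle
wa: percentage of CPU time spent waiting for I/O
hi: percentage of CPU time spent servicing hardware interrupts
si: percentage of CPU time spent servicing software interrupts
st: percentage of CPU time stolen from this virtual machine by the hypervisor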

l: TIME – CPU Time (total CPU time used by the process, in seconds)
Total CPU time the task has used since it started. When ’Cumulative mode’ is On, each process is listed with the cpu time that it and its dead children has used. You toggle ’Cumulative mode’ with ’S’, which is a command-line option and an interactive command. See the ’S’ interactive command for additional information regarding this mode.

m: TIME+ – CPU Time, hundredths (total CPU time used by the process, in units of 1/100 second)
The same as ’TIME’, but reflecting more granularity through hundredths of a second.

n: %MEM – Memory usage (RES) (share of physical memory in use)
A task’s currently used share of available physical memory.

o: VIRT – Virtual Image (kb) (amount of virtual memory used)
The total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out. (Note: you can define the STATSIZE=1 environment variable and the VIRT will be calculated from the /proc/#/state VmSize field.)

p: SWAP – Swapped size (kb) (part of the virtual memory that has been swapped out)
Per-process swap values are now taken from /proc/#/status VmSwap field.

q: RES – Resident size (kb) (physical memory used by the process that has not been swapped out)
The non-swapped physical memory a task has used.
RES = CODE + DATA.

r: CODE – Code size (kb) (physical memory occupied by executable code)
The amount of physical memory devoted to executable code, also known as the ’text resident set’ size or TRS.

s: DATA – Data+Stack size (kb) (physical memory occupied by everything other than executable code: data segment + stack)
The amount of physical memory devoted to other than executable code, also known as the ’data resident set’ size or DRS.

t: SHR – Shared Mem size (kb) (shared memory size)
The amount of shared memory used by a task. It simply reflects memory that could be potentially shared with other processes.

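u: nFLT – Page Fault count (number of page faults)
The number of major page faults that have occurred for a task. A page fault occurs when a process attempts to read from or write to a virtual page that is not currently present in its address space. A major page fault is when disk access is involved in making that page available.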

v: nDRT – Dirty Pages count (number of pages modified since they were last written to disk)
The number of pages that have been modified since they were last written to disk. Dirty pages must be written to disk before the corresponding physical memory location can be used for some other virtual page.

w: S – Process Status (process state)
The status of the task which can be one of:
’D’ = uninterruptible sleep()
’R’ = running
’S’ = sleeping
’T’ = traced or stopped
’Z’ = zombie (zombie process)
Tasks shown as running should be more properly thought of as ’ready to run’ – their task_struct is simply represented on the Linux run-queue. Even without a true SMP machine, you may see numerous tasks in this state depending on top’s delay interval and nice value.

x: Command – Command line or Program name (command name)
Display the command line used to start a task or the name of the associated program. You toggle between command line and name with ’c’, which is both a command-line option and an interactive command. When you’ve chosen to display command lines, processes without a command line (like kernel threads) will be shown with only the program name in parentheses, as in this example:
( mdrecoveryd )
Either form of display is subject to potential truncation if it’s too long to fit in this field’s current width. That width depends upon other fields selected, their order and the current screen width.
Note: The ’Command’ field/column is unique, in that it is not fixed-width. When displayed, this column will be allocated all remaining screen width (up to the maximum 512 characters) to provide for the potential growth of program names into command lines.

y: WCHAN – Sleeping in Function (name of the kernel function in which a sleeping process is waiting)
Depending on the availability of the kernel link map (’System.map’), this field will show the name or the address of the kernel function in which the task is currently sleeping. Running tasks will display a dash (’-’) in this column.
Note: By displaying this field, top’s own working set will be increased by over 700Kb. Your only means of reducing that overhead will be to stop and restart top.

z: Flags – Task Flags (task flags)
This column represents the task’s current scheduling flags which are expressed in hexadecimal notation and with zeros suppressed. These flags are officially documented in <linux/sched.h>. Less formal documentation can also be found on the ’Fields select’ and ’Order fields’ screens.

All of the above is excerpted from the manual pages, for your reference and use!

On the design of an interface

Posted on 2016-12-10

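This is a summary of, and a reflection on, my recent work on message sharding. First, the scenario:
We are building a live-streaming product. A live stream produces a lot of messages, which have to be processed and stored, because replays use the stored files. We currently use a third-party chat room system, and its messages need to be synced into our own storage. On one hand this makes troubleshooting easier, because besides the third party's custom protocol the message content also carries reports of our own business data. On the other hand it removes the dependency on the third-party service and gives us control of our own data.
Before the redesign, the third party called back our interface when a live stream ended to sync the messages; the callback was written to an asynchronous queue and processed serially. Each live stream produced one standalone message file, which was put into S3 cloud storage. Because the message files were too large, some replays loaded too slowly while downloading them, and the client app sometimes crashed outright. Someone had changed this once before, but only so that overly large message files were not processed at all, that is, thrown away entirely, which left those replays without chat messages. That was too heavy-handed and did not solve the root problem.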

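I ran into several problems while working on this.
The previous approach:
Dump the messages from redis into message files on disk, then sync the files to S3 with a shell script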

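Plan 1
I did not change this existing approach, and did not even question it, so I came up with the following design:

1. Add a new interface that returns the list of message shard files; the server has to store the list of shard files for each video
2. Keep the existing approach of syncing message files with a shell script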

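What is wrong with this design?
1. It wastes storage. The message files are static and never change once generated, so there is no need to keep the shard file list in redis
2. Message file sync. The files are stored on local disk and synced by a shell script running as a single process. Practice showed this is not the best way: so many files are produced that the consumer script cannot keep up, and message files pile up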

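Plan 2
1. Stay compatible with old versions, and change the interface to provide an index file for the message files. The index file is the list of message shard files after they have been uploaded to S3
2. Drop the shell script and use the SDK provided by S3, processing fully synchronously: the message content in redis is not dumped to disk and no local message files are generated
3. When the message shard files are generated, generate the message index file at the same time and sync it to S3

Index file: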

[
{
  "timestamp": "1481188775520",
  "url": "http://test.s3.amazonaws.com/cheetahlive/15/33/14811887487207113059/14811887487207113059_0.json"
},
{
  "timestamp": "1481188775520",
  "url": "http://test.s3.amazonaws.com/cheetahlive/15/33/14811887487207113059/14811887487207113059_1.json"
}
...
]

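This solves the storage problem on the server, which no longer needs to store the shard file list: the client requests the interface, gets the message index file, and parses it to obtain the list of message shard files it needs. It also solves the backlog that built up when message files sat on disk and consumption could not keep up. If there are many files, the callback task script can run as multiple processes without worrying about several processes consuming the same message file, so multi-processing is fully supported. This design looks perfect, doesn't it?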

Is it really perfect?

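Plan 3
The following changes were made on top of Plan 2.
What problems does Plan 2 cause?
1. To download the message files, the client first has to request the index file interface, download and parse the index file, and get the list of message shard files. The client developers will say: your interface is too hard to use, change it
2. Traffic. For a mobile app, saving data traffic is one of the most important considerations
How can these be handled in a friendlier way?
1. Simplify the interface. It still returns an index list, but unlike Plan 1, the server requests the index file, parses it, and returns the result through the interface
2. Compress the generated message files with gz. This needs support on the S3 side (turning on gz compression for S3) and reduces the amount of data transferred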

With that, the design is fairly complete. Is there an even better way to handle this problem? I would be glad to hear your thoughts!

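Summary:
1. When upgrading an interface, consider compatibility between new and old client versions
2. Choose between synchronous and asynchronous processing according to the specific business scenario; ask whether asynchronous processing is really necessary, because not every scenario suits it
3. For file storage, consider disk usage and how directories are partitioned
4. In a consumer model, the capacity of the producer and the consumer must be matched, otherwise a backlog builds up
5. On how data is processed and stored: not all data needs to live in memory or in a database; cloud storage combined with a CDN can manage static files better
6. When evaluating a plan, test more of the possibilities and pick the best plan based on the tests
7. When thinking through a plan, consider the whole data flow from start to finish. It may be a loop in which only the way the data is fetched changes, so consider every step the data passes through as far as possible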

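People are great because of their dreams, and a product sets us free because of a better technical solution. Think more about the details and you will dig out what others cannot see. Everything changes, because change itself never stops!!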

A shell script

Posted on 2016-12-06

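At work I sometimes need shell to do some file processing, but before writing a shell script it is worth deciding whether that approach should be used at all. Below is a script of mine that never went live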

#!/bin/bash
###############################
# Message process management  #
#                             #
# Author:  chenjx             #
# Since:   2016-12-05         #
# version: 1.0                #
###############################
OLD_DIR='/data/logs/message/msg/'
NEW_DIR='/data/logs/message/newMsg/'
SHELL_CMD='/bin/bash'
CURRENT_DIR=`pwd`
SHELL_SCRIPT="${CURRENT_DIR}/pushmsgtos3one.sh"
LOG_FILE='/data/logs/message/process_log.txt'
PROCESS_NUM_LIST=`ps aux|grep 'pushmsgtos3one.sh'|awk -F ' ' '{if($13){print $13}}'`
DIRECTORY_SEPARATOR='/'
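# Check processes: start a worker for every subdirectory that does not have one yet
checkNewProcess(){
    RET=1
    if [ -d "$1" -a -x "${SHELL_SCRIPT}" ]
    then
        for CHILD_DIR in `ls $1`
        do
            # -q keeps grep from echoing the matched line to stdout
            if echo "${PROCESS_NUM_LIST}" | grep -qw "${CHILD_DIR}"
            then
                # a worker for this directory is already running
                RET=0
            else
                ${SHELL_CMD} ${SHELL_SCRIPT} $1${CHILD_DIR}${DIRECTORY_SEPARATOR}  &
                # note: $? after a background launch is always 0,
                # so this branch cannot detect a failing worker
                if [ $? -ne 0 ]
                then
                    echo  '> NEW:' ${SHELL_SCRIPT} ${CHILD_DIR} ' Error' >> ${LOG_FILE}
                fi
            fi
        done
    else
        echo "$1 Check  Error" >> ${LOG_FILE}
    fi
    return $RET
}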
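# Compatibility with the old version: one worker for the whole old directory
checkOldProcess(){
    RET=1
    if [ -d "$1" -a -x "${SHELL_SCRIPT}" ]
    then
        ${SHELL_CMD} ${SHELL_SCRIPT} $1  &
        # note: $? after a background launch is always 0
        if [ $? -ne 0 ]
        then
            echo  '> OLD:' ${SHELL_SCRIPT} $1 ' Error' >> ${LOG_FILE}
        fi
    fi
    return $RET
}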
checkOldProcess ${OLD_DIR}
checkNewProcess ${NEW_DIR}

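The plan was changed afterwards and this approach is no longer used; since the branch has to be cleaned up, I am recording the script here!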

Algorithms

Posted on 2016-12-03

Finding primes with a flag array (sieve)

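#include<stdio.h>
#include<stdlib.h>
/* Mark each index in [2, len): 1 means prime, 0 means composite. */
void  makeMatrix(int len,int data[]){
    int i,j;
    // initialize the array: treat every number from 2 up as prime
    for(i =2;i<len;i++){
        data[i] = 1;
    }
    for(i =2; i<len; i++){
        if(data[i]){
            // cross out the multiples of i, starting at i*i
            // (long long avoids int overflow in the product)
            for(j=i; (long long)i*j < len; j++){
                data[i*j] = 0;
            }
        }
    }
}
int main(int argc, char **argv){
    if(argc < 2){
        fprintf(stderr, "usage: %s N\n", argv[0]);
        return -1;
    }
    int N = atoi(argv[1]);
    if(N < 2){
        fprintf(stderr, "N must be at least 2\n");
        return -1;
    }
    int *data = malloc(N * sizeof(int));
    if(data == NULL){
        printf("Insufficient memory.\n");
        return -1;
    }
    makeMatrix(N,data);
    // print every number below N that is still marked as prime
    for(int i =2 ; i < N; i++){
        if(data[i]){
             printf("%4d ", i);
        }
    }
    printf("\n");
    free(data);
    return 0;
}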

This method finds the primes by marking each number with 0 or 1 in an array that is allocated dynamically

Binary search

Problem: given a sorted sequence, how do you find the element with the smallest absolute value

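#include<stdio.h>
#include<stdlib.h>
/* Return the index of an element with the smallest absolute value in the
   sorted range data[low..high], or -1 if the range is empty. */
int binSearch(int data[], int low, int high)
{
   if(low > high) {
       return -1;
   }
   /* no positive element: the last one is closest to zero */
   if(data[high] <= 0) {
       return high;
   }
   /* no negative element: the first one is closest to zero */
   if(data[low] >= 0) {
       return low;
   }
   /* invariant from here on: data[low] < 0 and data[high] > 0 */
   while(high - low > 1) {
        int mid = low + (high - low) / 2;
        if(data[mid] == 0){
           return mid;
        }
        if(data[mid] > 0){
           high = mid;
        }else{
           low = mid;
        }
   }
   /* low is the last negative element, high the first positive one */
   return abs(data[low]) <= abs(data[high]) ? low : high;
}
void dumpAbsMin(int data[], int low, int high)
{
    int mid = binSearch(data, low, high);
    if (mid < 0) {
       printf("an error happened\n");
       return;
    }else{
       printf("%d %d \n", mid, data[mid]);
    }
    int left = mid, right = mid;
    /* also print neighbours with the same absolute value, staying in range */
    while(right < high && abs(data[right]) == abs(data[right+1])){
        right++;
        printf("%d %d \n", right, data[right]);
    }
    while(left > low && abs(data[left]) == abs(data[left-1])){
        left--;
        printf("%d %d \n", left, data[left]);
    }
}
int main(int argc, char **argv){
  int data[] = {-9,-8,-3,-2,-1,-1,-1,-1,1,1,1,1,3,6,19};
  int n = sizeof(data) / sizeof(data[0]);
  dumpAbsMin(data, 0, n - 1);
  return 0;
}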

Output:

root@chenjingxiu:~/project/algorithm# ./d
7 -1 
8 1 
9 1 
10 1 
11 1 
6 -1 
5 -1 
4 -1

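This prints every element whose absolute value is 1. But what needs attention in this problem?

1. Integer overflow. If mid is computed as mid = (high + low) / 2, the sum high + low overflows once it exceeds the range of int;
2. The data sequence. Is the sequence all positive, all negative, or a mix of both? Each case has to be handled differently
3. Several elements with the same absolute value. How should they be handled when the sequence contains more than one
4. To find the element that sits between the negatives and the positives, decide by the signs of the elements on either side of it; this is the key point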

Macros in C

Posted on 2016-11-29

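A macro is defined with the #define directive and is a simple form of text substitution. Macros come in two kinds: with parameters and without.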

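1. Using # and ## in macros

#define ERROR_LOG(x) fprintf(stderr,"Error:"#x"\n");

Applying # to the parameter x makes the preprocessor turn the argument into a string literal. Calling ERROR_LOG("error log") therefore prints Error:"error log"

#define TYPE(type,name) type name_##type##_type

With ##, the preprocessor pastes tokens together. Calling TYPE(int,"user") expands to int name_int_type, which declares a variable of the given type. Note that the name argument is not substituted here, because name_ is a single token that is different from the parameter name

#define NUMBERS 1,2,3
int x[] = { NUMBERS };

This is one way to initialize an array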

2.do…while

#define PRINT(x)  \
    do{         \
       printf("print x\n"); \
    }while(0)

Wrapping the body in do…while(0) makes the macro behave as one self-contained statement

3. Function-like macros

#define lang_init() c_init()

Calling lang_init() ends up calling the c_init() function

4. Macro parameters

#define min(X, Y) ((X) < (Y) ? (X) : (Y))

The call x = min(a, b); is actually executed as x = ((a) < (b) ? (a) : (b));

5. Predefined macros
__COUNTER__, __GFORTRAN__, __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__, __GNUG__, __STRICT_ANSI__, __VERSION__, __NO_INLINE__, __GNUC_GNU_INLINE__, __OPTIMIZE__
and so on

6.#ifdef …. #else …. #if … #endif

7. Common macros in PHP

# define CG(v) TSRMG(compiler_globals_id, zend_compiler_globals *, v)
# define EG(v) TSRMG(executor_globals_id, zend_executor_globals *, v)
# define PG(v) TSRMG(core_globals_id, php_core_globals *, v)
# define SG(v) TSRMG(sapi_globals_id, sapi_globals_struct *, v)
# define TSRM_UNSHUFFLE_RSRC_ID(rsrc_id) ((rsrc_id)-1)
# define TSRMG(id, type, element) (((type) (*((void ***) tsrm_ls))[TSRM_UNSHUFFLE_RSRC_ID(id)])->element)

C pointers

Posted on 2016-11-25

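Talking about pointers inevitably brings up arrays, because the two are similar in some respects. Consider the following code:

extern int *x;
extern int y[];

When is an array equivalent to a pointer, and when is it not?

1. How arrays and pointers are accessed

char a[9] = "abcdefgh";
char *p = "abcdefgh";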

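When compiling, the compiler builds a symbol table that contains both a and p. Each entry maps a symbol to an address, as follows:
a-->9980
p-->4624

a-->9980-->+1-->+2-->+3-->…-->+i
The compiler's symbol table has a at address 9980
Get the value of i and add it to 9980
Fetch the contents of address (9980+i)

p-->4624-->5081-->data
The compiler's symbol table has p at address 4624
Fetch the contents of address 4624, namely 5081
Fetch the contents of address 5081
In other words, for a the address in the symbol table is the address of the array itself. For p, the address in the symbol table is that of a pointer object (4 bytes on a 32-bit platform) whose content is an address, and the data stored at that address is the data the pointer refers to.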

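2. Accessing an array through a pointer

char *p="abcdefgh";
c = p[i];

p-->4624-->5081-->+1-->+2-->+3-->…-->+i (address 5081+i)
The compiler's symbol table has p at address 4624; fetch the pointer stored there, 5081
Take the subscript offset and add it to the pointer's value, which produces an address
Access that address and fetch the character
This also reaches the contents of the array (the character d, for example); only the access path differs slightly: first an indirection through the pointer, then a direct access at the subscript offset.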

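3. The data a pointer refers to does not have to sit at contiguous addresses, whereas the elements of an array are always contiguous. Pointers are commonly used with malloc() and free() to manage memory. Among literals, only a string literal can initialize a pointer directly; other kinds of literal cannot. A string literal used this way is read-only and must not be modified through the pointer.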

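4. Array and pointer declarations are equivalent only when they appear as function parameters

printf("array at location %p holds string %s", (void *)a, a);

Here the array is used both as a pointer and as a character array. This works because printf is a function, and an array passed as an argument is equivalent to a pointer
Another example is the main function

int main(int argc, char **argv);
int main(int argc, char *argv[]);

These are equivalent.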

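5. Applications
a. Two-dimensional arrays

pea[i][j]
*(*(pea+i)+j)

are equivalent. The individual elements of the array are stored and referenced in memory in a linear layout: the first dimension selects a group, and the second dimension is the element's offset within that group.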
b. A function returning a pointer (here, a pointer to an array)

int (*paf())[20]{
    int (*pear)[20];
    pear = calloc(20, sizeof(int));
    if(!pear) longjmp(error,1);
    return pear;
}
int (*result)[20];
result = paf();

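c. Iliffe vectors: instead of a true two-dimensional array, create a one-dimensional array whose elements are pointers to something else

my_function(char **my_array);

my_array behaves like a two-dimensional array: it is an array of pointers, each of which points to a vector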