<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<atom:link href="http://gentoo-zh.org/extern.php?action=feed&amp;tid=464&amp;type=rss" rel="self" type="application/rss+xml" />
		<title><![CDATA[Gentoo Chinese Community / Linux Source Code Walkthrough (21): Introduction to Network Communication, the TCP Three-Way Handshake]]></title>
		<link>http://www.gentoo-zh.org/viewtopic.php?id=464</link>
		<description><![CDATA[Most recent posts in Linux Source Code Walkthrough (21): Introduction to Network Communication, the TCP Three-Way Handshake.]]></description>
		<lastBuildDate>Sun, 08 Jan 2023 02:33:07 +0000</lastBuildDate>
		<generator>FluxBB</generator>
		<item>
			<title><![CDATA[Re: Linux Source Code Walkthrough (21): Introduction to Network Communication, the TCP Three-Way Handshake]]></title>
			<link>http://www.gentoo-zh.org/viewtopic.php?pid=703#p703</link>
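			<description><![CDATA[<p>Nicely done. The code is all included on the page, and having the explanation, the figures and the matching code side by side is the best way to help people understand.</p>]]></description>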
			<author><![CDATA[dummy@example.com (Urit)]]></author>
			<pubDate>Sun, 08 Jan 2023 02:33:07 +0000</pubDate>
			<guid>http://www.gentoo-zh.org/viewtopic.php?pid=703#p703</guid>
		</item>
		<item>
			<title><![CDATA[Linux Source Code Walkthrough (21): Introduction to Network Communication, the TCP Three-Way Handshake]]></title>
			<link>http://www.gentoo-zh.org/viewtopic.php?pid=471#p471</link>
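			<description><![CDATA[<p>Earlier posts covered sk_buff, which manages and stores network packets, and the socket and sock structures, which describe the communication protocol. Now we finally get to how to talk to a remote computer! Common sense says a connection has to exist before any communication can happen: for a wired keyboard to send signals to a computer, the keyboard must first be plugged into a USB port, otherwise how would the computer receive its electrical signals? Likewise, to use a mouse I must first plug it into a USB port; only then does moving the mouse send signals to the computer. Both cases need a physical connection first! So how do two hosts that are thousands of miles apart establish a connection? The physical connection is of course provided by switches, routers, cables and fiber, but how is the logical connection established? This post uses TCP as the example.</p><p>&#160; &#160;1. TCP has been used in the industry for many years and is very mature, so I will not repeat the theory behind the three-way handshake. The handshake flow is shown in the figure below:<br /><span class="postimg"><img src="https://img2022.cnblogs.com/blog/2052730/202202/2052730-20220203210727289-155435471.png" alt="FluxBB bbcode test" /></span></p><p>&#160; &#160; &#160; &#160; As the figure shows, once the client calls connect and the server calls listen, the kernel completes the three-way handshake; application developers do not need to care how it is done. In the first step the client tells the server "I want to talk to you" and supplies a number M, its initial sequence number. In the second step the server replies "I agree"; the reply carries M+1 as the acknowledgment number, showing that it answers the SYN the client just sent, and it also carries the server's own number N. In the third step the client sends a message carrying N+1 as the acknowledgment number (its own sequence number is now M+1), showing that it answers the server's SYN N. After these three messages the connection is established. Next, let's take the steps apart one by one and see what each actually does.</p><p>&#160; &#160; &#160; (1) The client sends the SYN M packet to the server. Capturing with wireshark shows the contents of the SYN packet:<br /><span class="postimg"><img src="https://img2022.cnblogs.com/blog/2052730/202202/2052730-20220203211156829-664666340.png" alt="FluxBB bbcode test" /></span></p><p>&#160; &#160; &#160; &#160; &#160; &#160;The Linux kernel function that builds and sends the SYN packet is tcp_v4_connect. The code follows; it is not long, and the important parts are annotated with comments.</p><div class="codebox"><pre class="vscroll"><code>/* This will initiate an outgoing connection. */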
int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
{
    struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
    struct inet_sock *inet = inet_sk(sk);
    struct tcp_sock *tp = tcp_sk(sk);
    __be16 orig_sport, orig_dport;
    __be32 daddr, nexthop;
    struct flowi4 *fl4;
    struct rtable *rt;
    int err;
    struct ip_options_rcu *inet_opt;

    if (addr_len &lt; sizeof(struct sockaddr_in))
        return -EINVAL;
    /* AF_INET is the value the user passed to socket() when creating the socket;
       the checks here validate the address length and the address family. */
    if (usin-&gt;sin_family != AF_INET)
        return -EAFNOSUPPORT;
    /* Tentatively set both the next hop and the destination to the address supplied by the user. */
    nexthop = daddr = usin-&gt;sin_addr.s_addr;
    inet_opt = rcu_dereference_protected(inet-&gt;inet_opt,
                         lockdep_sock_is_held(sk));
    /* If source routing is in use, pick a suitable next-hop address. */
    if (inet_opt &amp;&amp; inet_opt-&gt;opt.srr) {
        if (!daddr)
            return -EINVAL;
        nexthop = inet_opt-&gt;opt.faddr;
    }
    /* Gather the IP-layer routing information for the packet. */
    orig_sport = inet-&gt;inet_sport;
    orig_dport = usin-&gt;sin_port;
    fl4 = &amp;inet-&gt;cork.fl.u.ip4;
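    /* A host may have several IP addresses; which one should be used to send the packet?
       The routing table is looked up with nexthop (the server IP passed down from connect),
       and the matching route entry contains the local IP address. */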
    rt = ip_route_connect(fl4, nexthop, inet-&gt;inet_saddr,
                  RT_CONN_FLAGS(sk), sk-&gt;sk_bound_dev_if,
                  IPPROTO_TCP,
                  orig_sport, orig_dport, sk);
    if (IS_ERR(rt)) {
        err = PTR_ERR(rt);
        if (err == -ENETUNREACH)
            IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
        return err;
    }
    /* Check the type of the route returned by the lookup:
        TCP is not allowed to use multicast or broadcast. */
    if (rt-&gt;rt_flags &amp; (RTCF_MULTICAST | RTCF_BROADCAST)) {
        ip_rt_put(rt);
        return -ENETUNREACH;
    }
    
    if (!inet_opt || !inet_opt-&gt;opt.srr)
        daddr = fl4-&gt;daddr;
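    /* If the user has not bound a source address, inet-&gt;inet_saddr is 0.
       When saddr is 0, the fl4-&gt;saddr returned by the route lookup is assigned to inet-&gt;inet_saddr.
       inet-&gt;inet_saddr later becomes the source IP of the SYN segment.
       
       1. If the source IP were changed here and filled with random values before sending to the server,
       would the server keep sending SYN-ACK packets, producing a DDoS effect?
       2. If the source IP here were set to the victim server's address and SYNs were sent to random third-party servers,
       would those servers, on receiving the SYN, each send SYN-ACK packets to the victim, producing a reflection DDoS effect?
    */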
    if (!inet-&gt;inet_saddr)
        inet-&gt;inet_saddr = fl4-&gt;saddr;
    sk_rcv_saddr_set(sk, inet-&gt;inet_saddr);

    if (tp-&gt;rx_opt.ts_recent_stamp &amp;&amp; inet-&gt;inet_daddr != daddr) {
        /* Reset inherited state */
        tp-&gt;rx_opt.ts_recent       = 0;
        tp-&gt;rx_opt.ts_recent_stamp = 0;
        if (likely(!tp-&gt;repair))
            tp-&gt;write_seq       = 0;
    }

    if (tcp_death_row.sysctl_tw_recycle &amp;&amp;
        !tp-&gt;rx_opt.ts_recent_stamp &amp;&amp; fl4-&gt;daddr == daddr)
        tcp_fetch_timewait_stamp(sk, &amp;rt-&gt;dst);
    
    inet-&gt;inet_dport = usin-&gt;sin_port;
    sk_daddr_set(sk, daddr);

    inet_csk(sk)-&gt;icsk_ext_hdr_len = 0;
    if (inet_opt)
        inet_csk(sk)-&gt;icsk_ext_hdr_len = inet_opt-&gt;opt.optlen;

    tp-&gt;rx_opt.mss_clamp = TCP_MSS_DEFAULT;

    /* Socket identity is still unknown (sport may be zero).
     * However we set state to SYN-SENT and not releasing socket
     * lock select source port, enter ourselves into the hash tables and
     * complete initialization after this.
       Set the TCP state to TCP_SYN_SENT and dynamically bind a local port.
     */
    tcp_set_state(sk, TCP_SYN_SENT);
    err = inet_hash_connect(&amp;tcp_death_row, sk);
    if (err)
        goto failure;

    sk_set_txhash(sk);
    /* Ultimately calls fib_table_lookup to find the longest-prefix match in the trie. */
    rt = ip_route_newports(fl4, rt, orig_sport, orig_dport,
                   inet-&gt;inet_sport, inet-&gt;inet_dport, sk);
    if (IS_ERR(rt)) {
        err = PTR_ERR(rt);
        rt = NULL;
        goto failure;
    }
    /* OK, now commit destination to socket.  */
    sk-&gt;sk_gso_type = SKB_GSO_TCPV4;
    sk_setup_caps(sk, &amp;rt-&gt;dst);

    if (!tp-&gt;write_seq &amp;&amp; likely(!tp-&gt;repair))
        tp-&gt;write_seq = secure_tcp_sequence_number(inet-&gt;inet_saddr,
                               inet-&gt;inet_daddr,
                               inet-&gt;inet_sport,
                               usin-&gt;sin_port);

    inet-&gt;inet_id = tp-&gt;write_seq ^ jiffies;
    /* Build the SYN segment and send it. */
    err = tcp_connect(sk);

    rt = NULL;
    if (err)
        goto failure;

    return 0;

failure:
    /*
     * This unhashes the socket and releases the local port,
     * if necessary.
     */
    tcp_set_state(sk, TCP_CLOSE);
    ip_rt_put(rt);
    sk-&gt;sk_route_caps = 0;
    inet-&gt;inet_dport = 0;
    return err;
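}</code></pre></div><p>&#160; Note: these two lines set the source IP address; changing them could be used to mount a reflection DDoS attack!<br />if (!inet-&gt;inet_saddr)<br />&#160; &#160; &#160; &#160; inet-&gt;inet_saddr = fl4-&gt;saddr;</p><p>&#160; Most of this function is precondition checks and error handling. The functions that actually build and send the SYN packet are tcp_send_syn_data and tcp_transmit_skb, both reached from tcp_connect. Since the former calls the latter, the analysis here focuses on tcp_transmit_skb:</p><div class="codebox"><pre class="vscroll"><code>/* This routine actually transmits TCP packets queued in by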
 * tcp_do_sendmsg().  This is used by both the initial
 * transmission and possible later retransmissions.
 * All SKB&#039;s seen here are completely headerless.  It is our
 * job to build the TCP header, and pass the packet down to
 * IP so it can do the same plus pass the packet off to the
 * device.
 *
 * We are working here with either a clone of the original
 * SKB, or a fresh unique copy made by the retransmit engine.
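 Clone or copy the skb, build the TCP header in it, and call the network layer's transmit function to send it.
 The skb must be cloned or copied before sending because it is freed once it has been handed to the network device,
 whereas the TCP layer cannot really free it until the segment has been ACKed. Then the TCP header and options are built.
 Finally the transmit callback provided by the network layer is called; for IPv4 that callback is ip_queue_xmit.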
 */
static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
                gfp_t gfp_mask)
{
    const struct inet_connection_sock *icsk = inet_csk(sk);
    struct inet_sock *inet;
    struct tcp_sock *tp;
    struct tcp_skb_cb *tcb;
    struct tcp_out_options opts;
    unsigned int tcp_options_size, tcp_header_size;
    struct tcp_md5sig_key *md5;
    struct tcphdr *th;
    int err;

    BUG_ON(!skb || !tcp_skb_pcount(skb));
    tp = tcp_sk(sk);
    // clone_it is set when the original skb must be kept (for example for retransmission), so a clone or copy is sent instead
    if (clone_it) {
        skb_mstamp_get(&amp;skb-&gt;skb_mstamp);
        TCP_SKB_CB(skb)-&gt;tx.in_flight = TCP_SKB_CB(skb)-&gt;end_seq
            - tp-&gt;snd_una;
        tcp_rate_skb_sent(sk, skb);

        if (unlikely(skb_cloned(skb)))
            skb = pskb_copy(skb, gfp_mask);
        else
            skb = skb_clone(skb, gfp_mask);
        if (unlikely(!skb))
            return -ENOBUFS;
    }

    inet = inet_sk(sk);
    tcb = TCP_SKB_CB(skb);
    memset(&amp;opts, 0, sizeof(opts));
    // Is this a SYN segment?
    if (unlikely(tcb-&gt;tcp_flags &amp; TCPHDR_SYN))
        // Build the SYN options, including timestamps, window scale and SACK-permitted
        tcp_options_size = tcp_syn_options(sk, skb, &amp;opts, &amp;md5);
    else // Build the options for an established connection
        tcp_options_size = tcp_established_options(sk, skb, &amp;opts,
                               &amp;md5);
    // TCP header length = options length + fixed TCP header
    tcp_header_size = tcp_options_size + sizeof(struct tcphdr);

    /* if no packet is in qdisc/device queue, then allow XPS to select
     * another queue. We can be called from tcp_tsq_handler()
     * which holds one reference to sk_wmem_alloc.
     *
     * TODO: Ideally, in-flight pure ACK packets should not matter here.
     * One way to get this would be to set skb-&gt;truesize = 2 on them.
     */
    skb-&gt;ooo_okay = sk_wmem_alloc_get(sk) &lt; SKB_TRUESIZE(1);

    skb_push(skb, tcp_header_size);
    skb_reset_transport_header(skb);

    skb_orphan(skb);
    skb-&gt;sk = sk;
    skb-&gt;destructor = skb_is_tcp_pure_ack(skb) ? __sock_wfree : tcp_wfree;
    skb_set_hash_from_sk(skb, sk);
    atomic_add(skb-&gt;truesize, &amp;sk-&gt;sk_wmem_alloc);

    /* Build TCP header and checksum it. 
    After all that preparation, the TCP header is finally built here.
    */
    th = (struct tcphdr *)skb-&gt;data;
    th-&gt;source        = inet-&gt;inet_sport;
    th-&gt;dest        = inet-&gt;inet_dport;
    th-&gt;seq            = htonl(tcb-&gt;seq);
    th-&gt;ack_seq        = htonl(tp-&gt;rcv_nxt);
    *(((__be16 *)th) + 6)    = htons(((tcp_header_size &gt;&gt; 2) &lt;&lt; 12) |
                    tcb-&gt;tcp_flags);
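    /* Worked example (assumed values, for illustration only): with 20 bytes
     * of options, tcp_header_size = 40, so 40 &gt;&gt; 2 = 10 (0xA) 32-bit words.
     * 0xA &lt;&lt; 12 = 0xA000; OR-ing in TCPHDR_SYN (0x02) gives 0xA002,
     * i.e. data offset 10 with only the SYN flag set.
     */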

    th-&gt;check        = 0;
    th-&gt;urg_ptr        = 0;

    /* The urg_mode check is necessary during a below snd_una win probe */
    // Note: a SYN segment does not need a computed window (see the SYN branch of the window handling below)
    if (unlikely(tcp_urg_mode(tp) &amp;&amp; before(tcb-&gt;seq, tp-&gt;snd_up))) {
        if (before(tp-&gt;snd_up, tcb-&gt;seq + 0x10000)) {
            th-&gt;urg_ptr = htons(tp-&gt;snd_up - tcb-&gt;seq);
            th-&gt;urg = 1;
        } else if (after(tcb-&gt;seq + 0xFFFF, tp-&gt;snd_nxt)) {
            th-&gt;urg_ptr = htons(0xFFFF);
            th-&gt;urg = 1;
        }
    }

    tcp_options_write((__be32 *)(th + 1), tp, &amp;opts);
    skb_shinfo(skb)-&gt;gso_type = sk-&gt;sk_gso_type;
    if (likely(!(tcb-&gt;tcp_flags &amp; TCPHDR_SYN))) {
        th-&gt;window      = htons(tcp_select_window(sk));
        tcp_ecn_send(sk, skb, th, tcp_header_size);
    } else {
        /* RFC1323: The window in SYN &amp; SYN/ACK segments
         * is never scaled.
         */
        th-&gt;window    = htons(min(tp-&gt;rcv_wnd, 65535U));
    }
#ifdef CONFIG_TCP_MD5SIG
    /* Calculate the MD5 hash, as we have all we need now */
    if (md5) {
        sk_nocaps_add(sk, NETIF_F_GSO_MASK);
        tp-&gt;af_specific-&gt;calc_md5_hash(opts.hash_location,
                           md5, sk, skb);
    }
#endif

    icsk-&gt;icsk_af_ops-&gt;send_check(sk, skb);

    if (likely(tcb-&gt;tcp_flags &amp; TCPHDR_ACK))
        tcp_event_ack_sent(sk, tcp_skb_pcount(skb)); // clear the delayed-ACK timer
    /* The segment carries data */
    if (skb-&gt;len != tcp_header_size) {
        tcp_event_data_sent(tp, sk);
        tp-&gt;data_segs_out += tcp_skb_pcount(skb);
    }
    /* Count outgoing segments */
    if (after(tcb-&gt;end_seq, tp-&gt;snd_nxt) || tcb-&gt;seq == tcb-&gt;end_seq)
        TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS,
                  tcp_skb_pcount(skb));

    tp-&gt;segs_out += tcp_skb_pcount(skb);
    /* OK, its time to fill skb_shinfo(skb)-&gt;gso_{segs|size} */
    /* Record the segment count and size in the skb */
    skb_shinfo(skb)-&gt;gso_segs = tcp_skb_pcount(skb);
    skb_shinfo(skb)-&gt;gso_size = tcp_skb_mss(skb);

    /* Our usage of tstamp should remain private */
    skb-&gt;tstamp.tv64 = 0;

    /* Cleanup our debris for IP stacks */
    /* Clear the skb control block (cb); the IP layer will use it */
    memset(skb-&gt;cb, 0, max(sizeof(struct inet_skb_parm),
                   sizeof(struct inet6_skb_parm)));
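    // Hand the packet to the IP layer, which adds the IP header; the function pointer actually points to ip_queue_xmit, the IP-layer function that gets called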
    err = icsk-&gt;icsk_af_ops-&gt;queue_xmit(sk, skb, &amp;inet-&gt;cork.fl);

    if (likely(err &lt;= 0))
        return err;
    /* Congestion control: enter the CWR state */
    tcp_enter_cwr(sk);

    return net_xmit_eval(err);
}</code></pre></div><p>&#160; As you can see, the skb is ultimately passed to ip_queue_xmit in the IP layer, which goes on to build the IP packet. The code is as follows:</p><div class="codebox"><pre class="vscroll"><code>/* Note: skb-&gt;sk can be different from sk, in case of tunnels */
int ip_queue_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl)
{
    struct inet_sock *inet = inet_sk(sk);
    struct net *net = sock_net(sk);
    struct ip_options_rcu *inet_opt;
    struct flowi4 *fl4;
    struct rtable *rt;
    struct iphdr *iph;
    int res;

    /* Skip all of this if the packet is already routed,
     * f.e. by something like SCTP.
     */
    rcu_read_lock();
    inet_opt = rcu_dereference(inet-&gt;inet_opt);
    fl4 = &amp;fl-&gt;u.ip4;
    /* Get the route cached in the skb */
    rt = skb_rtable(skb);
    if (rt)
        goto packet_routed;

    /* Make sure we can route this packet. */
    /* Check the route cached in the socket */
    rt = (struct rtable *)__sk_dst_check(sk, 0);
    /* No valid cached route */
    if (!rt) {
        __be32 daddr;

        /* Use correct destination address if we have options. */
        // Here, finally, is the destination IP address
        daddr = inet-&gt;inet_daddr;
        if (inet_opt &amp;&amp; inet_opt-&gt;opt.srr)
            daddr = inet_opt-&gt;opt.faddr;

        /* If this fails, retransmit mechanism of transport layer will
         * keep trying until route appears or the connection times
         * itself out. Look the route up again.
         */
        rt = ip_route_output_ports(net, fl4, sk,
                       daddr, inet-&gt;inet_saddr,
                       inet-&gt;inet_dport,
                       inet-&gt;inet_sport,
                       sk-&gt;sk_protocol,
                       RT_CONN_FLAGS(sk),
                       sk-&gt;sk_bound_dev_if);
        if (IS_ERR(rt))
            goto no_route;
        /* Cache the route in the socket */
        sk_setup_caps(sk, &amp;rt-&gt;dst);
    }
     /* Attach the route to the skb */
    skb_dst_set_noref(skb, &amp;rt-&gt;dst);

packet_routed:
    if (inet_opt &amp;&amp; inet_opt-&gt;opt.is_strictroute &amp;&amp; rt-&gt;rt_uses_gateway)
        goto no_route;

    /* OK, we know where to send it, allocate and build IP header. */
    /* With the destination known, start building the IP packet on top of the TCP segment */
    /* Prepend the IP header to the skb */
    skb_push(skb, sizeof(struct iphdr) + (inet_opt ? inet_opt-&gt;opt.optlen : 0));
    skb_reset_network_header(skb);
    iph = ip_hdr(skb);
    *((__be16 *)iph) = htons((4 &lt;&lt; 12) | (5 &lt;&lt; 8) | (inet-&gt;tos &amp; 0xff));
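    /* Worked example (assuming inet-&gt;tos == 0 and no IP options, for illustration):
     * (4 &lt;&lt; 12) | (5 &lt;&lt; 8) | 0 = 0x4000 | 0x0500 = 0x4500,
     * i.e. version 4, header length 5 words (20 bytes), TOS 0.
     */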
    if (ip_dont_fragment(sk, &amp;rt-&gt;dst) &amp;&amp; !skb-&gt;ignore_df)
        iph-&gt;frag_off = htons(IP_DF);
    else
        iph-&gt;frag_off = 0;
    iph-&gt;ttl      = ip_select_ttl(inet, &amp;rt-&gt;dst);
    iph-&gt;protocol = sk-&gt;sk_protocol;
    ip_copy_addrs(iph, fl4);

    /* Transport layer set skb-&gt;h.foo itself. */
    /* Build the IP options */
    if (inet_opt &amp;&amp; inet_opt-&gt;opt.optlen) {
        iph-&gt;ihl += inet_opt-&gt;opt.optlen &gt;&gt; 2;
        ip_options_build(skb, &amp;inet_opt-&gt;opt, inet-&gt;inet_daddr, rt, 0);
    }
    /* Set the IP ID */
    ip_select_ident_segs(net, skb, sk,
                 skb_shinfo(skb)-&gt;gso_segs ?: 1);

    /* TODO : should we use skb-&gt;sk here instead of sk ? */
    skb-&gt;priority = sk-&gt;sk_priority;
    skb-&gt;mark = sk-&gt;sk_mark;
    // Send the IP packet
    res = ip_local_out(net, sk, skb);
    rcu_read_unlock();
    return res;

no_route: // No route available
    rcu_read_unlock();
    IP_INC_STATS(net, IPSTATS_MIB_OUTNOROUTES);
    kfree_skb(skb);
    return -EHOSTUNREACH;
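}</code></pre></div><p>&#160; The IP layer calls ip_local_out, and further down the call chain the following function is eventually called to send the data out to the network through the NIC:</p><div class="codebox"><pre><code>/* Output packet to network from transport.  */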
static inline int dst_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
    return skb_dst(skb)-&gt;output(net, sk, skb);
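}</code></pre></div><p>&#160; &#160; &#160; Looking at the whole process, the core is building the packet layer by layer by prepending headers!</p><p>&#160; &#160; &#160;(2) The server's SYN-ACK packet:<br /><span class="postimg"><img src="https://img2022.cnblogs.com/blog/2052730/202202/2052730-20220203211800209-1069713841.png" alt="FluxBB bbcode test" /></span> </p><p>&#160; &#160; After receiving the SYN, the server must reply with a SYN-ACK, whose contents are shown in the figure above. The server builds the packet in essentially the same way as the client: headers are added layer by layer (growing the data and length of the skb buffer). The core function is:</p><div class="codebox"><pre class="vscroll"><code>/*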
 *    Send a SYN-ACK after having received a SYN.
 *    This still operates on a request_sock only, not on a big
 *    socket.
 */
static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
                  struct flowi *fl,
                  struct request_sock *req,
                  struct tcp_fastopen_cookie *foc,
                  enum tcp_synack_type synack_type)
{
    const struct inet_request_sock *ireq = inet_rsk(req);
    struct flowi4 fl4;
    int err = -1;
    struct sk_buff *skb;

    /* First, grab a route. */
    if (!dst &amp;&amp; (dst = inet_csk_route_req(sk, &amp;fl4, req)) == NULL)
        return -1;
    /* Build the SYN-ACK packet and return the skb */
    skb = tcp_make_synack(sk, dst, req, foc, synack_type);

    if (skb) {
        __tcp_v4_send_check(skb, ireq-&gt;ir_loc_addr, ireq-&gt;ir_rmt_addr);
        /* Add the IP header and send */
        err = ip_build_and_send_pkt(skb, sk, ireq-&gt;ir_loc_addr,
                        ireq-&gt;ir_rmt_addr,
                        ireq-&gt;opt);
        err = net_xmit_eval(err);
    }

    return err;
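}</code></pre></div><p>&#160; The functions that build and send the packet are tcp_make_synack and ip_build_and_send_pkt respectively. The first allocates an skb and fills in the TCP header, and this is one of the points a DDoS attack targets:</p><div class="codebox"><pre class="vscroll"><code>/**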
 * tcp_make_synack - Prepare a SYN-ACK.
 * sk: listener socket
 * dst: dst entry attached to the SYNACK
 * req: request_sock pointer
 *
 * Allocate one skb and build a SYNACK packet.
 * @dst is consumed : Caller should not use it again.
   Build the SYN-ACK packet
 */
struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
                struct request_sock *req,
                struct tcp_fastopen_cookie *foc,
                enum tcp_synack_type synack_type)
{
    struct inet_request_sock *ireq = inet_rsk(req);
    const struct tcp_sock *tp = tcp_sk(sk);
    struct tcp_md5sig_key *md5 = NULL;
    struct tcp_out_options opts;
    struct sk_buff *skb;
    int tcp_header_size;
    struct tcphdr *th;
    u16 user_mss;
    int mss;
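    /* Allocate the skb. This is one reason DDoS attacks work:
    if a client keeps changing its source IP while sending SYNs, the server believes
    many clients want to talk to it and keeps allocating skbs in preparation
    for the upcoming communication, which may exhaust memory.
    */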
    skb = alloc_skb(MAX_TCP_HEADER, GFP_ATOMIC);
    if (unlikely(!skb)) {
        dst_release(dst);
        return NULL;
    }
    /* Reserve space for headers. */
    skb_reserve(skb, MAX_TCP_HEADER);

    switch (synack_type) {
    case TCP_SYNACK_NORMAL:
        skb_set_owner_w(skb, req_to_sk(req));
        break;
    case TCP_SYNACK_COOKIE:
        /* Under synflood, we do not attach skb to a socket,
         * to avoid false sharing.
         */
        break;
    case TCP_SYNACK_FASTOPEN:
        /* sk is a const pointer, because we want to express multiple
         * cpu might call us concurrently.
         * sk-&gt;sk_wmem_alloc in an atomic, we can promote to rw.
         */
        skb_set_owner_w(skb, (struct sock *)sk);
        break;
    }
    skb_dst_set(skb, dst);

    mss = dst_metric_advmss(dst);
    user_mss = READ_ONCE(tp-&gt;rx_opt.user_mss);
    if (user_mss &amp;&amp; user_mss &lt; mss)
        mss = user_mss;

    memset(&amp;opts, 0, sizeof(opts));
#ifdef CONFIG_SYN_COOKIES
    if (unlikely(req-&gt;cookie_ts))
        skb-&gt;skb_mstamp.stamp_jiffies = cookie_init_timestamp(req);
    else
#endif
    skb_mstamp_get(&amp;skb-&gt;skb_mstamp);

#ifdef CONFIG_TCP_MD5SIG
    rcu_read_lock();
    md5 = tcp_rsk(req)-&gt;af_specific-&gt;req_md5_lookup(sk, req_to_sk(req));
#endif
    skb_set_hash(skb, tcp_rsk(req)-&gt;txhash, PKT_HASH_TYPE_L4);
    tcp_header_size = tcp_synack_options(req, mss, skb, &amp;opts, md5, foc) +
              sizeof(*th);
    // Start filling in the TCP header
    skb_push(skb, tcp_header_size);
    skb_reset_transport_header(skb);

    th = (struct tcphdr *)skb-&gt;data;
    memset(th, 0, sizeof(struct tcphdr));
    th-&gt;syn = 1;
    th-&gt;ack = 1;
    tcp_ecn_make_synack(req, th);
    th-&gt;source = htons(ireq-&gt;ir_num);
    th-&gt;dest = ireq-&gt;ir_rmt_port;
    /* Setting of flags are superfluous here for callers (and ECE is
     * not even correctly set)
     */
    tcp_init_nondata_skb(skb, tcp_rsk(req)-&gt;snt_isn,
                 TCPHDR_SYN | TCPHDR_ACK);

    th-&gt;seq = htonl(TCP_SKB_CB(skb)-&gt;seq);
    /* XXX data is queued and acked as is. No buffer/window check */
    th-&gt;ack_seq = htonl(tcp_rsk(req)-&gt;rcv_nxt);

    /* RFC1323: The window in SYN &amp; SYN/ACK segments is never scaled. */
    th-&gt;window = htons(min(req-&gt;rsk_rcv_wnd, 65535U));
    tcp_options_write((__be32 *)(th + 1), NULL, &amp;opts);
    th-&gt;doff = (tcp_header_size &gt;&gt; 2);
    __TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);

#ifdef CONFIG_TCP_MD5SIG
    /* Okay, we have all we need - do the md5 hash if needed */
    if (md5)
        tcp_rsk(req)-&gt;af_specific-&gt;calc_md5_hash(opts.hash_location,
                           md5, req_to_sk(req), skb);
    rcu_read_unlock();
#endif

    /* Do not fool tcpdump (if any), clean our debris */
    skb-&gt;tstamp.tv64 = 0;
    return skb;
}
EXPORT_SYMBOL(tcp_make_synack);</code></pre></div><p>&#160; Finally an IP header is added to the skb and ip_local_out is called to send the packet; the logic is simple:</p><div class="codebox"><pre class="vscroll"><code>/*
 *        Add an ip header to a skbuff and send it out.
 *
 */
int ip_build_and_send_pkt(struct sk_buff *skb, const struct sock *sk,
              __be32 saddr, __be32 daddr, struct ip_options_rcu *opt)
{
    struct inet_sock *inet = inet_sk(sk);
    struct rtable *rt = skb_rtable(skb);
    struct net *net = sock_net(sk);
    struct iphdr *iph;

    /* Build the IP header. */
    skb_push(skb, sizeof(struct iphdr) + (opt ? opt-&gt;opt.optlen : 0));
    skb_reset_network_header(skb);
    iph = ip_hdr(skb);
    iph-&gt;version  = 4;
    iph-&gt;ihl      = 5;
    iph-&gt;tos      = inet-&gt;tos;
    iph-&gt;ttl      = ip_select_ttl(inet, &amp;rt-&gt;dst);
    iph-&gt;daddr    = (opt &amp;&amp; opt-&gt;opt.srr ? opt-&gt;opt.faddr : daddr);
    iph-&gt;saddr    = saddr;
    iph-&gt;protocol = sk-&gt;sk_protocol;
    if (ip_dont_fragment(sk, &amp;rt-&gt;dst)) {
        iph-&gt;frag_off = htons(IP_DF);
        iph-&gt;id = 0;
    } else {
        iph-&gt;frag_off = 0;
        __ip_select_ident(net, iph, 1);
    }

    if (opt &amp;&amp; opt-&gt;opt.optlen) {
        iph-&gt;ihl += opt-&gt;opt.optlen&gt;&gt;2;
        ip_options_build(skb, &amp;opt-&gt;opt, daddr, rt, 0);
    }

    skb-&gt;priority = sk-&gt;sk_priority;
    skb-&gt;mark = sk-&gt;sk_mark;

    /* Send it out. */
    return ip_local_out(net, skb-&gt;sk, skb);
}</code></pre></div><p>&#160; &#160; That completes the server's SYN-ACK!</p><p>&#160; (3) The client's ACK packet<br /><span class="postimg"><img src="https://img2022.cnblogs.com/blog/2052730/202202/2052730-20220203211832573-76967108.png" alt="FluxBB bbcode test" /></span> </p><p>&#160; &#160; &#160; This ACK is sent from the client to the server. In essence it is still just a sequence of bytes: build it and send it. So the core is once again a call to tcp_transmit_skb. The whole function is:</p><div class="codebox"><pre class="vscroll"><code>/* This routine sends an ack and also updates the window. */
void tcp_send_ack(struct sock *sk)
{
    struct sk_buff *buff;

    /* If we have been reset, we may not send again. */
    /* If the socket has already been closed, return immediately. */
    if (sk-&gt;sk_state == TCP_CLOSE)
        return;
    /* Notify the congestion-control module of a non-delayed ACK */
    tcp_ca_event(sk, CA_EVENT_NON_DELAYED_ACK);

    /* We are not putting this on the write queue, so
     * tcp_transmit_skb() will set the ownership to this
     * sock.
     */
    buff = alloc_skb(MAX_TCP_HEADER,
             sk_gfp_mask(sk, GFP_ATOMIC | __GFP_NOWARN));
    if (unlikely(!buff)) {
        inet_csk_schedule_ack(sk);
        inet_csk(sk)-&gt;icsk_ack.ato = TCP_ATO_MIN;
        inet_csk_reset_xmit_timer(sk, ICSK_TIME_DACK,
                      TCP_DELACK_MAX, TCP_RTO_MAX);
        return;
    }

    /* Reserve space for headers and prepare control bits. */
    /* Initialize the ACK packet */
    skb_reserve(buff, MAX_TCP_HEADER);
    tcp_init_nondata_skb(buff, tcp_acceptable_seq(sk), TCPHDR_ACK);

    /* We do not want pure acks influencing TCP Small Queues or fq/pacing
     * too much.
     * SKB_TRUESIZE(max(1 .. 66, MAX_TCP_HEADER)) is unfortunately ~784
     * We also avoid tcp_wfree() overhead (cache line miss accessing
     * tp-&gt;tsq_flags) by using regular sock_wfree()
     */
    skb_set_tcp_pure_ack(buff);

    /* Send it off, this clears delayed acks for us. */
    /* Timestamp the ACK packet and send it */
    skb_mstamp_get(&amp;buff-&gt;skb_mstamp);
    // The ACK is built and sent here too, by the same familiar function
    tcp_transmit_skb(sk, buff, 0, (__force gfp_t)0);
}
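EXPORT_SYMBOL_GPL(tcp_send_ack);</code></pre></div><p>Summary:</p><p>1. The NIC simply sends and receives data (plainly put, byte strings). Protocols are the operating system's concern; hardware like the NIC does not care about them!</p><p>2. Structures such as socket, sock, sk_buff and tcphdr ultimately exist to build the packets of the different protocol layers (plainly put, different byte strings; to make them easier to understand and maintain, and to avoid handling everything in one undifferentiated lump, different positions in the byte string are abstracted into separate fields or flags). Different operating systems therefore have different structures and methods for generating and parsing packets; all that matters is that the bytes sent out match the format the protocol specifies!</p><p>3. What "establishing a connection" means at the logical level: once both sides have agreed to communicate via SYN and ACK, skbs are allocated to hold the data sent and received. One kind of DDoS attack tries to make the server keep allocating skbs for data that is supposedly about to arrive. The server's memory is finite, so allocating a large number of skbs eventually exhausts it!</p><p>References:</p><p>1. https://network.51cto.com/article/648928.html?mobile&#160; TCP three-way handshake: connect</p><p>2. https://www.leviathan.vip/2018/08/09/Linux%E5%86%85%E6%A0%B8%E6%BA%90%E7%A0%81%E5%88%86%E6%9E%90-TCP%E5%8D%8F%E8%AE%AE-1/&#160; TCP protocol analysis</p><p>3. http://45.76.5.96/opensource/tcp/tcp.pdf&#160; &#160; Linux TCP source code analysis</p>]]></description>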
			<author><![CDATA[dummy@example.com (batsom)]]></author>
			<pubDate>Tue, 11 Oct 2022 03:50:24 +0000</pubDate>
			<guid>http://www.gentoo-zh.org/viewtopic.php?pid=471#p471</guid>
		</item>
	</channel>
</rss>
