<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://matrix-spec.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://matrix-spec.github.io/" rel="alternate" type="text/html" hreflang="en" /><updated>2025-12-03T07:56:51+00:00</updated><id>https://matrix-spec.github.io/feed.xml</id><title type="html">Evgenii Zhuravlev - DevOps Engineer</title><subtitle>Personal blog about DevOps, Kubernetes, Linux, Clouds infrastructures, CI/CD, Terraform, Ansible, and other.</subtitle><author><name>Evgenii Zhuravlev</name></author><entry><title type="html">Tagging all ec2 instances for all EKSs in account.</title><link href="https://matrix-spec.github.io/aws/2025/03/13/tagging-ec2-instances-for-EKS.html" rel="alternate" type="text/html" title="Tagging all ec2 instances for all EKSs in account." /><published>2025-03-13T00:00:00+00:00</published><updated>2025-03-13T00:00:00+00:00</updated><id>https://matrix-spec.github.io/aws/2025/03/13/tagging-ec2-instances-for-EKS</id><content type="html" xml:base="https://matrix-spec.github.io/aws/2025/03/13/tagging-ec2-instances-for-EKS.html"><![CDATA[<p><img src="/assets/images/posts/tagging-all-ec2-for-EKS/aws-logo-glitch.webp" alt="banner" /></p>

<p>I recently came across an interesting task. For cost management, every EC2 instance in the AWS account had to be tagged with <code class="language-plaintext highlighter-rouge">Name = EKS-$CLUSTER-NAME</code>.
As you know, EC2 instances created for EKS do not have the <code class="language-plaintext highlighter-rouge">Name</code> tag by default. They are created within a Node Group, either from a custom Launch Template (if you explicitly created and specified one) or from an AWS-managed Launch Template that controls the EC2 instances in your Node Group.
The first case is straightforward: add custom tags to your Launch Template, and the resulting EC2 instances get <code class="language-plaintext highlighter-rouge">Name = EKS-$CLUSTER-NAME</code>. But what if some of the EKS Node Groups are not managed through a custom Launch Template?
By default, AWS has no property that lets you define a tag for an EKS node and have it applied to the EC2 instance. As a result, when you list the EC2 instances in your account, you can see a huge number of unnamed instances that carry only the service tags EKS uses to manage them. One of them is <code class="language-plaintext highlighter-rouge">kubernetes.io/cluster/$CLUSTER-NAME = owned</code>; let’s use it.
I used this tag to assign a custom tag <code class="language-plaintext highlighter-rouge">Name = EKS-$CLUSTER-NAME</code> to new EC2 instances at creation time, using AWS Lambda and events from EventBridge.</p>
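
<p>The core idea can be sketched in a few lines of Python before any AWS wiring is involved. This is a minimal illustration, not the final Lambda code: the helper name <code class="language-plaintext highlighter-rouge">derive_name_tag</code> and the constant <code class="language-plaintext highlighter-rouge">CLUSTER_TAG_PREFIX</code> are made up for this example, and the input is assumed to use the <code class="language-plaintext highlighter-rouge">Key</code>/<code class="language-plaintext highlighter-rouge">Value</code> tag shape that EC2 returns.</p>

```python
# Hypothetical helper for illustration; the names are not part of any AWS API.
CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"


def derive_name_tag(tags):
    """Return 'EKS-CLUSTERNAME' for a list of EC2 tag dicts, or None.

    `tags` is assumed to look like [{"Key": "...", "Value": "..."}, ...].
    """
    for tag in tags:
        key = tag.get("Key", "")
        if key.startswith(CLUSTER_TAG_PREFIX):
            # Everything after the prefix is the cluster name.
            cluster_name = key[len(CLUSTER_TAG_PREFIX):]
            if cluster_name:
                return f"EKS-{cluster_name}"
    # No EKS service tag on this instance: nothing to derive.
    return None
```

<p>For a tag list containing <code class="language-plaintext highlighter-rouge">kubernetes.io/cluster/demo = owned</code>, the helper returns <code class="language-plaintext highlighter-rouge">EKS-demo</code>; for an instance without the service tag it returns <code class="language-plaintext highlighter-rouge">None</code>, so instances that do not belong to EKS are left alone.</p>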

<h2 id="step-1-prepare-the-iam-policy-and-the-iam-role">Step 1. Prepare the IAM policy and the IAM role</h2>

<p>Create a policy file named lambda_ec2_policy.json:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">{</span>
  <span class="s2">"Version"</span>: <span class="s2">"2012-10-17"</span>,
  <span class="s2">"Statement"</span>: <span class="o">[</span>
    <span class="o">{</span>
      <span class="s2">"Effect"</span>: <span class="s2">"Allow"</span>,
      <span class="s2">"Action"</span>: <span class="o">[</span>
        <span class="s2">"ec2:DescribeInstances"</span>,
        <span class="s2">"ec2:CreateTags"</span>
      <span class="o">]</span>,
      <span class="s2">"Resource"</span>: <span class="s2">"*"</span>
    <span class="o">}</span>
  <span class="o">]</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Create a policy in AWS:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws iam create-policy <span class="se">\</span>
  <span class="nt">--policy-name</span> LambdaEC2TaggingPolicy <span class="se">\</span>
  <span class="nt">--policy-document</span> file://lambda_ec2_policy.json
</code></pre></div></div>

<p>Create an IAM role for the Lambda function:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws iam create-role <span class="se">\</span>
  <span class="nt">--role-name</span> LambdaEC2TaggingRole <span class="se">\</span>
  <span class="nt">--assume-role-policy-document</span> <span class="s1">'{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }]
  }'</span>
</code></pre></div></div>

<p>Attach the IAM policy to the role:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">ACCOUNT_ID</span><span class="o">=</span><span class="si">$(</span>aws sts get-caller-identity <span class="nt">--query</span> <span class="s2">"Account"</span> <span class="nt">--output</span> text<span class="si">)</span>
aws iam attach-role-policy <span class="se">\</span>
  <span class="nt">--role-name</span> LambdaEC2TaggingRole <span class="se">\</span>
  <span class="nt">--policy-arn</span> arn:aws:iam::<span class="k">${</span><span class="nv">ACCOUNT_ID</span><span class="k">}</span>:policy/LambdaEC2TaggingPolicy
</code></pre></div></div>

<p>You also need to attach the AWS managed policy that lets the function write logs to CloudWatch:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws iam attach-role-policy <span class="se">\</span>
  <span class="nt">--role-name</span> LambdaEC2TaggingRole <span class="se">\</span>
  <span class="nt">--policy-arn</span> arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
</code></pre></div></div>
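
<p>If you prefer to script Step 1 in Python, the same four calls might look roughly like the sketch below. It assumes you pass in a boto3 IAM client (for example <code class="language-plaintext highlighter-rouge">boto3.client('iam')</code>) and your account ID. The function name <code class="language-plaintext highlighter-rouge">create_tagging_role</code> is hypothetical, and the ARNs assume the standard <code class="language-plaintext highlighter-rouge">aws</code> partition.</p>

```python
import json

ROLE_NAME = "LambdaEC2TaggingRole"
POLICY_NAME = "LambdaEC2TaggingPolicy"

# Same permissions as lambda_ec2_policy.json above.
POLICY_DOCUMENT = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ec2:DescribeInstances", "ec2:CreateTags"],
        "Resource": "*",
    }],
}

# Trust policy that lets the Lambda service assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}


def create_tagging_role(iam_client, account_id):
    """Create the policy and the role, attach both policies, return the role ARN."""
    iam_client.create_policy(
        PolicyName=POLICY_NAME,
        PolicyDocument=json.dumps(POLICY_DOCUMENT),
    )
    iam_client.create_role(
        RoleName=ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    policy_arns = (
        f"arn:aws:iam::{account_id}:policy/{POLICY_NAME}",
        "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    )
    for policy_arn in policy_arns:
        iam_client.attach_role_policy(RoleName=ROLE_NAME, PolicyArn=policy_arn)
    return f"arn:aws:iam::{account_id}:role/{ROLE_NAME}"
```

<p>Passing the client in as a parameter keeps the function easy to exercise with a stub before running it against a real account.</p>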

<h2 id="step-2-creating-a-lambda-function">Step 2. Create the Lambda function</h2>

<p>Save the following code to a file named lambda_function.py:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">boto3</span>
<span class="kn">import</span> <span class="n">json</span>
<span class="kn">import</span> <span class="n">logging</span>

<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">()</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">setLevel</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">INFO</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">lambda_handler</span><span class="p">(</span><span class="n">event</span><span class="p">,</span> <span class="n">context</span><span class="p">):</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Event received: </span><span class="si">{</span><span class="n">json</span><span class="p">.</span><span class="nf">dumps</span><span class="p">(</span><span class="n">event</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="n">ec2_client</span> <span class="o">=</span> <span class="n">boto3</span><span class="p">.</span><span class="nf">client</span><span class="p">(</span><span class="sh">'</span><span class="s">ec2</span><span class="sh">'</span><span class="p">)</span>
    
    <span class="n">instance_ids</span> <span class="o">=</span> <span class="nf">extract_instance_ids</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
    
    <span class="k">if</span> <span class="ow">not</span> <span class="n">instance_ids</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">warning</span><span class="p">(</span><span class="sh">"</span><span class="s">Instance IDs were not found in the event.</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="p">{</span><span class="sh">'</span><span class="s">status</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">no instance ids found</span><span class="sh">'</span><span class="p">}</span>
    
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Instance IDs are extracted: </span><span class="si">{</span><span class="n">instance_ids</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="n">tagged_instances</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">instance_id</span> <span class="ow">in</span> <span class="n">instance_ids</span><span class="p">:</span>
        <span class="k">if</span> <span class="nf">tag_instance</span><span class="p">(</span><span class="n">ec2_client</span><span class="p">,</span> <span class="n">instance_id</span><span class="p">):</span>
            <span class="n">tagged_instances</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">instance_id</span><span class="p">)</span>
    
    <span class="k">return</span> <span class="p">{</span>
        <span class="sh">'</span><span class="s">status</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">processed</span><span class="sh">'</span><span class="p">,</span>
        <span class="sh">'</span><span class="s">instances_total</span><span class="sh">'</span><span class="p">:</span> <span class="nf">len</span><span class="p">(</span><span class="n">instance_ids</span><span class="p">),</span>
        <span class="sh">'</span><span class="s">instances_tagged</span><span class="sh">'</span><span class="p">:</span> <span class="nf">len</span><span class="p">(</span><span class="n">tagged_instances</span><span class="p">),</span>
        <span class="sh">'</span><span class="s">instances</span><span class="sh">'</span><span class="p">:</span> <span class="n">tagged_instances</span>
    <span class="p">}</span>

<span class="k">def</span> <span class="nf">extract_instance_ids</span><span class="p">(</span><span class="n">event</span><span class="p">):</span>
    <span class="n">instance_ids</span> <span class="o">=</span> <span class="p">[]</span>
    
    <span class="k">if</span> <span class="sh">'</span><span class="s">detail-type</span><span class="sh">'</span> <span class="ow">in</span> <span class="n">event</span><span class="p">:</span>
        <span class="n">detail_type</span> <span class="o">=</span> <span class="n">event</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">detail-type</span><span class="sh">'</span><span class="p">)</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Type of event: </span><span class="si">{</span><span class="n">detail_type</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        
        <span class="k">if</span> <span class="n">detail_type</span> <span class="o">==</span> <span class="sh">"</span><span class="s">AWS API Call via CloudTrail</span><span class="sh">"</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">items</span> <span class="o">=</span> <span class="n">event</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">detail</span><span class="sh">'</span><span class="p">,</span> <span class="p">{}).</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">responseElements</span><span class="sh">'</span><span class="p">,</span> <span class="p">{}).</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">instancesSet</span><span class="sh">'</span><span class="p">,</span> <span class="p">{}).</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">items</span><span class="sh">'</span><span class="p">,</span> <span class="p">[])</span>
                <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">instancesSet elements found: </span><span class="si">{</span><span class="n">json</span><span class="p">.</span><span class="nf">dumps</span><span class="p">(</span><span class="n">items</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
                
                <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">items</span><span class="p">:</span>
                    <span class="n">instance_id</span> <span class="o">=</span> <span class="n">item</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">instanceId</span><span class="sh">'</span><span class="p">)</span>
                    <span class="k">if</span> <span class="n">instance_id</span><span class="p">:</span>
                        <span class="n">instance_ids</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">instance_id</span><span class="p">)</span>
            <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
                <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Error extracting instance IDs from CloudTrail: </span><span class="si">{</span><span class="nf">str</span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        
        <span class="k">elif</span> <span class="n">detail_type</span> <span class="o">==</span> <span class="sh">"</span><span class="s">EC2 Instance State-change Notification</span><span class="sh">"</span><span class="p">:</span>
            <span class="n">instance_id</span> <span class="o">=</span> <span class="n">event</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">detail</span><span class="sh">'</span><span class="p">,</span> <span class="p">{}).</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">instance-id</span><span class="sh">'</span><span class="p">)</span>
            <span class="k">if</span> <span class="n">instance_id</span><span class="p">:</span>
                <span class="n">instance_ids</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">instance_id</span><span class="p">)</span>
    
    <span class="k">elif</span> <span class="sh">'</span><span class="s">resources</span><span class="sh">'</span> <span class="ow">in</span> <span class="n">event</span><span class="p">:</span>
        <span class="n">resources</span> <span class="o">=</span> <span class="n">event</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">resources</span><span class="sh">'</span><span class="p">,</span> <span class="p">[])</span>
        <span class="k">for</span> <span class="n">resource</span> <span class="ow">in</span> <span class="n">resources</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">resource</span><span class="p">.</span><span class="nf">startswith</span><span class="p">(</span><span class="sh">'</span><span class="s">arn:aws:ec2:</span><span class="sh">'</span><span class="p">)</span> <span class="ow">and</span> <span class="sh">'</span><span class="s">/instance/</span><span class="sh">'</span> <span class="ow">in</span> <span class="n">resource</span><span class="p">:</span>
                <span class="n">instance_id</span> <span class="o">=</span> <span class="n">resource</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sh">'</span><span class="s">/instance/</span><span class="sh">'</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span>
                <span class="n">instance_ids</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">instance_id</span><span class="p">)</span>
    
    <span class="k">if</span> <span class="ow">not</span> <span class="n">instance_ids</span> <span class="ow">and</span> <span class="sh">'</span><span class="s">instance_id</span><span class="sh">'</span> <span class="ow">in</span> <span class="n">event</span><span class="p">:</span>
        <span class="n">instance_ids</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">event</span><span class="p">[</span><span class="sh">'</span><span class="s">instance_id</span><span class="sh">'</span><span class="p">])</span>
    
    <span class="k">if</span> <span class="ow">not</span> <span class="n">instance_ids</span> <span class="ow">and</span> <span class="sh">'</span><span class="s">instanceId</span><span class="sh">'</span> <span class="ow">in</span> <span class="n">event</span><span class="p">:</span>
        <span class="n">instance_ids</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">event</span><span class="p">[</span><span class="sh">'</span><span class="s">instanceId</span><span class="sh">'</span><span class="p">])</span>
    
    <span class="k">return</span> <span class="n">instance_ids</span>

<span class="k">def</span> <span class="nf">tag_instance</span><span class="p">(</span><span class="n">ec2_client</span><span class="p">,</span> <span class="n">instance_id</span><span class="p">):</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Instance Processing </span><span class="si">{</span><span class="n">instance_id</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    
    <span class="k">try</span><span class="p">:</span>
        <span class="kn">import</span> <span class="n">time</span>
        <span class="n">time</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(</span><span class="mi">30</span><span class="p">)</span>
        
        <span class="n">retries</span> <span class="o">=</span> <span class="mi">3</span>
        <span class="k">for</span> <span class="n">attempt</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">retries</span><span class="p">):</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">response</span> <span class="o">=</span> <span class="n">ec2_client</span><span class="p">.</span><span class="nf">describe_instances</span><span class="p">(</span><span class="n">InstanceIds</span><span class="o">=</span><span class="p">[</span><span class="n">instance_id</span><span class="p">])</span>
                <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Received information about the instance on the attempt </span><span class="si">{</span><span class="n">attempt</span><span class="o">+</span><span class="mi">1</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
                <span class="k">break</span>
            <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
                <span class="k">if</span> <span class="n">attempt</span> <span class="o">&lt;</span> <span class="n">retries</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span>
                    <span class="n">logger</span><span class="p">.</span><span class="nf">warning</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Couldn</span><span class="sh">'</span><span class="s">t get instance data </span><span class="si">{</span><span class="n">instance_id</span><span class="si">}</span><span class="s">, attempt </span><span class="si">{</span><span class="n">attempt</span><span class="o">+</span><span class="mi">1</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="nf">str</span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
                    <span class="n">time</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(</span><span class="mi">10</span> <span class="o">*</span> <span class="p">(</span><span class="n">attempt</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span>
                <span class="k">else</span><span class="p">:</span>
                    <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Couldn</span><span class="sh">'</span><span class="s">t get instance data </span><span class="si">{</span><span class="n">instance_id</span><span class="si">}</span><span class="s"> after </span><span class="si">{</span><span class="n">retries</span><span class="si">}</span><span class="s"> attempts: </span><span class="si">{</span><span class="nf">str</span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
                    <span class="k">return</span> <span class="bp">False</span>
        
        <span class="n">reservations</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">Reservations</span><span class="sh">'</span><span class="p">,</span> <span class="p">[])</span>
        <span class="k">if</span> <span class="ow">not</span> <span class="n">reservations</span> <span class="ow">or</span> <span class="nf">len</span><span class="p">(</span><span class="n">reservations</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">Instances</span><span class="sh">'</span><span class="p">,</span> <span class="p">[]))</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">logger</span><span class="p">.</span><span class="nf">warning</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">The instance was not found: </span><span class="si">{</span><span class="n">instance_id</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
            <span class="k">return</span> <span class="bp">False</span>
        
        <span class="n">instance</span> <span class="o">=</span> <span class="n">reservations</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="sh">'</span><span class="s">Instances</span><span class="sh">'</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
        
        <span class="n">instance_tags</span> <span class="o">=</span> <span class="p">{}</span>
        <span class="k">for</span> <span class="n">tag</span> <span class="ow">in</span> <span class="n">instance</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">'</span><span class="s">Tags</span><span class="sh">'</span><span class="p">,</span> <span class="p">[]):</span>
            <span class="n">instance_tags</span><span class="p">[</span><span class="n">tag</span><span class="p">[</span><span class="sh">'</span><span class="s">Key</span><span class="sh">'</span><span class="p">]]</span> <span class="o">=</span> <span class="n">tag</span><span class="p">[</span><span class="sh">'</span><span class="s">Value</span><span class="sh">'</span><span class="p">]</span>
        
        <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Current instance tags </span><span class="si">{</span><span class="n">instance_id</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">json</span><span class="p">.</span><span class="nf">dumps</span><span class="p">(</span><span class="n">instance_tags</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        
        <span class="k">if</span> <span class="sh">'</span><span class="s">Name</span><span class="sh">'</span> <span class="ow">in</span> <span class="n">instance_tags</span><span class="p">:</span>
            <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Instance </span><span class="si">{</span><span class="n">instance_id</span><span class="si">}</span><span class="s"> already has the Name tag: </span><span class="si">{</span><span class="n">instance_tags</span><span class="p">[</span><span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">]</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
            <span class="k">return</span> <span class="bp">False</span>
            
        <span class="n">cluster_name</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="k">for</span> <span class="n">key</span> <span class="ow">in</span> <span class="n">instance_tags</span><span class="p">.</span><span class="nf">keys</span><span class="p">():</span>
            <span class="k">if</span> <span class="n">key</span><span class="p">.</span><span class="nf">startswith</span><span class="p">(</span><span class="sh">'</span><span class="s">kubernetes.io/cluster/</span><span class="sh">'</span><span class="p">):</span>
                <span class="n">cluster_name</span> <span class="o">=</span> <span class="n">key</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sh">'</span><span class="s">/</span><span class="sh">'</span><span class="p">,</span> <span class="mi">2</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
                <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Cluster tag found: </span><span class="si">{</span><span class="n">key</span><span class="si">}</span><span class="s">=</span><span class="si">{</span><span class="n">instance_tags</span><span class="p">[</span><span class="n">key</span><span class="p">]</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
                <span class="k">break</span>
        
        <span class="k">if</span> <span class="n">cluster_name</span><span class="p">:</span>
            <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Adding the tag Name=EKS-</span><span class="si">{</span><span class="n">cluster_name</span><span class="si">}</span><span class="s"> for the instance </span><span class="si">{</span><span class="n">instance_id</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">ec2_client</span><span class="p">.</span><span class="nf">create_tags</span><span class="p">(</span>
                    <span class="n">Resources</span><span class="o">=</span><span class="p">[</span><span class="n">instance_id</span><span class="p">],</span>
                    <span class="n">Tags</span><span class="o">=</span><span class="p">[{</span><span class="sh">'</span><span class="s">Key</span><span class="sh">'</span><span class="p">:</span> <span class="sh">'</span><span class="s">Name</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">Value</span><span class="sh">'</span><span class="p">:</span> <span class="sa">f</span><span class="sh">'</span><span class="s">EKS-</span><span class="si">{</span><span class="n">cluster_name</span><span class="si">}</span><span class="sh">'</span><span class="p">}]</span>
                <span class="p">)</span>
                <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Tag Name=EKS-</span><span class="si">{</span><span class="n">cluster_name</span><span class="si">}</span><span class="s"> successfully added for instance </span><span class="si">{</span><span class="n">instance_id</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
                <span class="k">return</span> <span class="bp">True</span>
            <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
                <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Error when creating the tag: </span><span class="si">{</span><span class="nf">str</span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
                <span class="k">return</span> <span class="bp">False</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">For the instance </span><span class="si">{</span><span class="n">instance_id</span><span class="si">}</span><span class="s"> cluster EKS tags not found</span><span class="sh">"</span><span class="p">)</span>
            <span class="k">return</span> <span class="bp">False</span>
            
    <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
        <span class="n">logger</span><span class="p">.</span><span class="nf">error</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Unexpected error during instance processing </span><span class="si">{</span><span class="n">instance_id</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="nf">str</span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="bp">False</span>
</code></pre></div></div>

<p>Create a ZIP archive with the function:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>zip lambda_function.zip lambda_function.py
</code></pre></div></div>

<p>Create a Lambda function in the AWS CLI:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws lambda create-function <span class="se">\</span>
  <span class="nt">--function-name</span> ec2-auto-tagging <span class="se">\</span>
  <span class="nt">--runtime</span> python3.11 <span class="se">\</span>
  <span class="nt">--zip-file</span> fileb://lambda_function.zip <span class="se">\</span>
  <span class="nt">--handler</span> lambda_function.lambda_handler <span class="se">\</span>
  <span class="nt">--role</span> arn:aws:iam::<span class="k">${</span><span class="nv">ACCOUNT_ID</span><span class="k">}</span>:role/LambdaEC2TaggingRole <span class="se">\</span>
  <span class="nt">--timeout</span> 60
</code></pre></div></div>

<h2 id="step-3-configure-eventbridge-and-cloudtrail">Step 3. Configure EventBridge and CloudTrail</h2>

<p>Create an EventBridge rule that triggers the Lambda function whenever a new EC2 instance is created:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws events put-rule <span class="se">\</span>
  <span class="nt">--name</span> <span class="s2">"trigger-on-ec2-instance-creation"</span> <span class="se">\</span>
  <span class="nt">--event-pattern</span> <span class="s1">'{
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
      "eventSource": ["ec2.amazonaws.com"],
      "eventName": ["RunInstances"]
    }
  }'</span>
</code></pre></div></div>
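<p>To see which events this rule lets through, here is a minimal Python sketch of the matching logic. It is a simplified illustration, not the real EventBridge engine: the helper <code class="language-plaintext highlighter-rouge">matches_pattern</code> is hypothetical and only handles exact-value lists and nested objects.</p>

```python
# Simplified, illustrative matcher: handles only exact-value lists and nested
# objects, not the full EventBridge pattern syntax.
def matches_pattern(pattern, event):
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching sub-object.
            if not isinstance(actual, dict) or not matches_pattern(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: the event value must be one of the listed values.
            return False
    return True


RULE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ec2.amazonaws.com"],
        "eventName": ["RunInstances"],
    },
}

run_event = {
    "source": "aws.ec2",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "ec2.amazonaws.com", "eventName": "RunInstances"},
}
stop_event = {**run_event, "detail": {**run_event["detail"], "eventName": "StopInstances"}}

print(matches_pattern(RULE_PATTERN, run_event))   # True
print(matches_pattern(RULE_PATTERN, stop_event))  # False
```

<p>Only the <code class="language-plaintext highlighter-rouge">RunInstances</code> call passes, so the Lambda function is not invoked for other EC2 API calls.</p>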

<h3 id="configuring-cloudtrail-to-record-api-events">Configuring CloudTrail to record API events</h3>

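<p>EventBridge only receives <code class="language-plaintext highlighter-rouge">AWS API Call via CloudTrail</code> events such as <code class="language-plaintext highlighter-rouge">RunInstances</code> when a CloudTrail trail is logging management events. If CloudTrail is not configured yet, follow these steps:</p>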

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">export </span><span class="nv">S3_BUCKET</span><span class="o">=</span>cloudtrail-logs-<span class="si">$(</span>aws sts get-caller-identity <span class="nt">--query</span> <span class="s2">"Account"</span> <span class="nt">--output</span> text<span class="si">)</span>
<span class="nb">export </span><span class="nv">ACCOUNT_ID</span><span class="o">=</span><span class="si">$(</span>aws sts get-caller-identity <span class="nt">--query</span> <span class="s2">"Account"</span> <span class="nt">--output</span> text<span class="si">)</span>
<span class="nb">export </span><span class="nv">REGION</span><span class="o">=</span><span class="si">$(</span>aws configure get region<span class="si">)</span>

aws s3 mb s3://<span class="nv">$S3_BUCKET</span> <span class="nt">--region</span> <span class="nv">$REGION</span>

<span class="nb">cat</span> <span class="o">&gt;</span> bucket-policy.json <span class="o">&lt;&lt;</span> <span class="no">EOF</span><span class="sh">
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AWSCloudTrailAclCheck",
            "Effect": "Allow",
            "Principal": {
                "Service": "cloudtrail.amazonaws.com"
            },
            "Action": "s3:GetBucketAcl",
            "Resource": "arn:aws:s3:::</span><span class="k">${</span><span class="nv">S3_BUCKET</span><span class="k">}</span><span class="sh">"
        },
        {
            "Sid": "AWSCloudTrailWrite",
            "Effect": "Allow",
            "Principal": {
                "Service": "cloudtrail.amazonaws.com"
            },
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::</span><span class="k">${</span><span class="nv">S3_BUCKET</span><span class="k">}</span><span class="sh">/AWSLogs/</span><span class="k">${</span><span class="nv">ACCOUNT_ID</span><span class="k">}</span><span class="sh">/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-acl": "bucket-owner-full-control"
                }
            }
        }
    ]
}
</span><span class="no">EOF

</span>aws s3api put-bucket-policy <span class="nt">--bucket</span> <span class="nv">$S3_BUCKET</span> <span class="nt">--policy</span> file://bucket-policy.json

aws cloudtrail create-trail <span class="se">\</span>
  <span class="nt">--name</span> api-events-trail <span class="se">\</span>
  <span class="nt">--s3-bucket-name</span> <span class="nv">$S3_BUCKET</span> <span class="se">\</span>
  <span class="nt">--is-multi-region-trail</span> <span class="se">\</span>
  <span class="nt">--enable-log-file-validation</span>

aws cloudtrail start-logging <span class="nt">--name</span> api-events-trail

aws cloudtrail put-event-selectors <span class="se">\</span>
  <span class="nt">--trail-name</span> api-events-trail <span class="se">\</span>
  <span class="nt">--event-selectors</span> <span class="s1">'[{"ReadWriteType": "All", "IncludeManagementEvents": true}]'</span>
</code></pre></div></div>
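<p>If you prefer to generate the bucket policy programmatically instead of with a heredoc, a sketch like the following could work. The function name <code class="language-plaintext highlighter-rouge">build_cloudtrail_bucket_policy</code> and the account ID are placeholders chosen for illustration; the statements mirror the policy above.</p>

```python
import json


def build_cloudtrail_bucket_policy(bucket, account_id):
    """Return the same two-statement policy as the heredoc above, as a dict."""
    principal = {"Service": "cloudtrail.amazonaws.com"}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AWSCloudTrailAclCheck",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "AWSCloudTrailWrite",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/AWSLogs/{account_id}/*",
                "Condition": {
                    "StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}
                },
            },
        ],
    }


# Placeholder account ID, for illustration only.
policy = build_cloudtrail_bucket_policy("cloudtrail-logs-123456789012", "123456789012")
print(json.dumps(policy, indent=4))
```

<p>Redirect the output to <code class="language-plaintext highlighter-rouge">bucket-policy.json</code> and pass it to <code class="language-plaintext highlighter-rouge">aws s3api put-bucket-policy</code> as shown above.</p>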

<h3 id="add-the-lambda-permission-and-configure-the-eventbridge">Add the Lambda permission and configure the EventBridge</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws lambda add-permission <span class="se">\</span>
  <span class="nt">--function-name</span> ec2-auto-tagging <span class="se">\</span>
  <span class="nt">--statement-id</span> AllowEventBridgeInvoke <span class="se">\</span>
  <span class="nt">--action</span> lambda:InvokeFunction <span class="se">\</span>
  <span class="nt">--principal</span> events.amazonaws.com <span class="se">\</span>
  <span class="nt">--source-arn</span> arn:aws:events:<span class="k">${</span><span class="nv">REGION</span><span class="k">}</span>:<span class="k">${</span><span class="nv">ACCOUNT_ID</span><span class="k">}</span>:rule/trigger-on-ec2-instance-creation

aws events put-targets <span class="se">\</span>
  <span class="nt">--rule</span> trigger-on-ec2-instance-creation <span class="se">\</span>
  <span class="nt">--targets</span> <span class="s1">'[{"Id": "1", "Arn": "arn:aws:lambda:'</span><span class="k">${</span><span class="nv">REGION</span><span class="k">}</span><span class="s1">':'</span><span class="k">${</span><span class="nv">ACCOUNT_ID</span><span class="k">}</span><span class="s1">':function:ec2-auto-tagging"}]'</span>
</code></pre></div></div>
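<p>Both commands depend on <code class="language-plaintext highlighter-rouge">REGION</code> and <code class="language-plaintext highlighter-rouge">ACCOUNT_ID</code> being set; an empty variable silently produces a malformed ARN. The sketch below builds the same values in Python and fails early on empty input. The helper names are hypothetical, and the region and account ID in the example are placeholders.</p>

```python
import json

RULE_NAME = "trigger-on-ec2-instance-creation"
FUNCTION_NAME = "ec2-auto-tagging"


def _require(value, name):
    # Fail early instead of producing an ARN with an empty segment.
    if not value:
        raise ValueError(f"{name} must not be empty")
    return value


def rule_arn(region, account_id, rule_name=RULE_NAME):
    """ARN to pass as --source-arn to `aws lambda add-permission`."""
    region = _require(region, "region")
    account_id = _require(account_id, "account_id")
    return f"arn:aws:events:{region}:{account_id}:rule/{rule_name}"


def targets_json(region, account_id, function_name=FUNCTION_NAME):
    """JSON string to pass as --targets to `aws events put-targets`."""
    region = _require(region, "region")
    account_id = _require(account_id, "account_id")
    function_arn = f"arn:aws:lambda:{region}:{account_id}:function:{function_name}"
    return json.dumps([{"Id": "1", "Arn": function_arn}])


print(rule_arn("eu-west-1", "123456789012"))
print(targets_json("eu-west-1", "123456789012"))
```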

<h2 id="step-4-checking-the-work">Step 4. Checking the work</h2>

<p>Create a new EC2 instance with a tag like:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws ec2 run-instances <span class="se">\</span>
  <span class="nt">--image-id</span> ami-XXXXXXXXXX <span class="se">\</span>
  <span class="nt">--count</span> 1 <span class="se">\</span>
  <span class="nt">--instance-type</span> t2.micro <span class="se">\</span>
  <span class="nt">--subnet-id</span> subnet-XXXXXXXXXX <span class="se">\</span>
  <span class="nt">--tag-specifications</span> <span class="s1">'ResourceType=instance,Tags=[{Key=kubernetes.io/cluster/my-cluster,Value=owned}]'</span>
</code></pre></div></div>

<p>After starting the instance, make sure that a new tag is automatically added:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">INSTANCE_ID</span><span class="o">=</span>i-XXXXXXXXXX

<span class="nb">sleep </span>60

aws ec2 describe-tags <span class="nt">--filters</span> <span class="s2">"Name=resource-id,Values=</span><span class="k">${</span><span class="nv">INSTANCE_ID</span><span class="k">}</span><span class="s2">"</span> <span class="nt">--query</span> <span class="s2">"Tags[?Key=='Name']"</span>
</code></pre></div></div>

<p>The tag should appear:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w">
  </span><span class="p">{</span><span class="w">
    </span><span class="nl">"Key"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Name"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"ResourceId"</span><span class="p">:</span><span class="w"> </span><span class="s2">"i-XXXXXXXXXX"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"ResourceType"</span><span class="p">:</span><span class="w"> </span><span class="s2">"instance"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"Value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"EKS-my-cluster"</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>
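<p>The expected value can be derived from the instance tags. The sketch below assumes the cluster name comes either from the value of <code class="language-plaintext highlighter-rouge">aws:eks:cluster-name</code> or from the suffix of a <code class="language-plaintext highlighter-rouge">kubernetes.io/cluster/</code> key; the helper <code class="language-plaintext highlighter-rouge">expected_name_tag</code> is hypothetical and is not the Lambda code itself.</p>

```python
K8S_PREFIX = "kubernetes.io/cluster/"
EKS_TAG = "aws:eks:cluster-name"


def expected_name_tag(tags):
    """Derive the Name value from a list of {'Key': ..., 'Value': ...} tags.

    Assumption: the cluster name comes either from the value of
    aws:eks:cluster-name or from the suffix of a kubernetes.io/cluster/ key.
    """
    for tag in tags:
        if tag["Key"] == EKS_TAG:
            return "EKS-" + tag["Value"]
    for tag in tags:
        if tag["Key"].startswith(K8S_PREFIX):
            cluster_name = tag["Key"][len(K8S_PREFIX):]
            return "EKS-" + cluster_name
    # No cluster tag: the instance does not belong to an EKS cluster.
    return None


print(expected_name_tag([{"Key": "kubernetes.io/cluster/my-cluster", "Value": "owned"}]))
# EKS-my-cluster
```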

<h2 id="step-5-tag-existing-ec2-instances">Step 5. Tag existing ec2 instances</h2>

<p>With the steps above in place, every new instance gets a <code class="language-plaintext highlighter-rouge">Name</code> tag with the cluster name at creation time. Instances that were created earlier still lack the tag, so let’s tag them too:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

<span class="nb">set</span> <span class="nt">-e</span>

<span class="k">if</span> <span class="o">!</span> <span class="nb">command</span> <span class="nt">-v</span> aws &amp;&gt; /dev/null<span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"Error: AWS CLI is not installed. Please install it and configure it."</span>
    <span class="nb">exit </span>1
<span class="k">fi

</span><span class="nb">echo</span> <span class="s2">"Getting a list of all EKS clusters from existing EC2 instances..."</span>
<span class="nv">CLUSTER_NAMES</span><span class="o">=(</span><span class="si">$(</span>aws ec2 describe-instances <span class="se">\</span>
    <span class="nt">--filters</span> <span class="s2">"Name=tag-key,Values=aws:eks:cluster-name"</span> <span class="se">\</span>
    <span class="nt">--query</span> <span class="s2">"Reservations[].Instances[].Tags[?Key=='aws:eks:cluster-name'].Value"</span> <span class="se">\</span>
    <span class="nt">--output</span> text | <span class="nb">sort</span> | <span class="nb">uniq</span><span class="si">)</span><span class="o">)</span>

<span class="k">if</span> <span class="o">[</span> <span class="k">${#</span><span class="nv">CLUSTER_NAMES</span><span class="p">[@]</span><span class="k">}</span> <span class="nt">-eq</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"No instances were found with the aws:eks:cluster-name tag."</span>
    <span class="nb">exit </span>0
<span class="k">fi

</span><span class="nb">echo</span> <span class="s2">"The following EKS clusters were found:"</span>
<span class="k">for </span>CLUSTER <span class="k">in</span> <span class="s2">"</span><span class="k">${</span><span class="nv">CLUSTER_NAMES</span><span class="p">[@]</span><span class="k">}</span><span class="s2">"</span><span class="p">;</span> <span class="k">do
    </span><span class="nb">echo</span> <span class="s2">"- </span><span class="nv">$CLUSTER</span><span class="s2">"</span>
<span class="k">done

</span><span class="nv">TOTAL_INSTANCES</span><span class="o">=</span>0

<span class="k">for </span>CLUSTER_NAME <span class="k">in</span> <span class="s2">"</span><span class="k">${</span><span class="nv">CLUSTER_NAMES</span><span class="p">[@]</span><span class="k">}</span><span class="s2">"</span><span class="p">;</span> <span class="k">do
    </span><span class="nv">NAME_TAG_VALUE</span><span class="o">=</span><span class="s2">"EKS-</span><span class="k">${</span><span class="nv">CLUSTER_NAME</span><span class="k">}</span><span class="s2">"</span>
    
    <span class="nb">echo</span> <span class="s2">"========================================"</span>
    <span class="nb">echo</span> <span class="s2">"Processing cluster: </span><span class="k">${</span><span class="nv">CLUSTER_NAME</span><span class="k">}</span><span class="s2">"</span>
    <span class="nb">echo</span> <span class="s2">"Searching for EC2 instances with the tag aws:eks:cluster-name=</span><span class="k">${</span><span class="nv">CLUSTER_NAME</span><span class="k">}</span><span class="s2">..."</span>
    
    <span class="nv">INSTANCE_IDS</span><span class="o">=</span><span class="si">$(</span>aws ec2 describe-instances <span class="se">\</span>
        <span class="nt">--filters</span> <span class="s2">"Name=tag:aws:eks:cluster-name,Values=</span><span class="k">${</span><span class="nv">CLUSTER_NAME</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
        <span class="nt">--query</span> <span class="s2">"Reservations[].Instances[].InstanceId"</span> <span class="se">\</span>
        <span class="nt">--output</span> text<span class="si">)</span>
    
    <span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$INSTANCE_IDS</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">echo</span> <span class="s2">"Instances with the tag aws:eks:cluster-name=</span><span class="k">${</span><span class="nv">CLUSTER_NAME</span><span class="k">}</span><span class="s2"> not found."</span>
        <span class="k">continue
    fi
    
    </span><span class="nv">INSTANCE_COUNT</span><span class="o">=</span><span class="si">$(</span><span class="nb">echo</span> <span class="nv">$INSTANCE_IDS</span> | <span class="nb">wc</span> <span class="nt">-w</span><span class="si">)</span>
    <span class="nb">echo</span> <span class="s2">"Found </span><span class="nv">$INSTANCE_COUNT</span><span class="s2"> instances for the cluster </span><span class="k">${</span><span class="nv">CLUSTER_NAME</span><span class="k">}</span><span class="s2">."</span>
    <span class="nv">TOTAL_INSTANCES</span><span class="o">=</span><span class="k">$((</span>TOTAL_INSTANCES <span class="o">+</span> INSTANCE_COUNT<span class="k">))</span>
    
    <span class="k">for </span>INSTANCE_ID <span class="k">in</span> <span class="nv">$INSTANCE_IDS</span><span class="p">;</span> <span class="k">do
        </span><span class="nb">echo</span> <span class="s2">"Adding a tag Name=</span><span class="k">${</span><span class="nv">NAME_TAG_VALUE</span><span class="k">}</span><span class="s2"> to the instance </span><span class="nv">$INSTANCE_ID</span><span class="s2">..."</span>
        
        <span class="nv">EXISTING_NAME_TAG</span><span class="o">=</span><span class="si">$(</span>aws ec2 describe-tags <span class="se">\</span>
            <span class="nt">--filters</span> <span class="s2">"Name=resource-id,Values=</span><span class="k">${</span><span class="nv">INSTANCE_ID</span><span class="k">}</span><span class="s2">"</span> <span class="s2">"Name=key,Values=Name"</span> <span class="se">\</span>
            <span class="nt">--query</span> <span class="s2">"Tags[0].Value"</span> <span class="se">\</span>
            <span class="nt">--output</span> text<span class="si">)</span>
        
        <span class="k">if</span> <span class="o">[</span> <span class="s2">"</span><span class="nv">$EXISTING_NAME_TAG</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">"None"</span> <span class="o">]</span> <span class="o">&amp;&amp;</span> <span class="o">[</span> <span class="o">!</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$EXISTING_NAME_TAG</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
            </span><span class="nb">echo</span> <span class="s2">" The instance already has the tag Name=</span><span class="k">${</span><span class="nv">EXISTING_NAME_TAG</span><span class="k">}</span><span class="s2">. Updating it..."</span>
        <span class="k">fi
        
        </span>aws ec2 create-tags <span class="se">\</span>
            <span class="nt">--resources</span> <span class="s2">"</span><span class="nv">$INSTANCE_ID</span><span class="s2">"</span> <span class="se">\</span>
            <span class="nt">--tags</span> <span class="s2">"Key=Name,Value=</span><span class="k">${</span><span class="nv">NAME_TAG_VALUE</span><span class="k">}</span><span class="s2">"</span>
        
        <span class="nb">echo</span> <span class="s2">" The tag has been successfully added."</span>
    <span class="k">done
    
    </span><span class="nb">echo</span> <span class="s2">"Processing of the cluster </span><span class="k">${</span><span class="nv">CLUSTER_NAME</span><span class="k">}</span><span class="s2"> has been completed."</span>
<span class="k">done

</span><span class="nb">echo</span> <span class="s2">"========================================"</span>
<span class="nb">echo</span> <span class="s2">"The operation is completed. A total of </span><span class="nv">$TOTAL_INSTANCES</span><span class="s2"> instances were processed across all clusters."</span>
</code></pre></div></div>
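<p>The grouping logic of the script can also be expressed in Python, which makes it easy to preview what would be tagged before changing anything. This is a sketch under assumptions: the function names are hypothetical and the sample data is hand-written in the shape of a <code class="language-plaintext highlighter-rouge">describe-instances</code> response.</p>

```python
from collections import defaultdict

EKS_TAG = "aws:eks:cluster-name"


def group_instances_by_cluster(response):
    """Map cluster name to instance IDs from a describe-instances style dict."""
    groups = defaultdict(list)
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if EKS_TAG in tags:
                groups[tags[EKS_TAG]].append(instance["InstanceId"])
    return dict(groups)


def plan_name_tags(response):
    """Return (instance_id, name_value) pairs that the script would apply."""
    return [
        (instance_id, f"EKS-{cluster}")
        for cluster, ids in sorted(group_instances_by_cluster(response).items())
        for instance_id in ids
    ]


# Hand-written sample in the shape returned by describe-instances.
sample = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-0aaa", "Tags": [{"Key": EKS_TAG, "Value": "prod"}]},
            {"InstanceId": "i-0bbb", "Tags": [{"Key": "team", "Value": "data"}]},
        ]},
        {"Instances": [
            {"InstanceId": "i-0ccc", "Tags": [{"Key": EKS_TAG, "Value": "dev"}]},
        ]},
    ]
}

for instance_id, name in plan_name_tags(sample):
    print(instance_id, name)
```

<p>With boto3, the response could come from <code class="language-plaintext highlighter-rouge">describe_instances</code> and each pair could then be passed to <code class="language-plaintext highlighter-rouge">create_tags</code>; the sketch leaves that step out so it can run without AWS credentials.</p>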

<h2 id="debugging-when-problems-occur">Debugging when problems occur</h2>

<p>If the tags do not appear automatically, follow these steps for debugging:</p>

<h3 id="1-checking-lambda-logs">1. Checking Lambda logs</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">LOG_STREAM</span><span class="o">=</span><span class="si">$(</span>aws logs describe-log-streams <span class="se">\</span>
  <span class="nt">--log-group-name</span> /aws/lambda/ec2-auto-tagging <span class="se">\</span>
  <span class="nt">--order-by</span> LastEventTime <span class="se">\</span>
  <span class="nt">--descending</span> <span class="se">\</span>
  <span class="nt">--limit</span> 1 <span class="se">\</span>
  <span class="nt">--query</span> <span class="s1">'logStreams[0].logStreamName'</span> <span class="se">\</span>
  <span class="nt">--output</span> text<span class="si">)</span>

aws logs get-log-events <span class="se">\</span>
  <span class="nt">--log-group-name</span> /aws/lambda/ec2-auto-tagging <span class="se">\</span>
  <span class="nt">--log-stream-name</span> <span class="s2">"</span><span class="nv">$LOG_STREAM</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nt">--limit</span> 20
</code></pre></div></div>

<h3 id="2-checking-eventbridge-settings">2. Checking EventBridge settings</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws events describe-rule <span class="nt">--name</span> trigger-on-ec2-instance-creation

aws events list-targets-by-rule <span class="nt">--rule</span> trigger-on-ec2-instance-creation
</code></pre></div></div>

<h3 id="3-checking-cloudtrail">3. Checking CloudTrail</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws cloudtrail get-trail-status <span class="nt">--name</span> api-events-trail

aws cloudtrail get-event-selectors <span class="nt">--trail-name</span> api-events-trail
</code></pre></div></div>

<h3 id="4-verifying-the-rights-of-the-iam-role">4. Verifying the rights of the IAM role</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws iam list-attached-role-policies <span class="nt">--role-name</span> LambdaEC2TaggingRole
</code></pre></div></div>

<h3 id="5-manual-lambda-testing-with-test-event">5. Manual Lambda testing with test event</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> <span class="o">&gt;</span> test-event.json <span class="o">&lt;&lt;</span> <span class="no">EOF</span><span class="sh">
{
  "version": "0",
  "id": "6a7e8feb-b491-4cf7-a9f1-bf3703467718",
  "detail-type": "AWS API Call via CloudTrail",
  "source": "aws.ec2",
  "account": "</span><span class="si">$(</span>aws sts get-caller-identity <span class="nt">--query</span> <span class="s2">"Account"</span> <span class="nt">--output</span> text<span class="si">)</span><span class="sh">",
  "time": "2021-12-03T17:31:20Z",
  "region": "</span><span class="si">$(</span>aws configure get region<span class="si">)</span><span class="sh">",
  "resources": [],
  "detail": {
    "eventSource": "ec2.amazonaws.com",
    "eventName": "RunInstances",
    "responseElements": {
      "instancesSet": {
        "items": [
          {
            "instanceId": "i-YOUR_INSTANCE_ID"
          }
        ]
      }
    }
  }
}
</span><span class="no">EOF

</span><span class="nv">INSTANCE_ID</span><span class="o">=</span>i-XXXXXXXXXX
<span class="nb">sed</span> <span class="nt">-i</span> <span class="s2">"s/i-YOUR_INSTANCE_ID/</span><span class="nv">$INSTANCE_ID</span><span class="s2">/g"</span> test-event.json

aws lambda invoke <span class="se">\</span>
  <span class="nt">--function-name</span> ec2-auto-tagging <span class="se">\</span>
  <span class="nt">--payload</span> fileb://test-event.json <span class="se">\</span>
  response.json

<span class="nb">cat </span>response.json
</code></pre></div></div>
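<p>When reading the test event, it helps to know which field the function needs. The sketch below shows one way to pull instance IDs out of such an event; <code class="language-plaintext highlighter-rouge">extract_instance_ids</code> is a hypothetical helper, and the guard against a missing <code class="language-plaintext highlighter-rouge">responseElements</code> assumes that field can be null, for example when the API call failed.</p>

```python
def extract_instance_ids(event):
    """Collect instance IDs from a RunInstances event delivered via CloudTrail."""
    detail = event.get("detail") or {}
    # Assumption: responseElements can be null, e.g. when the call failed.
    response = detail.get("responseElements") or {}
    items = (response.get("instancesSet") or {}).get("items") or []
    return [item["instanceId"] for item in items if "instanceId" in item]


test_event = {
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.ec2",
    "detail": {
        "eventSource": "ec2.amazonaws.com",
        "eventName": "RunInstances",
        "responseElements": {
            "instancesSet": {"items": [{"instanceId": "i-XXXXXXXXXX"}]}
        },
    },
}
failed_event = {"detail": {"eventName": "RunInstances", "responseElements": None}}

print(extract_instance_ids(test_event))    # ['i-XXXXXXXXXX']
print(extract_instance_ids(failed_event))  # []
```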

<h3 id="6-the-main-causes-of-inactivity-and-their-solutions">6. The main causes of inactivity and their solutions</h3>

<p>The EventBridge rule does not have the Lambda function as a target:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws events put-targets <span class="se">\</span>
    <span class="nt">--rule</span> trigger-on-ec2-instance-creation <span class="se">\</span>
    <span class="nt">--targets</span> <span class="s1">'[{"Id": "1", "Arn": "arn:aws:lambda:'</span><span class="k">${</span><span class="nv">REGION</span><span class="k">}</span><span class="s1">':'</span><span class="k">${</span><span class="nv">ACCOUNT_ID</span><span class="k">}</span><span class="s1">':function:ec2-auto-tagging"}]'</span>
</code></pre></div></div>

<p>The Lambda function does not have permission to receive events from EventBridge:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws lambda add-permission <span class="se">\</span>
    <span class="nt">--function-name</span> ec2-auto-tagging <span class="se">\</span>
    <span class="nt">--statement-id</span> AllowEventBridgeInvoke <span class="se">\</span>
    <span class="nt">--action</span> lambda:InvokeFunction <span class="se">\</span>
    <span class="nt">--principal</span> events.amazonaws.com <span class="se">\</span>
    <span class="nt">--source-arn</span> arn:aws:events:<span class="k">${</span><span class="nv">REGION</span><span class="k">}</span>:<span class="k">${</span><span class="nv">ACCOUNT_ID</span><span class="k">}</span>:rule/trigger-on-ec2-instance-creation
</code></pre></div></div>

<p>CloudTrail is not configured or logging is not enabled:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws cloudtrail start-logging <span class="nt">--name</span> api-events-trail
</code></pre></div></div>

<p>The Lambda function times out before the instance tags become available:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws lambda update-function-configuration <span class="se">\</span>
    <span class="nt">--function-name</span> ec2-auto-tagging <span class="se">\</span>
    <span class="nt">--timeout</span> 120
</code></pre></div></div>]]></content><author><name>Evgenii Zhuravlev</name></author><category term="AWS" /><category term="AWS" /><category term="EKS" /><category term="ec2" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">How to import all existing AWS resources into Terraform</title><link href="https://matrix-spec.github.io/terraform/2025/02/25/move-all-existing-AWS-resources-into-terraform.html" rel="alternate" type="text/html" title="How to import all existing AWS resources into Terraform" /><published>2025-02-25T00:00:00+00:00</published><updated>2025-02-25T00:00:00+00:00</updated><id>https://matrix-spec.github.io/terraform/2025/02/25/move-all-existing-AWS-resources-into-terraform</id><content type="html" xml:base="https://matrix-spec.github.io/terraform/2025/02/25/move-all-existing-AWS-resources-into-terraform.html"><![CDATA[<p><img src="/assets/images/posts/move-resources-into-terraform/terraform-gif.gif" alt="banner" /></p>

<h3 id="setup-your-aws-config-file">Set up your AWS config file</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~/.aws/config
</code></pre></div></div>

<h3 id="install-terraformer">Install terraformer</h3>

<p>Run the installation from your working directory.
All installation methods are described in the <a href="https://github.com/GoogleCloudPlatform/terraformer">Terraformer GitHub repo</a>.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>brew <span class="nb">install </span>terraformer
</code></pre></div></div>

<h3 id="create-versiontf-file">Create version.tf file</h3>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">terraform</span> <span class="p">{</span>
  <span class="nx">required_providers</span> <span class="p">{</span>
    <span class="nx">aws</span> <span class="o">=</span> <span class="p">{</span>
      <span class="nx">source</span>  <span class="o">=</span> <span class="s2">"hashicorp/aws"</span>
      <span class="nx">version</span> <span class="o">=</span> <span class="s2">"~&gt; 5.0"</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="nx">provider</span> <span class="s2">"aws"</span> <span class="p">{</span>
  <span class="nx">region</span> <span class="o">=</span> <span class="s2">"eu-west-1"</span>
<span class="p">}</span>
</code></pre></div></div>

<h3 id="init-terraform">Init terraform</h3>

<p>Then check that your providers are correct:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>terraform init
terraform providers
</code></pre></div></div>

<h3 id="start-import">Start import</h3>

<p>Import specific resources or all resources in the account:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>terraformer import aws <span class="nt">--resources</span><span class="o">=</span>ec2_instance,ebs <span class="nt">--regions</span><span class="o">=</span>eu-west-1 <span class="c"># for some resources</span>
terraformer import aws <span class="nt">--resources</span><span class="o">=</span><span class="s2">"*"</span> <span class="nt">--regions</span><span class="o">=</span>eu-west-1 <span class="c"># for all resources</span>
</code></pre></div></div>

<p>At this stage you may encounter errors like the ones below. They mean that your AWS account does not contain these entities, or that Terraformer cannot reach the service with the account configured in your local AWS config:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>panic: runtime error: index out of range <span class="o">[</span>0] with length 0
...
...
...
aws error initializing resources <span class="k">in </span>service cloud9, err: operation error Cloud9: ListEnvironments, https response error StatusCode: 400
</code></pre></div></div>

<p>In that case, use the <code class="language-plaintext highlighter-rouge">--excludes</code> parameter and pass the resources that cause errors:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>terraformer import aws <span class="nt">--resources</span><span class="o">=</span><span class="s2">"*"</span> <span class="nt">--excludes</span><span class="o">=</span><span class="s2">"cloud9,identitystore"</span> <span class="nt">--regions</span><span class="o">=</span>eu-west-1
</code></pre></div></div>
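<p>If the list of excluded services grows, it can be easier to assemble the command in a small script. The sketch below only builds the argument list from the flags shown above; <code class="language-plaintext highlighter-rouge">terraformer_import_args</code> is a hypothetical helper, and running the command is left to you.</p>

```python
import shlex


def terraformer_import_args(resources, regions, excludes=()):
    """Build the argv for `terraformer import aws` from Python lists."""
    args = [
        "terraformer", "import", "aws",
        "--resources=" + ",".join(resources),
        "--regions=" + ",".join(regions),
    ]
    if excludes:
        # Services that failed during a previous run go here.
        args.append("--excludes=" + ",".join(excludes))
    return args


args = terraformer_import_args(["*"], ["eu-west-1"], ["cloud9", "identitystore"])
# shlex.join quotes the "*" so the shell does not expand it.
print(shlex.join(args))
```

<p>The returned list can be passed directly to <code class="language-plaintext highlighter-rouge">subprocess.run</code>, which avoids shell quoting issues with <code class="language-plaintext highlighter-rouge">*</code>.</p>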

<p>Terraformer by default separates each resource into a file, which is put into a given service directory. The default path for resource files is <code class="language-plaintext highlighter-rouge">{generated}/{provider}/{service}/{resource}.tf</code> and can vary for each provider.</p>

<h3 id="after-import">After import</h3>

<h4 id="clean-uo-non-existing-resources">Clean up non-existing resources</h4>

<p>Some <code class="language-plaintext highlighter-rouge">{generated}/{provider}/{service}/</code> directories may contain a <code class="language-plaintext highlighter-rouge">terraform.tfstate</code> file with no resources, like the one below. If you don’t need such services, delete their directories:</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>
    <span class="s2">"version"</span><span class="o">:</span> <span class="mi">3</span><span class="p">,</span>
    <span class="s2">"terraform_version"</span><span class="o">:</span> <span class="s2">"0.12.31"</span><span class="p">,</span>
    <span class="s2">"serial"</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span>
    <span class="s2">"lineage"</span><span class="o">:</span> <span class="s2">"*****-fe52-*****-6e0f-51f858*****"</span><span class="p">,</span>
    <span class="s2">"modules"</span><span class="o">:</span> <span class="p">[</span>
        <span class="p">{</span>
            <span class="s2">"path"</span><span class="o">:</span> <span class="p">[</span>
                <span class="s2">"root"</span>
            <span class="p">],</span>
            <span class="s2">"outputs"</span><span class="o">:</span> <span class="p">{},</span>
            <span class="s2">"resources"</span><span class="o">:</span> <span class="p">{},</span>
            <span class="s2">"depends_on"</span><span class="o">:</span> <span class="p">[]</span>
        <span class="p">}</span>
    <span class="p">]</span>
<span class="p">}</span>
</code></pre></div></div>
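<p>Checking dozens of directories by hand is tedious, so a small script can list the candidates. This is a sketch, assuming the version 3 state layout shown above; the function names are hypothetical, and the script only reports directories without deleting anything.</p>

```python
import json
from pathlib import Path


def is_empty_v3_state(state):
    """True if every module in a version 3 state has no resources."""
    modules = state.get("modules", [])
    return all(not module.get("resources") for module in modules)


def find_empty_state_dirs(root):
    """List service directories under root whose state holds no resources."""
    empty = []
    for state_file in sorted(Path(root).glob("*/terraform.tfstate")):
        if is_empty_v3_state(json.loads(state_file.read_text())):
            empty.append(state_file.parent)
    return empty


# The state from the example above, reduced to the relevant fields.
EMPTY_STATE = json.loads(
    '{"version": 3, "modules": [{"path": ["root"], "outputs": {}, "resources": {}}]}'
)
print(is_empty_v3_state(EMPTY_STATE))  # True
```

<p>Calling <code class="language-plaintext highlighter-rouge">find_empty_state_dirs</code> with your <code class="language-plaintext highlighter-rouge">{generated}/{provider}</code> path returns the directories to review before deleting them.</p>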

<h4 id="check-providertf-file">Check provider.tf file</h4>

<p>Check that each directory has a <code class="language-plaintext highlighter-rouge">provider.tf</code> file. It may look like this; the data in the file must match your version of Terraform and your current providers:</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">provider</span> <span class="s2">"aws"</span> <span class="p">{</span>
  <span class="nx">region</span> <span class="o">=</span> <span class="s2">"eu-west-1"</span>
<span class="p">}</span>

<span class="nx">terraform</span> <span class="p">{</span>
  <span class="nx">required_providers</span> <span class="p">{</span>
    <span class="nx">aws</span> <span class="o">=</span> <span class="p">{</span>
      <span class="nx">version</span> <span class="o">=</span> <span class="s2">"~&gt; 5.88.0"</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<h4 id="replace-tfstates-versions">Replace tfstates versions</h4>

<p>Now all the resources in your AWS account, including the ones you created manually, are described in the directory <code class="language-plaintext highlighter-rouge">{generated}/{provider}/</code>. Super!</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">.</span>
..
acm
alb
auto_scaling
cloudformation
cloudfront
cloudwatch
cognito
...
</code></pre></div></div>

<p>Please note that all <code class="language-plaintext highlighter-rouge">{generated}/{provider}/{service}/terraform.tfstate</code> files contain a header like this:</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span>
    <span class="s2">"version"</span><span class="o">:</span> <span class="mi">3</span><span class="p">,</span>
    <span class="s2">"terraform_version"</span><span class="o">:</span> <span class="s2">"0.12.31"</span><span class="p">,</span>
    <span class="s2">"serial"</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span>
    <span class="s2">"lineage"</span><span class="o">:</span> <span class="s2">"******-e4dd-0626-*****-9ceccb5a928a"</span><span class="p">,</span>
    <span class="s2">"modules"</span><span class="o">:</span> <span class="p">[</span>
</code></pre></div></div>

<p>This means that Terraformer is based on <strong>Terraform version</strong> <code class="language-plaintext highlighter-rouge">0.12.+</code>, which writes <code class="language-plaintext highlighter-rouge">terraform.tfstate</code> with <code class="language-plaintext highlighter-rouge">"version": 3</code>. This is a problem because your current Terraform version, and any later one, is newer. To fix it, run the following command in every <code class="language-plaintext highlighter-rouge">{generated}/{provider}/{service}/</code> directory:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>terraform state replace-provider <span class="nt">-auto-approve</span> <span class="s2">"registry.terraform.io/-/aws"</span> <span class="s2">"hashicorp/aws"</span>
</code></pre></div></div>

<p>Read more about it here: <a href="https://developer.hashicorp.com/terraform/cli/commands/state/replace-provider">Terraform state replace-provider</a>.</p>
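
<p>Before and after the fix, it can help to see which versions each state file actually records. The following is a minimal sketch, not part of the original workflow: the helper name <code class="language-plaintext highlighter-rouge">state_header</code> is hypothetical, and it assumes the header layout shown above, where <code class="language-plaintext highlighter-rouge">"version"</code> and <code class="language-plaintext highlighter-rouge">"terraform_version"</code> each sit on their own line near the top of the file.</p>

```shell
#!/bin/bash
# Hypothetical helper: print the state format version and the Terraform
# version recorded in the header of a terraform.tfstate file.
# Only the first match of each field is read, which is the header entry.
state_header() {
    local state_file="$1"
    local format_version tf_version
    # '"version"' (with the leading quote) does not match '"terraform_version"'
    format_version=$(grep -m1 -o '"version": *[0-9][0-9]*' "$state_file" | grep -o '[0-9][0-9]*$')
    tf_version=$(grep -m1 -o '"terraform_version": *"[^"]*"' "$state_file" | cut -d'"' -f4)
    echo "$state_file: format=$format_version terraform=$tf_version"
}

# Example (path layout assumed from the article):
#   for f in generated/aws/*/terraform.tfstate; do state_header "$f"; done
```

<p>The helper only reads the files, so it is safe to run at any point of the migration.</p>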

<h4 id="create-backendtf-for-remote-state">Create backend.tf for remote state</h4>

<p>In each directory, create a <code class="language-plaintext highlighter-rouge">backend.tf</code> file for the remote state:</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">terraform</span> <span class="p">{</span>
  <span class="nx">backend</span> <span class="s2">"s3"</span> <span class="p">{</span>
    <span class="nx">bucket</span>         <span class="o">=</span> <span class="s2">"$BUCKET_NAME"</span>
    <span class="nx">key</span>            <span class="o">=</span> <span class="s2">"$KEYNAME/terraform.tfstate"</span>
    <span class="nx">region</span>         <span class="o">=</span> <span class="s2">"eu-west-1"</span>
    <span class="nx">dynamodb_table</span> <span class="o">=</span> <span class="s2">"$DYNAMO_DB_LOCKS_TABLE"</span> <span class="c1"># if you need it</span>
    <span class="nx">encrypt</span>        <span class="o">=</span> <span class="kc">true</span>
  <span class="p">}</span>
<span class="p">}</span> 
</code></pre></div></div>

<h4 id="start-init-terraform">Start init terraform</h4>

<p>In each directory, run <code class="language-plaintext highlighter-rouge">terraform init</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>terraform init <span class="nt">-force-copy</span>
</code></pre></div></div>

<h4 id="script-for-all-the-previous-steps">Script for all the previous steps</h4>

<p>Adapt it to your environment, but the logic is correct:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

<span class="c"># Checking for required environment variables</span>
<span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$AWS_REGION</span><span class="s2">"</span> <span class="o">]</span> <span class="o">||</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$TF_STATE_BUCKET</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
    </span><span class="nb">echo</span> <span class="s2">"Please set the following environment variables:"</span>
    <span class="nb">echo</span> <span class="s2">"export AWS_REGION='your-region'"</span>
    <span class="nb">echo</span> <span class="s2">"export TF_STATE_BUCKET='your-bucket-name'"</span>
    <span class="nb">exit </span>1
<span class="k">fi</span>

<span class="c"># Save current directory</span>
<span class="nv">SCRIPT_DIR</span><span class="o">=</span><span class="si">$(</span><span class="nb">pwd</span><span class="si">)</span>
<span class="nv">BASE_DIR</span><span class="o">=</span><span class="s2">"terraformer/generated/aws"</span>

<span class="c"># Get list of all services</span>
<span class="k">for </span>SERVICE_DIR <span class="k">in</span> <span class="nv">$BASE_DIR</span>/<span class="k">*</span><span class="p">;</span> <span class="k">do
    if</span> <span class="o">[</span> <span class="o">!</span> <span class="nt">-d</span> <span class="s2">"</span><span class="nv">$SERVICE_DIR</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        continue
    fi

    </span><span class="nv">SERVICE_NAME</span><span class="o">=</span><span class="si">$(</span><span class="nb">basename</span> <span class="s2">"</span><span class="nv">$SERVICE_DIR</span><span class="s2">"</span><span class="si">)</span>
    <span class="nb">echo</span> <span class="s2">"========================================="</span>
    <span class="nb">echo</span> <span class="s2">"Testing migration for service </span><span class="nv">$SERVICE_NAME</span><span class="s2">..."</span>
    <span class="nb">echo</span> <span class="s2">"========================================="</span>

    <span class="nb">cd</span> <span class="s2">"</span><span class="nv">$SERVICE_DIR</span><span class="s2">"</span> <span class="o">||</span> <span class="k">continue</span>

    <span class="nb">echo</span> <span class="s2">"2. Replacing provider..."</span>
    terraform state replace-provider <span class="nt">-auto-approve</span> <span class="s2">"registry.terraform.io/-/aws"</span> <span class="s2">"hashicorp/aws"</span>

    <span class="nb">echo</span> <span class="s2">"3. Creating backend.tf from template..."</span>
    <span class="nv">TEMPLATE_PATH</span><span class="o">=</span><span class="s2">"</span><span class="nv">$SCRIPT_DIR</span><span class="s2">/backend.tf.tmpl"</span>

    <span class="k">if</span> <span class="o">[</span> <span class="o">!</span> <span class="nt">-f</span> <span class="s2">"</span><span class="nv">$TEMPLATE_PATH</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">echo</span> <span class="s2">"Error: Template file </span><span class="nv">$TEMPLATE_PATH</span><span class="s2"> not found!"</span>
        <span class="nb">exit </span>1
    <span class="k">fi

    </span><span class="nb">echo</span> <span class="s2">"Using template: </span><span class="nv">$TEMPLATE_PATH</span><span class="s2">"</span>
    <span class="nb">echo</span> <span class="s2">"Creating backend.tf in: </span><span class="si">$(</span><span class="nb">pwd</span><span class="si">)</span><span class="s2">"</span>

    <span class="nb">cat</span> <span class="s2">"</span><span class="nv">$TEMPLATE_PATH</span><span class="s2">"</span> | <span class="se">\</span>
    <span class="nb">sed</span> <span class="s2">"s/__SERVICE__/</span><span class="nv">$SERVICE_NAME</span><span class="s2">/g"</span> | <span class="se">\</span>
    <span class="nb">sed</span> <span class="s2">"s/your-aws-region/</span><span class="nv">$AWS_REGION</span><span class="s2">/g"</span> | <span class="se">\</span>
    <span class="nb">sed</span> <span class="s2">"s/your-terraform-state-bucket/</span><span class="nv">$TF_STATE_BUCKET</span><span class="s2">/g"</span> <span class="o">&gt;</span> backend.tf

    <span class="k">if</span> <span class="o">[</span> <span class="o">!</span> <span class="nt">-s</span> backend.tf <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">echo</span> <span class="s2">"Error: backend.tf is empty after creation!"</span>
        <span class="nb">exit </span>1
    <span class="k">fi

    </span><span class="nb">echo</span> <span class="s2">"Contents of created backend.tf:"</span>
    <span class="nb">cat </span>backend.tf

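    <span class="c"># $YOURS_BACKEND_S3 (key prefix) and $NAME (DynamoDB lock table) must be set too;</span>
    <span class="c"># values passed with -backend-config override those in backend.tf</span>
    <span class="nb">echo</span> <span class="s2">"4. Migrating state to S3..."</span>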
    terraform init <span class="se">\</span>
        <span class="nt">-force-copy</span> <span class="se">\</span>
        <span class="nt">-backend</span><span class="o">=</span><span class="nb">true</span> <span class="se">\</span>
        <span class="nt">-backend-config</span><span class="o">=</span><span class="s2">"bucket=</span><span class="nv">$TF_STATE_BUCKET</span><span class="s2">"</span> <span class="se">\</span>
        <span class="nt">-backend-config</span><span class="o">=</span><span class="s2">"key=</span><span class="nv">$YOURS_BACKEND_S3</span><span class="s2">/</span><span class="nv">$SERVICE_NAME</span><span class="s2">/terraform.tfstate"</span> <span class="se">\</span>
        <span class="nt">-backend-config</span><span class="o">=</span><span class="s2">"region=</span><span class="nv">$AWS_REGION</span><span class="s2">"</span> <span class="se">\</span>
        <span class="nt">-backend-config</span><span class="o">=</span><span class="s2">"dynamodb_table=</span><span class="nv">$NAME</span><span class="s2">"</span> <span class="se">\</span>
        <span class="nt">-backend-config</span><span class="o">=</span><span class="s2">"encrypt=true"</span>

    <span class="c"># Status check</span>
    <span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-eq</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">echo</span> <span class="s2">"5. Checking state..."</span>
        terraform state list
    <span class="k">else
        </span><span class="nb">echo</span> <span class="s2">"Error during state migration for </span><span class="nv">$SERVICE_NAME</span><span class="s2">!"</span>
        <span class="c"># Continue with next service instead of exiting</span>
        <span class="nb">cd</span> <span class="s2">"</span><span class="nv">$SCRIPT_DIR</span><span class="s2">"</span>
        <span class="k">continue
    fi</span>

    <span class="c"># Return to original directory for next iteration</span>
    <span class="nb">cd</span> <span class="s2">"</span><span class="nv">$SCRIPT_DIR</span><span class="s2">"</span>
    <span class="nb">echo</span> <span class="s2">"----------------------------------------"</span>
    <span class="nb">echo</span> <span class="s2">"Migration of </span><span class="nv">$SERVICE_NAME</span><span class="s2"> completed!"</span>
    <span class="nb">echo</span> <span class="s2">"----------------------------------------"</span>
<span class="k">done

</span><span class="nb">echo</span> <span class="s2">"Migration of all services completed!"</span> 
</code></pre></div></div>
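
<p>The script expects a <code class="language-plaintext highlighter-rouge">backend.tf.tmpl</code> file next to it, which is not shown above. Here is a minimal sketch of what such a template could look like and how the <code class="language-plaintext highlighter-rouge">sed</code> chain renders it. The three placeholders are the ones the script replaces; the key layout and the function names <code class="language-plaintext highlighter-rouge">write_backend_template</code> and <code class="language-plaintext highlighter-rouge">render_backend</code> are assumptions for illustration only.</p>

```shell
#!/bin/bash
# Write a hypothetical backend.tf.tmpl to the given path. The placeholders
# match the sed patterns in the migration script; the key layout is assumed.
write_backend_template() {
    cat > "$1" <<'EOF'
terraform {
  backend "s3" {
    bucket  = "your-terraform-state-bucket"
    key     = "__SERVICE__/terraform.tfstate"
    region  = "your-aws-region"
    encrypt = true
  }
}
EOF
}

# Render the template for one service, the same way the script does,
# and print the result to stdout.
render_backend() {
    local template="$1" service="$2" region="$3" bucket="$4"
    sed -e "s/__SERVICE__/$service/g" \
        -e "s/your-aws-region/$region/g" \
        -e "s/your-terraform-state-bucket/$bucket/g" "$template"
}

# Example:
#   write_backend_template backend.tf.tmpl
#   render_backend backend.tf.tmpl eks eu-west-1 my-bucket
```

<p>Since the script also passes <code class="language-plaintext highlighter-rouge">-backend-config</code> options to <code class="language-plaintext highlighter-rouge">terraform init</code>, those values take precedence over whatever the rendered file contains.</p>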

<h3 id="after-init">After init</h3>

<p>At this stage, the state of every service is stored in the bucket. For example, your S3 bucket may look like this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aws s3 <span class="nb">ls </span>s3://<span class="nv">$BUCKET_NAME</span>/<span class="nv">$KEYNAME</span> <span class="nt">--recursive</span> | <span class="nb">awk</span> <span class="s1">'{print $4}'</span>
<span class="nv">$KEYNAME</span>/acm/terraform.tfstate
<span class="nv">$KEYNAME</span>/alb/terraform.tfstate
<span class="nv">$KEYNAME</span>/auto_scaling/terraform.tfstate
<span class="nv">$KEYNAME</span>/cloudformation/terraform.tfstate
<span class="nv">$KEYNAME</span>/cloudfront/terraform.tfstate
<span class="nv">$KEYNAME</span>/cloudwatch/terraform.tfstate
<span class="nv">$KEYNAME</span>/config/terraform.tfstate
<span class="nv">$KEYNAME</span>/docdb/terraform.tfstate
<span class="nv">$KEYNAME</span>/dynamodb/terraform.tfstate
<span class="nv">$KEYNAME</span>/ebs/terraform.tfstate
<span class="nv">$KEYNAME</span>/ec2_instance/terraform.tfstate
<span class="nv">$KEYNAME</span>/ecr/terraform.tfstate
<span class="nv">$KEYNAME</span>/efs/terraform.tfstate
<span class="nv">$KEYNAME</span>/eip/terraform.tfstate
<span class="nv">$KEYNAME</span>/eks/terraform.tfstate
...
</code></pre></div></div>

<h4 id="rewrite-variables">Rewrite variables</h4>

<p>At this point, your <code class="language-plaintext highlighter-rouge">variables.tf</code> files still contain local references to the outputs of other resources. For example:</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">data</span> <span class="s2">"terraform_remote_state"</span> <span class="s2">"sg"</span> <span class="p">{</span>
  <span class="nx">backend</span> <span class="o">=</span> <span class="s2">"local"</span>

  <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span>
    <span class="nx">path</span> <span class="o">=</span> <span class="s2">"../../../generated/aws/sg/terraform.tfstate"</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="nx">data</span> <span class="s2">"terraform_remote_state"</span> <span class="s2">"subnet"</span> <span class="p">{</span>
  <span class="nx">backend</span> <span class="o">=</span> <span class="s2">"local"</span>

  <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span>
    <span class="nx">path</span> <span class="o">=</span> <span class="s2">"../../../generated/aws/subnet/terraform.tfstate"</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>You need to rewrite all of these files so that each block points to the state file of the referenced service in S3:</p>

<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">data</span> <span class="s2">"terraform_remote_state"</span> <span class="s2">"sg"</span> <span class="p">{</span>
  <span class="nx">backend</span> <span class="o">=</span> <span class="s2">"s3"</span>

  <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span>
    <span class="nx">bucket</span> <span class="o">=</span> <span class="s2">"$BUCKET_NAME"</span>
    <span class="nx">key</span>    <span class="o">=</span> <span class="s2">"$KEYNAME/sg/terraform.tfstate"</span>
    <span class="nx">region</span> <span class="o">=</span> <span class="s2">"eu-west-1"</span>
  <span class="p">}</span>
<span class="p">}</span>

<span class="nx">data</span> <span class="s2">"terraform_remote_state"</span> <span class="s2">"subnet"</span> <span class="p">{</span>
  <span class="nx">backend</span> <span class="o">=</span> <span class="s2">"s3"</span>

  <span class="nx">config</span> <span class="o">=</span> <span class="p">{</span>
    <span class="nx">bucket</span> <span class="o">=</span> <span class="s2">"$BUCKET_NAME"</span>
    <span class="nx">key</span>    <span class="o">=</span> <span class="s2">"$KEYNAME/subnet/terraform.tfstate"</span>
    <span class="nx">region</span> <span class="o">=</span> <span class="s2">"eu-west-1"</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>This script can help you:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

<span class="c"># Script for replacing local backend with remote S3 backend in variables.tf files</span>

<span class="c"># S3 bucket settings</span>
<span class="nv">S3_BUCKET</span><span class="o">=</span><span class="s2">"</span><span class="nv">$TF_STATE_BUCKET</span><span class="s2">"</span>
<span class="nv">REGION</span><span class="o">=</span><span class="s2">"</span><span class="nv">$AWS_REGION</span><span class="s2">"</span>
<span class="nv">KEY_PREFIX</span><span class="o">=</span><span class="s2">"</span><span class="nv">$TF_STATE_KEY_PREFIX</span><span class="s2">"</span>

<span class="c"># Infrastructure directory (for macOS)</span>
<span class="nv">INFRA_DIR</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span><span class="nb">cd</span> <span class="s2">"</span><span class="si">$(</span><span class="nb">dirname</span> <span class="s2">"</span><span class="nv">$0</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span> <span class="o">&amp;&amp;</span> <span class="nb">pwd</span><span class="si">)</span><span class="s2">"</span>
<span class="nb">echo</span> <span class="s2">"Working directory: </span><span class="nv">$INFRA_DIR</span><span class="s2">"</span>

<span class="c"># Function to replace backend in file</span>
replace_backend<span class="o">()</span> <span class="o">{</span>
    <span class="nb">local </span><span class="nv">file</span><span class="o">=</span><span class="s2">"</span><span class="nv">$1</span><span class="s2">"</span>
    <span class="nb">local </span><span class="nv">service_name</span><span class="o">=</span><span class="s2">"</span><span class="nv">$2</span><span class="s2">"</span>
    
    <span class="nb">echo</span> <span class="s2">"Processing file: </span><span class="nv">$file</span><span class="s2"> for service: </span><span class="nv">$service_name</span><span class="s2">"</span>
    
    <span class="c"># Create temporary file</span>
    <span class="nb">local </span><span class="nv">temp_file</span><span class="o">=</span><span class="si">$(</span><span class="nb">mktemp</span><span class="si">)</span>
    
    <span class="c"># Completely rewrite the file, fixing all terraform_remote_state blocks</span>
    <span class="c"># Use grep and awk to extract names of all remote_state blocks</span>
    <span class="nv">remote_states</span><span class="o">=</span><span class="si">$(</span><span class="nb">grep</span> <span class="nt">-o</span> <span class="s1">'data "terraform_remote_state" "[^"]*"'</span> <span class="s2">"</span><span class="nv">$file</span><span class="s2">"</span> | <span class="nb">awk</span> <span class="nt">-F</span><span class="s1">'"'</span> <span class="s1">'{print $4}'</span><span class="si">)</span>
    
    <span class="c"># If there are no remote_state blocks, skip the file</span>
    <span class="k">if</span> <span class="o">[</span> <span class="nt">-z</span> <span class="s2">"</span><span class="nv">$remote_states</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">echo</span> <span class="s2">"No terraform_remote_state blocks found in </span><span class="nv">$file</span><span class="s2">"</span>
        <span class="nb">rm</span> <span class="s2">"</span><span class="nv">$temp_file</span><span class="s2">"</span>
        <span class="k">return
    fi</span>
    
    <span class="c"># Create new file with correct blocks</span>
    <span class="o">&gt;</span> <span class="s2">"</span><span class="nv">$temp_file</span><span class="s2">"</span>
    
    <span class="k">for </span>rs <span class="k">in</span> <span class="nv">$remote_states</span><span class="p">;</span> <span class="k">do
        </span><span class="nb">echo</span> <span class="s2">"Processing remote_state block: </span><span class="nv">$rs</span><span class="s2">"</span>
        
        <span class="c"># Add block with correct S3 configuration</span>
        <span class="nb">cat</span> <span class="o">&gt;&gt;</span> <span class="s2">"</span><span class="nv">$temp_file</span><span class="s2">"</span> <span class="o">&lt;&lt;</span> <span class="no">EOF</span><span class="sh">
data "terraform_remote_state" "</span><span class="nv">$rs</span><span class="sh">" {
  backend = "s3"

  config = {
    bucket = "</span><span class="nv">$S3_BUCKET</span><span class="sh">"
    key    = "</span><span class="nv">$KEY_PREFIX</span><span class="sh">/</span><span class="nv">$rs</span><span class="sh">/terraform.tfstate"
    region = "</span><span class="nv">$REGION</span><span class="sh">"
  }
}
</span><span class="no">
EOF
</span>    <span class="k">done</span>
    
    <span class="c"># Check if file was changed</span>
    <span class="k">if</span> <span class="o">!</span> cmp <span class="nt">-s</span> <span class="s2">"</span><span class="nv">$file</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$temp_file</span><span class="s2">"</span><span class="p">;</span> <span class="k">then</span>
        <span class="c"># Create backup copy</span>
        <span class="nb">cp</span> <span class="s2">"</span><span class="nv">$file</span><span class="s2">"</span> <span class="s2">"</span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">.bak"</span>
        <span class="nb">echo</span> <span class="s2">"Backup created: </span><span class="k">${</span><span class="nv">file</span><span class="k">}</span><span class="s2">.bak"</span>
        
        <span class="c"># Replace file</span>
        <span class="nb">mv</span> <span class="s2">"</span><span class="nv">$temp_file</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$file</span><span class="s2">"</span>
        <span class="nb">echo</span> <span class="s2">"File updated: </span><span class="nv">$file</span><span class="s2">"</span>
    <span class="k">else
        </span><span class="nb">rm</span> <span class="s2">"</span><span class="nv">$temp_file</span><span class="s2">"</span>
        <span class="nb">echo</span> <span class="s2">"File already up to date: </span><span class="nv">$file</span><span class="s2">"</span>
    <span class="k">fi</span>
<span class="o">}</span>

<span class="c"># Main loop</span>
<span class="nb">echo</span> <span class="s2">"Starting variables.tf files update..."</span>

<span class="c"># Process each subdirectory</span>
<span class="k">for </span><span class="nb">dir </span><span class="k">in</span> <span class="s2">"</span><span class="nv">$INFRA_DIR</span><span class="s2">"</span>/<span class="k">*</span><span class="p">;</span> <span class="k">do
    if</span> <span class="o">[</span> <span class="nt">-d</span> <span class="s2">"</span><span class="nv">$dir</span><span class="s2">"</span> <span class="o">]</span> <span class="o">&amp;&amp;</span> <span class="o">[</span> <span class="s2">"</span><span class="si">$(</span><span class="nb">basename</span> <span class="s2">"</span><span class="nv">$dir</span><span class="s2">"</span><span class="si">)</span><span class="s2">"</span> <span class="o">!=</span> <span class="s2">".git"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        </span><span class="nv">service_name</span><span class="o">=</span><span class="si">$(</span><span class="nb">basename</span> <span class="s2">"</span><span class="nv">$dir</span><span class="s2">"</span><span class="si">)</span>
        <span class="nv">variables_file</span><span class="o">=</span><span class="s2">"</span><span class="nv">$dir</span><span class="s2">/variables.tf"</span>
        
        <span class="nb">echo</span> <span class="s2">"Checking directory: </span><span class="nv">$dir</span><span class="s2">"</span>
        <span class="nb">echo</span> <span class="s2">"Service name: </span><span class="nv">$service_name</span><span class="s2">"</span>
        
        <span class="k">if</span> <span class="o">[</span> <span class="nt">-f</span> <span class="s2">"</span><span class="nv">$variables_file</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
            </span><span class="nb">echo</span> <span class="s2">"Found variables.tf file in </span><span class="nv">$dir</span><span class="s2">"</span>
            replace_backend <span class="s2">"</span><span class="nv">$variables_file</span><span class="s2">"</span> <span class="s2">"</span><span class="nv">$service_name</span><span class="s2">"</span>
        <span class="k">else
            </span><span class="nb">echo</span> <span class="s2">"variables.tf file not found in </span><span class="nv">$dir</span><span class="s2">"</span>
        <span class="k">fi
    fi
done

</span><span class="nb">echo</span> <span class="s2">"Update completed!"</span> 
</code></pre></div></div>
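
<p>After the rewrite, it is worth confirming that no <code class="language-plaintext highlighter-rouge">variables.tf</code> still points to a local state file. A minimal sketch of such a check follows; the function name <code class="language-plaintext highlighter-rouge">find_local_backends</code> is hypothetical, and it assumes the <code class="language-plaintext highlighter-rouge">backend = "local"</code> assignment sits on a single line, as in the generated files above.</p>

```shell
#!/bin/bash
# Hypothetical check: list every variables.tf under the given directory
# that still contains a backend = "local" assignment.
# Prints nothing (and returns non-zero) when no such file is left.
find_local_backends() {
    local root="$1"
    grep -rlE --include='variables.tf' 'backend[[:space:]]*=[[:space:]]*"local"' "$root"
}

# Example (path layout assumed from the article):
#   find_local_backends terraformer/generated/aws
```

<p>An empty result means every <code class="language-plaintext highlighter-rouge">terraform_remote_state</code> block in those files has been switched away from the local backend.</p>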

<h3 id="done-push-your-iac-in-git">Done! Push your IaC in git</h3>

<p>You can run <code class="language-plaintext highlighter-rouge">terraform plan</code> to check the result. Then prepare a <code class="language-plaintext highlighter-rouge">.gitignore</code> file like the one below and run <code class="language-plaintext highlighter-rouge">git init</code>, <code class="language-plaintext highlighter-rouge">git remote add origin...</code>, and <code class="language-plaintext highlighter-rouge">git push</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Local .terraform directories</span>
<span class="k">**</span>/.terraform/<span class="k">*</span>

<span class="c"># .terraform.lock.hcl contains the exact versions of the providers and their hashes</span>
<span class="c"># It is recommended to save this file in the repository to ensure</span>
<span class="c"># Reproducibility of infrastructure and security</span>
<span class="c"># Uncomment the following line if you want to exclude this file (not recommended)</span>
<span class="c"># **/.terraform.lock.hcl</span>

<span class="c"># .tfstate files</span>
<span class="k">*</span>.tfstate
<span class="k">*</span>.tfstate.<span class="k">*</span>

<span class="k">*</span>.tfplan
<span class="k">**</span>/<span class="k">*</span>.bak
<span class="k">**</span>/backup.tfstate
<span class="k">*</span>.sh
<span class="k">**</span>/<span class="k">*</span>.sh

<span class="c"># Crash log files</span>
crash.log
crash.<span class="k">*</span>.log

<span class="c"># Exclude all .tfvars files, which are likely to contain sensitive data, such as</span>
<span class="c"># password, private keys, and other secrets. These should not be part of version </span>
<span class="c"># control as they are data points which are potentially sensitive and subject </span>
<span class="c"># to change depending on the environment.</span>
<span class="k">*</span>.tfvars
<span class="k">*</span>.tfvars.json

<span class="c"># Ignore override files as they are usually used to override resources locally and so</span>
<span class="c"># are not checked in</span>
override.tf
override.tf.json
<span class="k">*</span>_override.tf
<span class="k">*</span>_override.tf.json

<span class="c"># Ignore transient lock info files created by terraform apply</span>
.terraform.tfstate.lock.info

<span class="c"># Include override files you do wish to add to version control using negated pattern</span>
<span class="c"># !example_override.tf</span>

<span class="c"># Ignore CLI configuration files</span>
.terraformrc
terraform.rc

.DS_Store
<span class="k">*</span>.swp
<span class="k">*</span>.swo 
</code></pre></div></div>

<h3 id="breaking-down-large-tf-files-into-modules">Breaking down large .tf files into modules</h3>

<p>Here’s an example approach:</p>

<p>Let’s say we need to create a new node group for <code class="language-plaintext highlighter-rouge">cluster1</code>. This is not the most relevant example, since node groups are generally one logical piece, but it requires us to “cut out” the <code class="language-plaintext highlighter-rouge">cluster.cluster1</code> node groups from the general codebase shared by all clusters.</p>

<h4 id="important-considerations">Important Considerations</h4>

<ul>
  <li>
    <p><strong>Always create backups before making changes</strong></p>
  </li>
  <li>
    <p><strong>Remove resources from the state before deleting code</strong>, otherwise Terraform might try to delete actual resources in AWS</p>
  </li>
  <li>
    <p><strong>Check the plan after changes</strong> to ensure Terraform isn’t trying to create or delete resources</p>
  </li>
  <li>
    <p><strong>Be careful with dependencies</strong>:</p>

    <ul>
      <li>
        <p>If other resources depend on the node groups being removed, errors may occur</p>
      </li>
      <li>
        <p>In this case, you’ll also need to update dependent resources</p>
      </li>
      <li>
        <p>Update references to outputs if other modules reference outputs from this module</p>
      </li>
    </ul>
  </li>
</ul>

<p>After completing the steps below, your main module will no longer manage the node groups for <code class="language-plaintext highlighter-rouge">cluster1</code>, and they will be fully controlled by the new module.</p>

<h4 id="step-by-step-process">Step-by-Step Process</h4>

<p>We’re moving node groups for cluster1 from:
<code class="language-plaintext highlighter-rouge">aws/eks/eks_node_group.tf</code>
To:
<code class="language-plaintext highlighter-rouge">aws/eks/cluster1/eks_node_group.tf</code></p>

<p>And “cutting out” only the cluster1 nodes from the common file.
Note: This code is not a perfect example! It’s just to illustrate the principle.</p>

<p>Create a backup of the current state: <code class="language-plaintext highlighter-rouge">terraform state pull &gt; backup.tfstate</code></p>

<p>Create the directory structure for the new module: <code class="language-plaintext highlighter-rouge">mkdir -p aws/eks/cluster1</code></p>

<p>Create <code class="language-plaintext highlighter-rouge">aws/eks/cluster1/backend.tf</code>:</p>
<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">terraform</span> <span class="p">{</span>
  <span class="nx">backend</span> <span class="s2">"s3"</span> <span class="p">{</span>
    <span class="nx">bucket</span>         <span class="o">=</span> <span class="s2">"$BUCKET_NAME"</span>
    <span class="nx">key</span>            <span class="o">=</span> <span class="s2">"cluster1/terraform.tfstate"</span>
    <span class="nx">region</span>         <span class="o">=</span> <span class="s2">"eu-west-1"</span>
    <span class="nx">dynamodb_table</span> <span class="o">=</span> <span class="s2">"$NAME"</span>
    <span class="nx">encrypt</span>        <span class="o">=</span> <span class="kc">true</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Create <code class="language-plaintext highlighter-rouge">provider.tf</code> in the new directory:</p>
<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">provider</span> <span class="s2">"aws"</span> <span class="p">{</span>
  <span class="nx">region</span> <span class="o">=</span> <span class="s2">"eu-west-1"</span>
<span class="p">}</span>

<span class="nx">terraform</span> <span class="p">{</span>
  <span class="nx">required_providers</span> <span class="p">{</span>
    <span class="nx">aws</span> <span class="o">=</span> <span class="p">{</span>
      <span class="nx">source</span>  <span class="o">=</span> <span class="s2">"hashicorp/aws"</span>
      <span class="nx">version</span> <span class="o">=</span> <span class="s2">"~&gt; 5.88.0"</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>As needed, create <code class="language-plaintext highlighter-rouge">variables.tf</code> and <code class="language-plaintext highlighter-rouge">outputs.tf</code> in the new directory, and copy <code class="language-plaintext highlighter-rouge">.terraform.lock.hcl</code>.</p>

<p>Create <code class="language-plaintext highlighter-rouge">eks_node_group.tf</code> (copy only the node groups that belong to cluster1):</p>
<div class="language-hcl highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">resource</span> <span class="s2">"aws_eks_node_group"</span> <span class="s2">"common"</span> <span class="p">{</span>
  <span class="nx">ami_type</span>       <span class="o">=</span> <span class="s2">"xxxxx"</span>
  <span class="nx">capacity_type</span>  <span class="o">=</span> <span class="s2">"ON_DEMAND"</span>
  <span class="nx">cluster_name</span>   <span class="o">=</span> <span class="s2">"${aws_eks_cluster1.name}"</span>
  <span class="nx">disk_size</span>      <span class="o">=</span> <span class="s2">"100"</span>
  <span class="nx">instance_types</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"xxxxx"</span><span class="p">]</span>
  <span class="c1"># ...</span>
<span class="p">}</span>

<span class="c1"># Add your new node group</span>
<span class="nx">resource</span> <span class="s2">"aws_eks_node_group"</span> <span class="s2">"MY-SUPER-NODE-GROUP"</span> <span class="p">{</span>
  <span class="nx">ami_type</span>       <span class="o">=</span> <span class="s2">"xxxxx"</span>
  <span class="nx">capacity_type</span>  <span class="o">=</span> <span class="s2">"ON_DEMAND"</span>
  <span class="nx">cluster_name</span>   <span class="o">=</span> <span class="s2">"${aws_eks_cluster1.name}"</span>
  <span class="nx">disk_size</span>      <span class="o">=</span> <span class="s2">"100"</span>
  <span class="nx">instance_types</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"xxxxx"</span><span class="p">]</span>
  <span class="c1"># ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Initialize and move state:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Initialize the working directory, download required providers, </span>
<span class="c"># configure backend for state storage, and install modules if used</span>
terraform init

<span class="c"># Very important step - this will change the name in the remote state </span>
<span class="c"># to a new one without the "tfer--" prefix that was generated during export.</span>
<span class="c"># Do this for all resources you've cut from the common file to the new one,</span>
<span class="c"># but not for resources you've added (aws_eks_node_group.MY-SUPER-NODE-GROUP)</span>
terraform state <span class="nb">mv</span> <span class="s1">'aws_eks_node_group.tfer--common'</span> <span class="s1">'aws_eks_node_group.common'</span>

<span class="c"># Check that everything looks good</span>
terraform plan
</code></pre></div></div>
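<p>The rename is mechanical, so for many exported resources it can be scripted. The following is a minimal sketch, not part of the original workflow: the helper names <code class="language-plaintext highlighter-rouge">strip_tfer_prefix</code> and <code class="language-plaintext highlighter-rouge">print_state_mv_commands</code> are hypothetical, and the script only prints the <code class="language-plaintext highlighter-rouge">terraform state mv</code> commands so that you can review them before running anything.</p>

```shell
#!/bin/bash
# Hypothetical helper: turn "TYPE.tfer--NAME" into "TYPE.NAME".
# Addresses without the "tfer--" prefix are returned unchanged.
strip_tfer_prefix() {
  printf '%s\n' "$1" | sed 's/\.tfer--/./'
}

# Print (do not run) one "terraform state mv" command per exported address.
# Addresses that need no rename, such as newly added resources, are skipped.
print_state_mv_commands() {
  local addr new
  for addr in "$@"; do
    new="$(strip_tfer_prefix "$addr")"
    if [ "$addr" != "$new" ]; then
      printf "terraform state mv '%s' '%s'\n" "$addr" "$new"
    fi
  done
}

print_state_mv_commands \
  aws_eks_node_group.tfer--common \
  aws_eks_node_group.tfer--common1 \
  aws_eks_node_group.MY-SUPER-NODE-GROUP
```

<p>Printing instead of executing keeps the state untouched until you have checked every old and new address pair.</p>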

<p>If everything looks good, deploy through CI: <code class="language-plaintext highlighter-rouge">terraform plan -&gt; terraform apply</code></p>

<p>After moving the cluster1 node groups into a separate module, you need to remove them from the main module. This is a two-step process:</p>

<h4 id="1-remove-the-extracted-cluster1-node-groups-from-the-state">1. Remove the Extracted cluster1 Node Groups from the State</h4>

<p>For the common file:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>terraform state <span class="nb">rm </span>aws_eks_node_group.tfer--common
terraform state <span class="nb">rm </span>aws_eks_node_group.tfer--common1
terraform state <span class="nb">rm </span>aws_eks_node_group.tfer--common2
<span class="c"># ...</span>
</code></pre></div></div>

<h4 id="2-delete-resource-code-from-tf-files">2. Delete Resource Code from .tf Files</h4>

<p>After removing resources from the state, delete their definitions from the .tf files. In this example, that is the file <code class="language-plaintext highlighter-rouge">aws/eks/cluster1/eks_node_group.tf</code>.
Remove the code blocks for every node group of the <code class="language-plaintext highlighter-rouge">cluster1</code> cluster that you moved to the separate file.
Then verify the changes:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>terraform plan
</code></pre></div></div>

<p>The plan should not show any changes for the removed resources since they’ve already been removed from the state.</p>
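<p>In CI you may want to check this automatically rather than by reading the plan. A minimal sketch, assuming you run <code class="language-plaintext highlighter-rouge">terraform plan -detailed-exitcode</code>, which exits with 0 when there are no changes, 1 on error, and 2 when changes are pending. The function name <code class="language-plaintext highlighter-rouge">describe_plan_exit</code> is hypothetical.</p>

```shell
#!/bin/bash
# Map the exit code of "terraform plan -detailed-exitcode" to a short message.
# 0 = no changes, 1 = error, 2 = changes present.
describe_plan_exit() {
  case "$1" in
    0) echo "no changes" ;;
    2) echo "changes pending" ;;
    *) echo "plan failed" ;;
  esac
}

# Usage, assuming terraform is installed and the directory is initialized:
#   terraform plan -detailed-exitcode
#   describe_plan_exit "$?"
describe_plan_exit 0
```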

<h4 id="migrate-future-manuals">Migrate future manual changes</h4>

<p>It may happen that even after migrating to Terraform, you keep creating resources manually. That is undesirable, but such is life 😊. In this case, you can periodically scan your AWS account for resources, compare them with the Terraform state, and then import whatever is missing:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>
<span class="c"># Get the IDs of all instances in AWS</span>
<span class="nv">aws_instances</span><span class="o">=</span><span class="si">$(</span>aws ec2 describe-instances <span class="nt">--query</span> <span class="s1">'Reservations[*].Instances[*].[InstanceId]'</span> <span class="nt">--output</span> text<span class="si">)</span>

<span class="c"># Get the IDs of the instances already in the Terraform state</span>
<span class="nv">tf_instances</span><span class="o">=</span><span class="si">$(</span>terraform show <span class="nt">-json</span> | jq <span class="nt">-r</span> <span class="s1">'.values.root_module.resources[] | select(.type == "aws_instance") | .values.id'</span><span class="si">)</span>

<span class="c"># Compare and show the difference</span>
<span class="k">for </span>instance <span class="k">in</span> <span class="nv">$aws_instances</span><span class="p">;</span> <span class="k">do
    if</span> <span class="o">!</span> <span class="nb">echo</span> <span class="s2">"</span><span class="nv">$tf_instances</span><span class="s2">"</span> | <span class="nb">grep</span> <span class="nt">-q</span> <span class="s2">"</span><span class="nv">$instance</span><span class="s2">"</span><span class="p">;</span> <span class="k">then
        </span><span class="nb">echo</span> <span class="s2">"A new instance has been found: </span><span class="nv">$instance</span><span class="s2">"</span>
        <span class="c"># Automatic import can be added</span>
        <span class="c"># terraform import aws_instance.new_$instance $instance</span>
    <span class="k">fi
done</span>
</code></pre></div></div>]]></content><author><name>Evgenii Zhuravlev</name></author><category term="Terraform" /><category term="Terraform" /><category term="AWS" /><category term="Cloud" /><category term="S3" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Injecting secrets directly into Pods and Gitlab from Hashicorp Vault in EKS/K8s.</title><link href="https://matrix-spec.github.io/kubernetes/2024/10/15/deploy-hashicorp-vault-in-EKS.html" rel="alternate" type="text/html" title="Injecting secrets directly into Pods and Gitlab from Hashicorp Vault in EKS/K8s." /><published>2024-10-15T00:00:00+00:00</published><updated>2024-10-15T00:00:00+00:00</updated><id>https://matrix-spec.github.io/kubernetes/2024/10/15/deploy-hashicorp-vault-in-EKS</id><content type="html" xml:base="https://matrix-spec.github.io/kubernetes/2024/10/15/deploy-hashicorp-vault-in-EKS.html"><![CDATA[<p><img src="/assets/images/posts/vault-in-EKS/vault-loader-dark.gif" alt="banner" /></p>

<p>In this post, I’ll show you how to deploy Vault in EKS/K8s (there are some minor differences, but the workflow is very similar) and use DynamoDB as a backend, as well as how to inject secrets directly into a pod without using K8s Secrets (more details: <a href="https://developer.hashicorp.com/vault/docs/platform/k8s/injector">Vault Agent Injector</a>). And then I’ll tell you how to use it to inject secrets into the Gitlab pipeline.</p>

<p>So, the moment has come, you’ve decided on a secret storage solution and chosen Hashicorp Vault. This is a good choice (at the very least, it’s cheaper than AWS Secrets Manager 😊). The next step is to determine which backend to use for Hashicorp Vault.
Whichever backend you choose, keep in mind that it may or may not have HA properties (don’t confuse this with Vault’s own HA - a separate feature that allows it to create a cluster across multiple nodes).
Each backend has its strengths and weaknesses, such as: support from Hashicorp or the community, HA capability, cost, vendor lock-in, access timeout, backups, and so on. More details here: <a href="https://developer.hashicorp.com/vault/docs/configuration/storage/aerospike">Vault backend options</a>. There’s a lot of information that’s beyond the scope of this post, so let’s move on.</p>

<blockquote>
  <p>Note that DynamoDB has fairly low rate limits, so it won’t be suitable for everyone, but the deployment for a different backend will only differ by a few lines of configuration.</p>
</blockquote>

<h2 id="prepare-for-deploy-in-k8s">Prepare for deploy in K8s</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl create namespace prod-hashicorp-vault
helm repo add hashicorp https://helm.releases.hashicorp.com
helm search repo hashicorp/vault
</code></pre></div></div>

<p>Create an <code class="language-plaintext highlighter-rouge">override-values.yaml</code> file. This will include all our parameters for creating the Vault release - this is one of the most important steps:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># this values file is prepared for deployment with TLS disabled for internal traffic (within the K8s cluster)</span>
<span class="c"># I'll explain below how to enable TLS and what you need for that</span>

global:
  enabled: <span class="nb">true
  </span>tlsDisable: <span class="nb">true 
</span>injector:
  enabled: <span class="nb">true
  </span>metrics:
    enabled: <span class="nb">true
  </span>nodeSelector:
    nodegroup: hashicorp-vault-nodes
  port: 8080
  agentDefaults:
    cpuLimit: 500m
    cpuRequest: 250m
    memLimit: 128Mi
    memRequest: 64Mi
server:
  enabled: <span class="s1">'-'</span>
  standalone:
    enabled: <span class="nb">false
  </span>auditStorage:
    enabled: <span class="nb">true
    </span>accessMode: ReadWriteOnce
    mountPath: /vault/audit
    size: 10Gi
  dataStorage:
    enabled: <span class="nb">false
  </span>nodeSelector:
    nodegroup: hashicorp-vault-nodes
  extraEnvironmentVars:
    VAULT_CACERT: <span class="s2">""</span>
  <span class="c"># extraEnvironmentVars:</span>
  <span class="c">#   VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca</span>
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::<span class="nv">$ACCOUNT_ID</span>:role/hashicorp-vault-role <span class="c"># for EKS + IAM only</span>
    create: <span class="nb">true
  </span>ha:
    enabled: <span class="nb">true
    </span>replicas: 3
    config: |
      ui <span class="o">=</span> <span class="nb">true

      </span>listener <span class="s2">"tcp"</span> <span class="o">{</span>
        tls_disable <span class="o">=</span> 1
        address <span class="o">=</span> <span class="s2">"[::]:8200"</span>
        cluster_address <span class="o">=</span> <span class="s2">"[::]:8201"</span>
        <span class="c"># if tls is enabled</span>
        <span class="c"># tls_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"</span>
        <span class="c"># tls_key_file = "/vault/userconfig/vault-server-tls/vault.key"</span>
        <span class="c"># tls_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"</span>
      <span class="o">}</span>

      <span class="c"># For internal ssl vault &lt;-&gt; injector:</span>
      <span class="c"># listener "tcp" {</span>
      <span class="c">#   tls_disable = 0</span>
      <span class="c">#   address = "[::]:8202"</span>
      <span class="c">#   cluster_address = "[::]:8201"</span>
      <span class="c"># }</span>

      storage <span class="s2">"dynamodb"</span> <span class="o">{</span>
        ha_enabled <span class="o">=</span> <span class="s2">"true"</span>
        region <span class="o">=</span> <span class="s2">"</span><span class="nv">$REGION</span><span class="s2">"</span>
        table <span class="o">=</span> <span class="s2">"</span><span class="nv">$DYNAMODB_TABLE</span><span class="s2">"</span>
      <span class="o">}</span>

      seal <span class="s2">"awskms"</span> <span class="o">{</span>
        region     <span class="o">=</span> <span class="s2">"eu-west-1"</span>
        kms_key_id <span class="o">=</span> <span class="s2">"</span><span class="nv">$KMS_KEY_ID</span><span class="s2">"</span>
        <span class="c"># no need now: endpoint   = "https://vpce-xxxxxxxxxxxxxxx.kms.eu-west-1.vpce.amazonaws.com"</span>
      <span class="o">}</span>

      service_registration <span class="s2">"kubernetes"</span> <span class="o">{}</span>
    disruptionBudget:
      enabled: <span class="nb">true
      </span>maxUnavailable: null
  ingress:
    enabled: <span class="nb">true
    </span>activeService: <span class="nb">true
    </span>annotations:
      kubernetes.io/ingress.class: <span class="s2">"nginx"</span>
      cert-manager.io/cluster-issuer: <span class="s2">"letsencrypt"</span>
      nginx.ingress.kubernetes.io/rewrite-target: <span class="s2">"/"</span>  
      nginx.ingress.kubernetes.io/ssl-redirect: <span class="s2">"true"</span>
      nginx.ingress.kubernetes.io/proxy-body-size: <span class="s2">"100m"</span>
    ingressClassName: nginx
    labels: <span class="o">{}</span>
    pathType: Prefix
    tls:
      - hosts:
          - vault.example.com
        secretName: <span class="nv">$TLS_SECRET_NAME</span>
    hosts: 
      - host: vault.example.com
ui:
  enabled: <span class="nb">true
  </span>serviceType: <span class="s2">"ClusterIP"</span>
  externalPort: 8202
  targetPort: 8202
</code></pre></div></div>

<h2 id="setup-on-the-aws-side">Setup on the AWS side</h2>

<h3 id="create-dynamodb">Create DynamoDB</h3>

<p>You can do this with Terragrunt:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>locals <span class="o">{</span>
  environment <span class="o">=</span> <span class="s2">"production"</span>
<span class="o">}</span>

terraform <span class="o">{</span>
  <span class="nb">source</span> <span class="o">=</span> <span class="s2">"tfr:///terraform-aws-modules/dynamodb-table/aws?version=4.2.0"</span>
<span class="o">}</span>

remote_state <span class="o">{</span>
  backend <span class="o">=</span> <span class="s2">"s3"</span>
  generate <span class="o">=</span> <span class="o">{</span>
    path      <span class="o">=</span> <span class="s2">"backend.tf"</span>
    if_exists <span class="o">=</span> <span class="s2">"overwrite_terragrunt"</span>
  <span class="o">}</span>
  config <span class="o">=</span> <span class="o">{</span>
    bucket <span class="o">=</span> <span class="s2">"</span><span class="nv">$BUCKET_FOR_BACKEND</span><span class="s2">"</span>

    key <span class="o">=</span> <span class="s2">"</span><span class="k">${</span><span class="nv">local</span><span class="p">.environment</span><span class="k">}</span><span class="s2">/dynamodb/terraform.tfstate"</span>
    region         <span class="o">=</span> <span class="s2">"eu-west-1"</span>
    dynamodb_table <span class="o">=</span> <span class="s2">"terraform-locks"</span>
  <span class="o">}</span>
<span class="o">}</span>

inputs <span class="o">=</span> <span class="o">{</span>
  name          <span class="o">=</span> <span class="s2">"</span><span class="k">${</span><span class="nv">local</span><span class="p">.environment</span><span class="k">}</span><span class="s2">-vault-hashicorp-backend"</span>
  hash_key      <span class="o">=</span> <span class="s2">"Path"</span>
  billing_mode   <span class="o">=</span> <span class="s2">"PAY_PER_REQUEST"</span>

  attribute <span class="o">=</span> <span class="o">[</span>
    <span class="o">{</span>
      name <span class="o">=</span> <span class="s2">"Path"</span>
      <span class="nb">type</span> <span class="o">=</span> <span class="s2">"S"</span>
    <span class="o">}</span>
  <span class="o">]</span>

  tags <span class="o">=</span> <span class="o">{</span>
      ...
    <span class="o">}</span>
<span class="o">}</span>
</code></pre></div></div>
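<p>The table name is built from <code class="language-plaintext highlighter-rouge">local.environment</code>, and the same name has to appear in Vault's <code class="language-plaintext highlighter-rouge">storage "dynamodb"</code> block and in the IAM policy that grants access to the table. A small sketch of how the name and the ARN fit together. The helper names are hypothetical and the account ID is a placeholder.</p>

```shell
#!/bin/bash
# Build the table name the same way the Terragrunt inputs do.
vault_table_name() {
  printf '%s-vault-hashicorp-backend\n' "$1"
}

# Build the DynamoDB table ARN used as "Resource" in an IAM policy.
dynamodb_table_arn() {
  local region="$1" account_id="$2" table="$3"
  printf 'arn:aws:dynamodb:%s:%s:table/%s\n' "$region" "$account_id" "$table"
}

# 123456789012 is a placeholder account ID.
dynamodb_table_arn eu-west-1 123456789012 "$(vault_table_name production)"
```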

<h3 id="setup-aws-iams--kms">Setup AWS IAMs + KMS</h3>

<p>Next, we’ll configure:</p>

<ul>
  <li>KMS key for <a href="https://developer.hashicorp.com/vault/tutorials/auto-unseal">auto unseal vault</a>,</li>
  <li>IAM role trust policy,</li>
  <li>Permission for KMS key</li>
</ul>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

<span class="nb">export </span><span class="nv">CLUSTER_NAME</span><span class="o">=</span>...
<span class="nb">export </span><span class="nv">ACCOUNT_ID</span><span class="o">=</span><span class="si">$(</span>aws sts get-caller-identity <span class="nt">--query</span> Account <span class="nt">--output</span> text<span class="si">)</span>
<span class="nb">export </span><span class="nv">OIDC_ID</span><span class="o">=</span><span class="si">$(</span>aws eks describe-cluster <span class="nt">--name</span> <span class="nv">$CLUSTER_NAME</span> <span class="nt">--query</span> <span class="s2">"cluster.identity.oidc.issuer"</span> <span class="nt">--output</span> text | <span class="nb">sed</span> <span class="s1">'s|https://||'</span><span class="si">)</span>
<span class="nb">export </span><span class="nv">SERVICEACCOUNT_NAME</span><span class="o">=</span>hashicorp-vault
<span class="nb">export </span><span class="nv">KMS_KEY_ID</span><span class="o">=</span><span class="si">$(</span>aws kms create-key <span class="nt">--description</span> <span class="s2">"Hashicorp Vault Encryption Key"</span> <span class="nt">--region</span> eu-west-1 <span class="nt">--query</span> <span class="s2">"KeyMetadata.KeyId"</span> <span class="nt">--output</span> text<span class="si">)</span>


<span class="c"># Creating IAM role trust policy</span>
<span class="nb">cat</span> <span class="o">&lt;&lt;</span><span class="no">EOF</span><span class="sh"> | envsubst | aws iam create-role --role-name hashicorp-vault-role --assume-role-policy-document file://-
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::</span><span class="nv">$ACCOUNT_ID</span><span class="sh">:oidc-provider/</span><span class="nv">$OIDC_ID</span><span class="sh">"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "</span><span class="nv">$OIDC_ID</span><span class="sh">:sub": [
            "system:serviceaccount:prod-</span><span class="nv">$SERVICEACCOUNT_NAME</span><span class="sh">:prod-</span><span class="nv">$SERVICEACCOUNT_NAME</span><span class="sh">"
          ]
        }
      }
    }
  ]
}
</span><span class="no">EOF

</span><span class="nb">cat</span> <span class="o">&lt;&lt;</span><span class="no">EOF</span><span class="sh"> | envsubst | aws iam create-policy --policy-name VaultKMSDynamoDBPolicy --policy-document  file://-
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowKMS",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:Encrypt",
                "kms:DescribeKey"
            ],
            "Resource": "arn:aws:kms:eu-west-1:</span><span class="nv">$ACCOUNT_ID</span><span class="sh">:key/</span><span class="nv">$KMS_KEY_ID</span><span class="sh">"
        },
        {
            "Sid": "AllowDynamoDB",
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeLimits",
                "dynamodb:DescribeTimeToLive",
                "dynamodb:ListTagsOfResource",
                "dynamodb:DescribeReservedCapacityOfferings",
                "dynamodb:DescribeReservedCapacity",
                "dynamodb:ListTables",
                "dynamodb:BatchGetItem",
                "dynamodb:BatchWriteItem",
                "dynamodb:CreateTable",
                "dynamodb:DeleteItem",
                "dynamodb:GetItem",
                "dynamodb:GetRecords",
                "dynamodb:PutItem",
                "dynamodb:Query",
                "dynamodb:UpdateItem",
                "dynamodb:Scan",
                "dynamodb:DescribeTable"
            ],
            "Resource": [
                "arn:aws:dynamodb:eu-west-1:</span><span class="nv">$ACCOUNT_ID</span><span class="sh">:table/production-vault-hashicorp-backend"
            ]
        }
    ]
}
</span><span class="no">EOF

</span>aws iam attach-role-policy <span class="nt">--role-name</span> hashicorp-vault-role <span class="nt">--policy-arn</span> arn:aws:iam::<span class="nv">$ACCOUNT_ID</span>:policy/VaultKMSDynamoDBPolicy
</code></pre></div></div>
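<p>The trust policy only works if the <code class="language-plaintext highlighter-rouge">sub</code> condition matches the namespace and ServiceAccount name that the Vault pods actually run with, so it is worth computing the expected value and comparing it with the policy. A minimal sketch with hypothetical helper names. The example uses the <code class="language-plaintext highlighter-rouge">prod-hashicorp-vault</code> namespace and ServiceAccount from the deployment step in this post, and the issuer ID is a placeholder.</p>

```shell
#!/bin/bash
# Strip the scheme from the OIDC issuer URL, as the sed call in the script does.
oidc_provider_from_issuer() {
  printf '%s\n' "${1#https://}"
}

# Build the "sub" claim that IRSA presents for a ServiceAccount.
irsa_subject() {
  local namespace="$1" service_account="$2"
  printf 'system:serviceaccount:%s:%s\n' "$namespace" "$service_account"
}

# Example values; the issuer ID is a placeholder.
oidc_provider_from_issuer "https://oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE"
irsa_subject prod-hashicorp-vault prod-hashicorp-vault
```

<p>If the value in the policy differs from what <code class="language-plaintext highlighter-rouge">irsa_subject</code> prints for your namespace and ServiceAccount, the pods will not be able to assume the role.</p>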

<h2 id="deploy-vault-in-cluster">Deploy Vault in cluster</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>helm <span class="nb">install </span>prod-hashicorp-vault hashicorp/vault <span class="nt">--namespace</span> prod-hashicorp-vault <span class="nt">--create-namespace</span> <span class="nt">-f</span> ./override-values.yaml

<span class="c"># and IRSA in EKS for access to DynamoDB:</span>
kubectl annotate serviceaccount prod-hashicorp-vault <span class="nt">-n</span> prod-hashicorp-vault <span class="se">\</span>
  eks.amazonaws.com/role-arn<span class="o">=</span>arn:aws:iam::<span class="nv">$ACCOUNT_ID</span>:role/hashicorp-vault-role
</code></pre></div></div>

<p>At this point, you should see a StatefulSet in your cluster with the number of pods you defined in the <code class="language-plaintext highlighter-rouge">override-values.yaml</code> manifest under <code class="language-plaintext highlighter-rouge">ha: replicas:</code> and also a ReplicaSet with the number of pods for the Injector, like this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl get pods <span class="nt">-n</span> prod-hashicorp-vault
NAME                                                   READY   STATUS    RESTARTS   AGE
prod-hashicorp-vault-0                                 1/1     Running   0          2m
prod-hashicorp-vault-1                                 1/1     Running   0          2m
prod-hashicorp-vault-2                                 1/1     Running   0          2m
prod-hashicorp-vault-agent-injector-7c6c7f7fc4-2hkph   1/1     Running   0          2m
</code></pre></div></div>
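<p>If you want to check this from a script, you can count the pods in the <code class="language-plaintext highlighter-rouge">Running</code> state. This is only a sketch: the function name <code class="language-plaintext highlighter-rouge">count_running_pods</code> is hypothetical, and it assumes the default column layout of <code class="language-plaintext highlighter-rouge">kubectl get pods</code>, where STATUS is the third column.</p>

```shell
#!/bin/bash
# Count pods whose STATUS column is "Running" in "kubectl get pods" output
# read from stdin. The header row never matches, so it is ignored.
count_running_pods() {
  awk '$3 == "Running" { n++ } END { print n + 0 }'
}

# Usage, assuming kubectl points at the cluster:
#   kubectl get pods -n prod-hashicorp-vault | count_running_pods
printf '%s\n' \
  'NAME                     READY   STATUS    RESTARTS   AGE' \
  'prod-hashicorp-vault-0   1/1     Running   0          2m' \
  'prod-hashicorp-vault-1   0/1     Pending   0          2m' \
  | count_running_pods
```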

<h2 id="init-vault">Init Vault</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl <span class="nb">exec</span> <span class="nt">-n</span> prod-hashicorp-vault <span class="nt">-it</span> prod-hashicorp-vault-0 <span class="nt">--</span> /bin/sh
vault status

<span class="c"># Make sure the output shows:</span>
<span class="c"># Initialized = false</span>
<span class="c"># Sealed = true</span>
<span class="c"># Storage Type = dynamodb</span>
<span class="c"># HA Enabled = true</span>
vault operator init

<span class="c"># Save the recovery keys and the initial root token from the initialization output in a secure place. Losing them can leave you locked out of Vault!</span>
<span class="c"># You can use AWS Secrets Manager for this by storing the entire output in a single secret.</span>

vault status
<span class="c"># Make sure the output shows:</span>
<span class="c"># Recovery Seal Type = shamir</span>
<span class="c"># Initialized = true</span>
<span class="c"># Sealed = false</span>
<span class="c"># Storage Type = dynamodb</span>
<span class="c"># HA Enabled = true</span>
<span class="c"># and that the network address of the Vault cluster is from your K8s</span>
</code></pre></div></div>
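<p>The same check can be scripted, for example in a readiness step of a deployment pipeline. A minimal sketch with a hypothetical function name, <code class="language-plaintext highlighter-rouge">vault_is_ready</code>, assuming the default table output of <code class="language-plaintext highlighter-rouge">vault status</code> (one key and one value per row):</p>

```shell
#!/bin/bash
# Succeed when "vault status" text output on stdin reports an initialized,
# unsealed Vault. Assumes the default table format ("Key   Value" rows).
vault_is_ready() {
  local status
  status="$(cat)"
  printf '%s\n' "$status" | grep -Eq '^Initialized[[:space:]]+true[[:space:]]*$' || return 1
  printf '%s\n' "$status" | grep -Eq '^Sealed[[:space:]]+false[[:space:]]*$'
}

# Usage inside the Vault pod:
#   vault status | vault_is_ready
if printf 'Initialized   true\nSealed        false\n' | vault_is_ready; then
  echo "Vault is initialized and unsealed"
fi
```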

<p>At this point, you have a working Vault cluster and an injector agent for it.</p>

<h2 id="injection-secrets-into-pods">Injecting secrets into Pods</h2>

<p>Let’s look at injecting secrets into Pods. For this, we’ll create a test secret in Vault. Don’t use <a href="https://developer.hashicorp.com/vault/docs/secrets/cubbyhole">cubbyhole</a>!</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl <span class="nb">exec</span> <span class="nt">-n</span> prod-hashicorp-vault <span class="nt">-it</span> prod-hashicorp-vault-0 <span class="nt">--</span> /bin/sh
vault login <span class="nv">$MAIN_TOKEN</span>
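vault audit <span class="nb">enable </span>file <span class="nv">file_path</span><span class="o">=</span>/vault/audit/audit.log

<span class="c"># Enable a KV v2 engine at my-kv (the policies below use the my-kv/data/ prefix)</span>
vault secrets <span class="nb">enable</span> <span class="nt">-path</span><span class="o">=</span>my-kv kv-v2
vault kv put my-kv/my-secret <span class="nv">token</span><span class="o">=</span>my-token <span class="nv">password</span><span class="o">=</span>my-password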
vault kv get my-kv/my-secret
</code></pre></div></div>

<p>Let’s create an integration with K8s</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vault auth <span class="nb">enable </span>kubernetes
vault write auth/kubernetes/config <span class="se">\</span>
  <span class="nv">kubernetes_host</span><span class="o">=</span><span class="s2">"https://</span><span class="nv">$KUBERNETES_PORT_443_TCP_ADDR</span><span class="s2">:443"</span> <span class="se">\</span>
  <span class="nv">token_reviewer_jwt</span><span class="o">=</span><span class="s2">"</span><span class="si">$(</span><span class="nb">cat</span> /var/run/secrets/kubernetes.io/serviceaccount/token<span class="si">)</span><span class="s2">"</span> <span class="se">\</span>
  <span class="nv">kubernetes_ca_cert</span><span class="o">=</span>@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt <span class="se">\</span>
  <span class="nv">issuer</span><span class="o">=</span><span class="s2">"https://kubernetes.default.svc.cluster.local"</span>
</code></pre></div></div>

<p>Create a read policy for this secret (for Pod-consumer)</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">echo</span> <span class="s1">'path "my-kv/data/my-secret" { capabilities = ["read"] }'</span> <span class="o">&gt;</span> /tmp/policy.hcl
vault policy write devweb-policy /tmp/policy.hcl
<span class="nb">rm</span> /tmp/policy.hcl
</code></pre></div></div>
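<p>Note that the policy path contains <code class="language-plaintext highlighter-rouge">data/</code> although the CLI command used <code class="language-plaintext highlighter-rouge">my-kv/my-secret</code>: with a KV version 2 engine, <code class="language-plaintext highlighter-rouge">vault kv</code> adds that segment for you, while policies have to spell it out. A small sketch that generates the policy line from the mount and secret name. The helper names are hypothetical.</p>

```shell
#!/bin/bash
# Build the API path of a KV v2 secret from its mount and name.
kv2_data_path() {
  printf '%s/data/%s\n' "$1" "$2"
}

# Build a one-line read-only policy for that secret.
kv2_read_policy() {
  printf 'path "%s" { capabilities = ["read"] }\n' "$(kv2_data_path "$1" "$2")"
}

kv2_read_policy my-kv my-secret
```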

<p>Bind the policy to a Vault role for the Pod-consumer. Vault matches the role by the Pod's ServiceAccount name and namespace, which it verifies through the ServiceAccount JWT token:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vault write auth/kubernetes/role/devweb-app <span class="se">\</span>
  <span class="nv">bound_service_account_names</span><span class="o">=</span>internal-app <span class="se">\</span>
  <span class="nv">bound_service_account_namespaces</span><span class="o">=</span>default <span class="se">\</span>
  <span class="nv">policies</span><span class="o">=</span>devweb-policy <span class="se">\</span>
  <span class="nv">ttl</span><span class="o">=</span>24h
</code></pre></div></div>

<p>Create a Pod-consumer for the <code class="language-plaintext highlighter-rouge">my-kv/my-secret</code> secret and a ServiceAccount for it, through which it can get the secret from the injector</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apiVersion: v1
kind: ServiceAccount
metadata:
  name: internal-app
  namespace: default
<span class="nt">---</span>

apiVersion: apps/v1
kind: Deployment
metadata:
  name: devwebapp
  namespace: default
  labels:
    app: devwebapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: devwebapp
  template:
    metadata:
      labels:
        app: devwebapp
      annotations:
        vault.hashicorp.com/agent-inject: <span class="s2">"true"</span>
        vault.hashicorp.com/ca-cert: <span class="s2">"/run/secrets/kubernetes.io/serviceaccount/ca.crt"</span>
        vault.hashicorp.com/role: <span class="s2">"devweb-app"</span>
        vault.hashicorp.com/agent-inject-secret-config: <span class="s2">"my-kv/data/my-secret"</span>
        vault.hashicorp.com/agent-inject-template-config: |
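          {{- with secret "my-kv/data/my-secret" -}}
          <span class="nv">TOKEN</span><span class="o">=</span>{{ .Data.data.token }}
          <span class="nv">PASSWORD</span><span class="o">=</span>{{ .Data.data.password }}
          {{- end }}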
    spec:
      serviceAccountName: internal-app
      containers:
        - name: test-container
          image: busybox
          <span class="nb">command</span>: <span class="o">[</span><span class="s2">"/bin/sh"</span>, <span class="s2">"-c"</span><span class="o">]</span>
          args:
            - |
              <span class="k">while </span><span class="nb">true</span><span class="p">;</span> <span class="k">do
                if</span> <span class="o">[</span> <span class="nt">-f</span> /vault/secrets/config <span class="o">]</span><span class="p">;</span> <span class="k">then
                  </span><span class="nb">source</span> /vault/secrets/config
                  <span class="nb">echo</span> <span class="s2">"Token: </span><span class="nv">$TOKEN</span><span class="s2">"</span>
                  <span class="nb">echo</span> <span class="s2">"Password: </span><span class="nv">$PASSWORD</span><span class="s2">"</span>
                <span class="k">fi
                </span><span class="nb">sleep </span>5
              <span class="k">done</span>
</code></pre></div></div>

<p>Done, you’re awesome!</p>

<p>In the pod’s log output, you should see <code class="language-plaintext highlighter-rouge">"Token: $TOKEN"</code> and <code class="language-plaintext highlighter-rouge">"Password: $PASSWORD"</code> with the secret values from Vault.</p>

<p>In this step, you should pay attention to the manifest section:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>      annotations:
        vault.hashicorp.com/agent-inject: <span class="s2">"true"</span>
        vault.hashicorp.com/ca-cert: <span class="s2">"/run/secrets/kubernetes.io/serviceaccount/ca.crt"</span>
        vault.hashicorp.com/role: <span class="s2">"devweb-app"</span>
        vault.hashicorp.com/agent-inject-secret-config: <span class="s2">"my-kv/data/my-secret"</span>
        vault.hashicorp.com/agent-inject-template-config: |
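          {{- with secret "my-kv/data/my-secret" -}}
          <span class="nv">TOKEN</span><span class="o">=</span>{{ .Data.data.token }}
          <span class="nv">PASSWORD</span><span class="o">=</span>{{ .Data.data.password }}
          {{- end }}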
</code></pre></div></div>

<p>These are instructions for the injector, which it uses to work with secrets. All possible annotations for the injector: <a href="https://developer.hashicorp.com/vault/docs/platform/k8s/injector/annotations">Vault Agent Injector annotations</a>, they are quite extensive and allow you to perform various tasks.</p>
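<p>In a real application you would usually load the rendered file in the container entrypoint instead of printing it. A minimal sketch, assuming the template renders plain <code class="language-plaintext highlighter-rouge">KEY=VALUE</code> lines into <code class="language-plaintext highlighter-rouge">/vault/secrets/config</code>. The function name <code class="language-plaintext highlighter-rouge">load_secrets</code> is hypothetical.</p>

```shell
#!/bin/sh
# Export every KEY=VALUE line from a rendered secrets file into the environment.
# Returns non-zero if the file has not been rendered yet.
load_secrets() {
  [ -f "$1" ] || return 1
  set -a      # export all variables assigned while sourcing
  . "$1"
  set +a
}

if load_secrets /vault/secrets/config; then
  echo "secrets loaded"
else
  echo "secret file not rendered yet"
fi
```

<p>The file is sourced as shell code, so this only makes sense for files that you render yourself from a template you control.</p>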

<h2 id="injection-secrets-into-gitlab">Injecting secrets into Gitlab</h2>

<p>Let’s look at injecting secrets into Gitlab. This approach will help you store secrets in Vault and use them in Gitlab CI pipelines.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl <span class="nb">exec</span> <span class="nt">-n</span> prod-hashicorp-vault <span class="nt">-it</span> prod-hashicorp-vault-0 <span class="nt">--</span> /bin/sh
vault login <span class="nv">$MAIN_TOKEN</span>

vault kv put my-kv/my-secret <span class="nv">token</span><span class="o">=</span>my-token <span class="nv">password</span><span class="o">=</span>my-password
vault kv get my-kv/my-secret
</code></pre></div></div>

<p>Create an integration with Gitlab</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>vault auth <span class="nb">enable</span> <span class="nt">-path</span> jwt_v2 jwt

vault write auth/jwt_v2/role/gitlab-role <span class="se">\</span>
    <span class="nv">role_type</span><span class="o">=</span><span class="s2">"jwt"</span> <span class="se">\</span>
    <span class="nv">bound_audiences</span><span class="o">=</span><span class="s2">"https://mygitlab.example"</span> <span class="se">\</span>
    <span class="nv">user_claim</span><span class="o">=</span><span class="s2">"sub"</span> <span class="se">\</span>
    <span class="nv">policies</span><span class="o">=</span><span class="s2">"gitlab-policy"</span> <span class="se">\</span>
    <span class="nv">ttl</span><span class="o">=</span><span class="s2">"1h"</span>
</code></pre></div></div>
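<p>When a login is rejected, it usually helps to look at the claims inside the job token and compare them with <code class="language-plaintext highlighter-rouge">bound_audiences</code> and <code class="language-plaintext highlighter-rouge">user_claim</code>. A debugging sketch, not part of the original setup: the function name <code class="language-plaintext highlighter-rouge">jwt_payload</code> is hypothetical, it assumes a <code class="language-plaintext highlighter-rouge">base64</code> that accepts <code class="language-plaintext highlighter-rouge">-d</code> (GNU coreutils or BusyBox), and it does not verify the signature.</p>

```shell
#!/bin/bash
# Print the decoded payload (claims) of a JWT passed as the first argument.
# The signature is NOT verified; this is for debugging only.
jwt_payload() {
  local payload
  # Take the middle segment and convert base64url to plain base64.
  payload="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # Restore the padding that base64url omits.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Example with a dummy token whose payload is {"sub":"x"}.
jwt_payload "e30.eyJzdWIiOiJ4In0.sig"
```

<p>Avoid running this on real tokens in a job whose log is visible to others, because the claims end up in the log.</p>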

<p>Create a read policy for GitLab with access to read secrets</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> <span class="o">&lt;&lt;</span><span class="no">EOF</span><span class="sh"> &gt; /tmp/gitlab-policy.hcl
path "my-kv/data/*" {
  capabilities = ["read"]
}
path "my-kv/metadata/*" {
  capabilities = ["list", "read"]
}
</span><span class="no">EOF

</span>vault policy write gitlab-policy /tmp/gitlab-policy.hcl

vault policy <span class="nb">read </span>gitlab-policy
</code></pre></div></div>

<p>Create a CI pipeline in GitLab to retrieve the secret:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>stages:
  - vault
  - deploy

fetch_secret:
  variables:
    VAULT_AUTH_ROLE: <span class="s2">"gitlab-role"</span>
    VAULT_AUTH_PATH: <span class="s2">"jwt_v2"</span>
    VAULT_SERVER_URL: <span class="s2">"https://prod-hashicorp-vault.prod-hashicorp-vault.svc:8200"</span>
    <span class="c"># or, if your runner is not in K8s:</span>
    <span class="c"># VAULT_SERVER_URL: "https://mygitlab.example:8200"</span>
    <span class="c"># and, if you use TLS in transit:</span>
    <span class="c"># VAULT_CACERT: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt</span>
  image: hashicorp/vault
  stage: vault
  id_tokens:
    VAULT_ID_TOKEN:
      aud: https://mygitlab.example <span class="c"># must match bound_audiences in the Vault role</span>
  script:
    - <span class="nb">export </span><span class="nv">VAULT_ADDR</span><span class="o">=</span><span class="s2">"</span><span class="nv">$VAULT_SERVER_URL</span><span class="s2">"</span>
    - <span class="nb">export </span><span class="nv">VAULT_TOKEN</span><span class="o">=</span><span class="si">$(</span>vault write <span class="nt">-field</span><span class="o">=</span>token auth/<span class="nv">$VAULT_AUTH_PATH</span>/login <span class="nv">role</span><span class="o">=</span><span class="s2">"</span><span class="nv">$VAULT_AUTH_ROLE</span><span class="s2">"</span> <span class="nv">jwt</span><span class="o">=</span><span class="s2">"</span><span class="nv">$VAULT_ID_TOKEN</span><span class="s2">"</span><span class="si">)</span>
    - <span class="nv">TOKEN</span><span class="o">=</span><span class="si">$(</span>vault kv get <span class="nt">-field</span><span class="o">=</span>token my-kv/my-secret<span class="si">)</span>
    - <span class="nv">PASSWORD</span><span class="o">=</span><span class="si">$(</span>vault kv get <span class="nt">-field</span><span class="o">=</span>password my-kv/my-secret<span class="si">)</span>
    - <span class="nb">echo </span><span class="nv">TOKEN</span><span class="o">=</span><span class="nv">$TOKEN</span>
    - <span class="nb">echo </span><span class="nv">PASSWORD</span><span class="o">=</span><span class="nv">$PASSWORD</span>
    - <span class="nb">echo</span> <span class="s2">"TOKEN=</span><span class="nv">$TOKEN</span><span class="s2">"</span> <span class="o">&gt;&gt;</span> vault_secrets.env
    - <span class="nb">echo</span> <span class="s2">"PASSWORD=</span><span class="nv">$PASSWORD</span><span class="s2">"</span> <span class="o">&gt;&gt;</span> vault_secrets.env
  artifacts:
    paths:
      - vault_secrets.env
    expire_in: 1 hour

deploy_app:
  stage: deploy
  image: alpine
  dependencies:
    - fetch_secret
  script:
    - <span class="nb">.</span> ./vault_secrets.env
    - <span class="nb">echo</span>  <span class="s2">"</span><span class="nv">$TOKEN</span><span class="s2">"</span>
    - <span class="nb">rm</span> <span class="nt">-f</span> vault_secrets.env
</code></pre></div></div>
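
<p>The hand-off between the two jobs relies on <code class="language-plaintext highlighter-rouge">vault_secrets.env</code> holding <code class="language-plaintext highlighter-rouge">KEY=VALUE</code> lines that the next job’s shell can read back. A minimal local sketch of that round trip, with placeholder values instead of real Vault output (the values and the temporary directory are assumptions for illustration only):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/bin/sh
set -eu

# Work in a throwaway directory so nothing is left behind
workdir=$(mktemp -d)
cd "$workdir"

# What fetch_secret does: write each secret as a KEY=VALUE line
TOKEN="my-token"
PASSWORD="my-password"
printf 'TOKEN=%s\n' "$TOKEN" &gt; vault_secrets.env
printf 'PASSWORD=%s\n' "$PASSWORD" &gt;&gt; vault_secrets.env
unset TOKEN PASSWORD

# What deploy_app does: load the file, use the values, remove the file
. ./vault_secrets.env
echo "token length: ${#TOKEN}"
rm -f vault_secrets.env
</code></pre></div></div>

<p>Values that contain spaces or shell metacharacters would need quoting before they are written to the file, otherwise loading it will not give back the original strings.</p>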

<p>Done, you’re awesome!</p>

<h2 id="enable-ssl-everywhere-for-vault-transit">Enable SSL everywhere for Vault (transit)</h2>

<p>The standard approach treats your EKS/K8s cluster as a trusted zone: traffic inside it travels unencrypted, and TLS is terminated only at the ingress/LB. But you may want TLS between the injector and Vault everywhere inside the cluster as well. The downsides of this setup are that the cluster-signed certificate is valid for 1 year (so it has to be renewed) and that the certificate has to be “distributed” to the applications that will use it (in this case, only the injector).</p>

<p>Below I’ll explain how to do this:</p>

<h3 id="generate-ssl-certificates">Generate SSL certificates</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span>

<span class="nv">NAMESPACE</span><span class="o">=</span><span class="s2">"prod-hashicorp-vault"</span>
<span class="nv">SECRET_NAME</span><span class="o">=</span><span class="s2">"vault-server-tls"</span>
<span class="nv">TMPDIR</span><span class="o">=</span><span class="s2">"."</span>
<span class="nv">SERVICE</span><span class="o">=</span><span class="s2">"prod-hashicorp-vault"</span>
<span class="nv">CSR_NAME</span><span class="o">=</span><span class="s2">"vault-csr"</span>

openssl genrsa <span class="nt">-out</span> <span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span>/vault.key 2048

<span class="nb">cat</span> <span class="o">&lt;&lt;</span><span class="no">EOF</span><span class="sh"> &gt; </span><span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span><span class="sh">/csr.conf
[req]
req_extensions = v3_req
distinguished_name = req_distinguished_name

[req_distinguished_name]

[v3_req]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1 = </span><span class="k">${</span><span class="nv">SERVICE</span><span class="k">}</span><span class="sh">
DNS.2 = </span><span class="k">${</span><span class="nv">SERVICE</span><span class="k">}</span><span class="sh">.</span><span class="k">${</span><span class="nv">NAMESPACE</span><span class="k">}</span><span class="sh">
DNS.3 = </span><span class="k">${</span><span class="nv">SERVICE</span><span class="k">}</span><span class="sh">.</span><span class="k">${</span><span class="nv">NAMESPACE</span><span class="k">}</span><span class="sh">.svc
DNS.4 = </span><span class="k">${</span><span class="nv">SERVICE</span><span class="k">}</span><span class="sh">.</span><span class="k">${</span><span class="nv">NAMESPACE</span><span class="k">}</span><span class="sh">.svc.cluster.local
IP.1 = 127.0.0.1
</span><span class="no">EOF

</span>openssl req <span class="nt">-new</span> <span class="nt">-key</span> <span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span>/vault.key <span class="nt">-subj</span> <span class="s2">"/CN=</span><span class="k">${</span><span class="nv">SERVICE</span><span class="k">}</span><span class="s2">.</span><span class="k">${</span><span class="nv">NAMESPACE</span><span class="k">}</span><span class="s2">.svc"</span> <span class="nt">-out</span> <span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span>/server.csr <span class="nt">-config</span> <span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span>/csr.conf

<span class="nb">cat</span> <span class="o">&lt;&lt;</span><span class="no">EOF</span><span class="sh"> &gt; </span><span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span><span class="sh">/csr.yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: </span><span class="k">${</span><span class="nv">CSR_NAME</span><span class="k">}</span><span class="sh">
spec:
  signerName: beta.eks.amazonaws.com/app-serving
  request: </span><span class="si">$(</span><span class="nb">cat</span> <span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span>/server.csr | <span class="nb">base64</span> | <span class="nb">tr</span> <span class="nt">-d</span> <span class="s1">'\n'</span><span class="si">)</span><span class="sh">
  usages:
    - digital signature
    - key encipherment
    - server auth
  groups:
    - system:authenticated
</span><span class="no">EOF

</span>kubectl create <span class="nt">-f</span> <span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span>/csr.yaml

kubectl get csr
<span class="c"># CONDITION = Pending</span>

kubectl certificate approve <span class="k">${</span><span class="nv">CSR_NAME</span><span class="k">}</span>
kubectl get csr
<span class="c"># CONDITION =  Approved,Issued</span>

<span class="nv">serverCert</span><span class="o">=</span><span class="si">$(</span>kubectl get csr <span class="k">${</span><span class="nv">CSR_NAME</span><span class="k">}</span> <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s1">'{.status.certificate}'</span><span class="si">)</span>
<span class="nb">echo</span> <span class="s2">"</span><span class="k">${</span><span class="nv">serverCert</span><span class="k">}</span><span class="s2">"</span> | openssl <span class="nb">base64</span> <span class="nt">-d</span> <span class="nt">-A</span> <span class="nt">-out</span> <span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span>/vault.crt

kubectl config view <span class="nt">--raw</span> <span class="nt">--minify</span> <span class="nt">--flatten</span> <span class="nt">-o</span> <span class="nv">jsonpath</span><span class="o">=</span><span class="s1">'{.clusters[].cluster.certificate-authority-data}'</span> | <span class="nb">base64</span> <span class="nt">-d</span> <span class="o">&gt;</span> <span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span>/vault.ca

kubectl create secret generic <span class="k">${</span><span class="nv">SECRET_NAME</span><span class="k">}</span> <span class="se">\</span>
    <span class="nt">--namespace</span> <span class="k">${</span><span class="nv">NAMESPACE</span><span class="k">}</span> <span class="se">\</span>
    <span class="nt">--from-file</span><span class="o">=</span>vault.key<span class="o">=</span><span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span>/vault.key <span class="se">\</span>
    <span class="nt">--from-file</span><span class="o">=</span>vault.crt<span class="o">=</span><span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span>/vault.crt <span class="se">\</span>
    <span class="nt">--from-file</span><span class="o">=</span>vault.ca<span class="o">=</span><span class="k">${</span><span class="nv">TMPDIR</span><span class="k">}</span>/vault.ca
</code></pre></div></div>
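
<p>Before wiring the secret into Helm, it can be worth checking what the cluster actually issued. A short sketch using the files written by the script above; it assumes it is run from the same directory (<code class="language-plaintext highlighter-rouge">TMPDIR="."</code>), and whether the chain verifies against <code class="language-plaintext highlighter-rouge">vault.ca</code> depends on which CA your cluster’s signer uses:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Subject and validity window of the issued certificate
openssl x509 -in vault.crt -noout -subject -dates

# The SAN list should contain the four service DNS names and 127.0.0.1
openssl x509 -in vault.crt -noout -text | grep -A1 "Subject Alternative Name"

# Does the certificate chain up to the CA stored next to it?
openssl verify -CAfile vault.ca vault.crt

# The secret should list three keys: vault.ca, vault.crt, vault.key
kubectl describe secret vault-server-tls -n prod-hashicorp-vault
</code></pre></div></div>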

<h3 id="override-valuesyaml-for-ssl-certificates">Override-values.yaml for SSL certificates</h3>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>global:
  enabled: <span class="nb">true
  </span>tlsDisable: <span class="nb">false 
</span>injector:
  enabled: <span class="nb">true
  </span>metrics:
    enabled: <span class="nb">true
  </span>nodeSelector:
    nodegroup: hashicorp-vault-nodes
  port: 8080
  agentDefaults:
    cpuLimit: 500m
    cpuRequest: 250m
    memLimit: 128Mi
    memRequest: 64Mi
server:
  enabled: <span class="s1">'-'</span>
  standalone:
    enabled: <span class="nb">false
  </span>auditStorage:
    enabled: <span class="nb">true
    </span>accessMode: ReadWriteOnce
    mountPath: /vault/audit
    size: 10Gi
  dataStorage:
    enabled: <span class="nb">false
  </span>nodeSelector:
    nodegroup: hashicorp-vault-nodes
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.ca
  extraVolumes:
    - type: secret
      name: vault-server-tls <span class="c"># mounted at /vault/userconfig/vault-server-tls</span>
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::<span class="nv">$ACCOUNT_ID</span>:role/hashicorp-vault-role <span class="c"># for EKS + IAM only</span>
    create: <span class="nb">true
  </span>ha:
    enabled: <span class="nb">true
    </span>replicas: 3
    config: |
      ui <span class="o">=</span> <span class="nb">true

      </span>listener <span class="s2">"tcp"</span> <span class="o">{</span>
        tls_disable <span class="o">=</span> 0
        address <span class="o">=</span> <span class="s2">"[::]:8200"</span>
        cluster_address <span class="o">=</span> <span class="s2">"[::]:8201"</span>
        <span class="c"># if tls is enabled</span>
        tls_cert_file <span class="o">=</span> <span class="s2">"/vault/userconfig/vault-server-tls/vault.crt"</span>
        tls_key_file <span class="o">=</span> <span class="s2">"/vault/userconfig/vault-server-tls/vault.key"</span>
        tls_ca_cert_file <span class="o">=</span> <span class="s2">"/vault/userconfig/vault-server-tls/vault.ca"</span>
      <span class="o">}</span>

      <span class="c"># For internal ssl vault &lt;-&gt; injector:</span>
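      listener <span class="s2">"tcp"</span> <span class="o">{</span>
        tls_disable <span class="o">=</span> 0
        address <span class="o">=</span> <span class="s2">"[::]:8202"</span>
        cluster_address <span class="o">=</span> <span class="s2">"[::]:8201"</span>
        <span class="c"># TLS is enabled here too, so this listener needs the certificate as well</span>
        tls_cert_file <span class="o">=</span> <span class="s2">"/vault/userconfig/vault-server-tls/vault.crt"</span>
        tls_key_file <span class="o">=</span> <span class="s2">"/vault/userconfig/vault-server-tls/vault.key"</span>
      <span class="o">}</span>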

      storage <span class="s2">"dynamodb"</span> <span class="o">{</span>
        ha_enabled <span class="o">=</span> <span class="s2">"true"</span>
        region <span class="o">=</span> <span class="s2">"</span><span class="nv">$REGION</span><span class="s2">"</span>
        table <span class="o">=</span> <span class="s2">"</span><span class="nv">$DYNAMODB_TABLE</span><span class="s2">"</span>
      <span class="o">}</span>

      seal <span class="s2">"awskms"</span> <span class="o">{</span>
        region     <span class="o">=</span> <span class="s2">"eu-west-1"</span>
        kms_key_id <span class="o">=</span> <span class="s2">"</span><span class="nv">$KMS_KEY_ID</span><span class="s2">"</span>
        <span class="c"># no need now: endpoint   = "https://vpce-xxxxxxxxxxxxxxx.kms.eu-west-1.vpce.amazonaws.com"</span>
      <span class="o">}</span>

      service_registration <span class="s2">"kubernetes"</span> <span class="o">{}</span>
    disruptionBudget:
      enabled: <span class="nb">true
      </span>maxUnavailable: null
  ingress:
    enabled: <span class="nb">true
    </span>activeService: <span class="nb">true
    </span>annotations:
      kubernetes.io/ingress.class: <span class="s2">"nginx"</span>
      cert-manager.io/cluster-issuer: <span class="s2">"letsencrypt"</span>
      nginx.ingress.kubernetes.io/rewrite-target: <span class="s2">"/"</span>  
      nginx.ingress.kubernetes.io/ssl-redirect: <span class="s2">"true"</span>
      nginx.ingress.kubernetes.io/proxy-body-size: <span class="s2">"100m"</span>
    ingressClassName: nginx
    labels: <span class="o">{}</span>
    pathType: Prefix
    tls:
      - hosts:
          - vault.example.com
        secretName: <span class="nv">$TLS_SECRET_NAME</span>
    hosts: 
      - host: vault.example.com
ui:
  enabled: <span class="nb">true
  </span>serviceType: <span class="s2">"ClusterIP"</span>
  externalPort: 8202
  targetPort: 8202
</code></pre></div></div>]]></content><author><name>Evgenii Zhuravlev</name></author><category term="Kubernetes" /><category term="Kubernetes" /><category term="Vault" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Recovery DB in Zalando postgres operator in Kubernetes from S3</title><link href="https://matrix-spec.github.io/kubernetes/2023/03/05/recovery-DBs-in-Zalando-postgres-operator.html" rel="alternate" type="text/html" title="Recovery DB in Zalando postgres operator in Kubernetes from S3" /><published>2023-03-05T00:00:00+00:00</published><updated>2023-03-05T00:00:00+00:00</updated><id>https://matrix-spec.github.io/kubernetes/2023/03/05/recovery-DBs-in-Zalando-postgres-operator</id><content type="html" xml:base="https://matrix-spec.github.io/kubernetes/2023/03/05/recovery-DBs-in-Zalando-postgres-operator.html"><![CDATA[<p><img src="/assets/images/posts/zalando-postgres-operator-restore-DBs/zalando-posgres-operator-banner.webp" alt="banner" /></p>

<p>While working with the Zalando Postgres Operator in Kubernetes, I encountered a significant challenge: there is no well-documented, out-of-the-box method for restoring a database from an S3 backup. The operator itself is a great tool that simplifies PostgreSQL deployment and management in Kubernetes, but when it comes to recovery, the process is not as straightforward as one might expect.</p>

<p>This guide is the result of my research, hands-on experience, and an issue I raised on GitHub regarding database recovery in Zalando’s Postgres Operator (<a href="https://github.com/zalando/postgres-operator/issues/1395">issue #1395</a>). Here, I document a working solution to recover a PostgreSQL cluster from S3, outlining the necessary steps and configurations.</p>

<p>If you’re facing a similar issue, this article should help you navigate the recovery process efficiently. Let’s dive in. 🚀</p>

<h4 id="do-not-touch-bucket-with-wal-old-cluster">Do not touch the bucket with the old cluster’s WAL</h4>

<p>Look up the current WAL directory in the configmap:</p>

<p>Here and below: alias <strong>k = kubectl</strong> (and <strong>kg = kubectl get</strong>)</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>k get <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> configmap <span class="nt">-o</span> yaml
<span class="nt">---</span>
apiVersion: v1
items:
- apiVersion: v1
  data:
    ...
    WALG_S3_PREFIX: s3://bucket/wal_for_OLD &lt;<span class="o">=============</span> current directory <span class="k">for </span>WAL <span class="o">!!!</span> Do not <span class="nb">touch </span>it <span class="k">in </span>storage <span class="o">!!!</span>
    ...
</code></pre></div></div>

<h4 id="create-new-directory-for-wal-in-storage">Create new directory for WAL in storage</h4>
<p>In your S3 provider account, create a <strong>new</strong> directory in the bucket for the <strong>new</strong> WAL.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>You need to create the directory **WAL_NEW_CLUSTER**
</code></pre></div></div>

<p><img src="/assets/images/posts/zalando-postgres-operator-restore-DBs/zalando-posgres-operator-sheme.png" alt="image" /></p>
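
<p>S3 has no real directories, so “creating a directory” means creating an empty object whose key ends with a slash. A minimal sketch with the AWS CLI, assuming the bucket is called <code class="language-plaintext highlighter-rouge">bucket</code> and the new prefix is <code class="language-plaintext highlighter-rouge">wal_for_NEW</code>, as in the configmap example below; for a non-AWS S3 provider, add <code class="language-plaintext highlighter-rouge">--endpoint-url</code> with your provider’s endpoint:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Create the empty "directory" marker for the new cluster's WAL
aws s3api put-object --bucket bucket --key wal_for_NEW/

# Both prefixes should now be listed; wal_for_OLD must stay untouched
aws s3 ls s3://bucket/
</code></pre></div></div>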

<h4 id="editing-configmap">Editing configmap</h4>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> ~/git/deploy-zalando-operator/
git pull
vim pg-pod-configmap.yaml
</code></pre></div></div>

<p>Add or change:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>WALG_S3_PREFIX: s3://bucket/wal_for_OLD &lt;<span class="o">===================</span> old value, comment it out
<span class="nt">---</span>
WALG_S3_PREFIX: s3://bucket/wal_for_NEW &lt;<span class="o">============</span> new value
CLONE_USE_WALG_RESTORE: <span class="s2">"true"</span> &lt;<span class="o">============</span> new value
</code></pre></div></div>

<h4 id="apply-configmap-in-k8s">Apply configmap in K8s</h4>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>k apply <span class="nt">-n</span> <span class="nv">$NAMESPACE</span>  <span class="nt">-f</span> ~/git/deploy-zalando-operator/pg-pod-configmap.yaml
</code></pre></div></div>

<h4 id="editing-manifest-cluster">Editing the cluster manifest</h4>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> ~/git/deploy-zalando-operator/
git pull
vim zalando-cluster.yaml
</code></pre></div></div>

<p>Add the <code class="language-plaintext highlighter-rouge">clone</code> section:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  clone:
    cluster: old-cluster-name
    s3_access_key_id: access_key_id
    s3_endpoint: endpoint
    s3_secret_access_key: secret_access_key
    s3_wal_path: s3://bucket/wal_for_OLD <span class="c"># bucket OLD cluster, from which we will recover.</span>
    timestamp: <span class="s2">"2021-01-21T23:49:03+03:00"</span> <span class="c"># timezone required (offset relative to UTC, see RFC 3339 section 5.6)</span>
</code></pre></div></div>

<h4 id="check-and-apply-cluster-manifest-in-k8s">Check and apply cluster manifest in K8s</h4>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>k apply <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> <span class="nt">-f</span> ~/git/deploy-zalando-operator/zalando-cluster.yaml <span class="nt">--dry-run</span><span class="o">=</span>server
k apply <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> <span class="nt">-f</span> ~/git/deploy-zalando-operator/zalando-cluster.yaml
</code></pre></div></div>

<h4 id="log-pattern">Log pattern</h4>
<p>The cluster can take a long time to come up (up to 10 minutes in my test; it may depend on the size of the database):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>k logs <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> pg-pod-<span class="k">***</span> <span class="nt">-f</span> <span class="c"># leader pod only</span>
...
2021-03-05 01:30:33,855 - bootstrapping - INFO - Writing to file /run/etc/wal-e.d/env-clone-old-cluster-name/TMPDIR
2021-03-05 01:30:33,855 - bootstrapping - INFO - Configuring standby-cluster
2021-03-05 01:30:33,855 - bootstrapping - INFO - Configuring patroni
2021-03-05 01:30:33,871 - bootstrapping - INFO - Writing to file /home/postgres/postgres.yml
2021-03-05 01:30:33,871 - bootstrapping - INFO - Configuring crontab
2021-03-05 01:30:33,871 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of SYS_NICE capability
2021-03-05 01:30:33,881 - bootstrapping - INFO - Configuring certificate
2021-03-05 01:30:33,881 - bootstrapping - INFO - Generating ssl certificate
2021-03-05 01:30:33,937 - bootstrapping - INFO - Configuring pam-oauth2
2021-03-05 01:30:33,937 - bootstrapping - INFO - No PAM_OAUTH2 configuration was specified, skipping
2021-03-05 01:30:33,937 - bootstrapping - INFO - Configuring pgqd
2021-03-05 01:30:33,938 - bootstrapping - INFO - Configuring log
2021-03-05 01:30:35,355 INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-03-05 01:30:35,363 INFO: Lock owner: None<span class="p">;</span> I am pg-pod-0
2021-03-05 01:30:35,531 INFO: trying to bootstrap a new cluster
2021-03-05 01:30:35,532 INFO: Running custom bootstrap script: envdir <span class="s2">"/run/etc/wal-e.d/env-clone-old-cluster-name"</span> python3 /scripts/clone_with_wale.py <span class="nt">--recovery-target-time</span><span class="o">=</span><span class="s2">"2021-03-03T23:49:03+03:00"</span>
2021-03-05 01:30:35,746 INFO: cloning cluster old-cluster-name using wal-g backup-fetch /home/postgres/pgdata/pgroot/data base_000000010000000000000006
INFO: 2021/03/05 01:30:35.943395 Finished decompression of part_004.tar.br
INFO: 2021/03/05 01:30:35.943416 Finished extraction of part_004.tar.br
INFO: 2021/03/05 01:30:37.269663 Finished extraction of part_001.tar.br
INFO: 2021/03/05 01:30:37.269967 Finished decompression of part_001.tar.br
INFO: 2021/03/05 01:30:37.481460 Finished decompression of part_002.tar.br
INFO: 2021/03/05 01:30:37.481464 Finished extraction of part_002.tar.br
INFO: 2021/03/05 01:30:37.491029 Finished decompression of pg_control.tar.br
INFO: 2021/03/05 01:30:37.491047 Finished extraction of pg_control.tar.br
INFO: 2021/03/05 01:30:37.491056 
Backup extraction complete.
2021-03-05 01:30:37,724 maybe_pg_upgrade INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-03-05 01:30:38 UTC <span class="o">[</span>129]: <span class="o">[</span>1-1] 604189be.81 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
2021-03-05 01:30:38 UTC <span class="o">[</span>129]: <span class="o">[</span>2-1] 604189be.81 0     LOG:  pg_stat_kcache.linux_hz is <span class="nb">set </span>to 1000000
2021-03-05 01:30:38 UTC <span class="o">[</span>129]: <span class="o">[</span>3-1] 604189be.81 0     LOG:  listening on IPv4 address <span class="s2">"0.0.0.0"</span>, port 5432
2021-03-05 01:30:38 UTC <span class="o">[</span>129]: <span class="o">[</span>4-1] 604189be.81 0     LOG:  could not create IPv6 socket <span class="k">for </span>address <span class="s2">"::"</span>: Address family not supported by protocol
2021-03-05 01:30:38 UTC <span class="o">[</span>129]: <span class="o">[</span>5-1] 604189be.81 0     LOG:  listening on Unix socket <span class="s2">"/var/run/postgresql/.s.PGSQL.5432"</span>
2021-03-05 01:30:38,105 INFO: postmaster <span class="nv">pid</span><span class="o">=</span>129
2021-03-05 01:30:38 UTC <span class="o">[</span>129]: <span class="o">[</span>6-1] 604189be.81 0     LOG:  redirecting log output to logging collector process
2021-03-05 01:30:38 UTC <span class="o">[</span>129]: <span class="o">[</span>7-1] 604189be.81 0     HINT:  Future log output will appear <span class="k">in </span>directory <span class="s2">"../pg_log"</span><span class="nb">.</span>
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - rejecting connections
/var/run/postgresql:5432 - accepting connections
2021-03-05 01:30:40,206 INFO: establishing a new patroni connection to the postgres cluster
2021-03-05 01:30:40,285 INFO: waiting <span class="k">for </span>end of recovery after bootstrap
2021-03-05 01:30:50,277 INFO: waiting <span class="k">for </span>end of recovery after bootstrap
2021-03-05 01:31:00,270 INFO: waiting <span class="k">for </span>end of recovery after bootstrap
2021-03-05 01:31:10,275 INFO: waiting <span class="k">for </span>end of recovery after bootstrap
2021-03-05 01:31:20,285 INFO: waiting <span class="k">for </span>end of recovery after bootstrap
2021-03-05 01:31:30,270 INFO: waiting <span class="k">for </span>end of recovery after bootstrap
2021-03-05 01:31:40,275 INFO: waiting <span class="k">for </span>end of recovery after bootstrap
...
SET
DO
DO
DO
...
ALTER EXTENSION
ALTER POLICY
REVOKE
GRANT
...
2021-03-05 01:34:34.175 - /scripts/postgres_backup.sh - I was called as: /scripts/postgres_backup.sh /home/postgres/pgdata/pgroot/data
2021-03-05 01:34:34.447 - /scripts/postgres_backup.sh - producing a new backup
INFO: 2021/03/05 01:34:34.596538 Couldn<span class="s1">'t find previous backup. Doing full backup.
INFO: 2021/03/05 01:34:34.617491 Calling pg_start_backup()
2021-03-05 01:34:35.133 35 LOG Starting pgqd 3.3
2021-03-05 01:34:35.133 35 LOG auto-detecting dbs ...
INFO: 2021/03/05 01:34:35.154913 Walking ...
INFO: 2021/03/05 01:34:35.155188 Starting part 1 ...
INFO: 2021/03/05 01:34:35.155307 Starting part 2 ...
2021-03-05 01:34:42,276 INFO: Lock owner: pg-pod-0; I am pg-pod-0
2021-03-05 01:34:42,408 INFO: no action.  i am the leader with the lock
INFO: 2021/03/05 01:34:51.222156 Finished writing part 2.
INFO: 2021/03/05 01:34:51.222175 Starting part 3 ...
2021-03-05 01:34:52,276 INFO: Lock owner: pg-pod-0; I am pg-pod-0
2021-03-05 01:34:52,382 INFO: no action.  i am the leader with the lock
2021-03-05 01:35:02,277 INFO: Lock owner: pg-pod-0; I am pg-pod-0
2021-03-05 01:35:02,388 INFO: no action.  i am the leader with the lock
2021-03-05 01:35:05.162 35 LOG {ticks: 0, maint: 0, retry: 0}
INFO: 2021/03/05 01:35:10.928677 Finished writing part 3.
INFO: 2021/03/05 01:35:10.928696 Starting part 4 ...
2021-03-05 01:35:12,277 INFO: Lock owner: pg-pod-0; I am pg-pod-0
2021-03-05 01:35:12,395 INFO: no action.  i am the leader with the lock
</span></code></pre></div></div>
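
<p>To find out which pod is the leader before tailing its logs, and to confirm afterwards that recovery has finished, something like the following can be used. It is a sketch that assumes the operator’s default pod role label (<code class="language-plaintext highlighter-rouge">spilo-role</code>) and the pod name <code class="language-plaintext highlighter-rouge">pg-pod-0</code> from the log above; adjust both if your setup differs:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Which pod currently holds the leader role?
k get pods -n $NAMESPACE -l spilo-role=master

# Patroni's own view of the cluster members and their state
k exec -n $NAMESPACE pg-pod-0 -- patronictl list

# Prints "f" once the restored database has left recovery mode
k exec -n $NAMESPACE pg-pod-0 -- psql -U postgres -tAc "select pg_is_in_recovery();"
</code></pre></div></div>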

<h4 id="replace-credentials-for-bd-in-k8s">Replace credentials for the DB in K8s</h4>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl get secret root.old-cluster-name.credentials.postgresql.acid.zalan.do <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> <span class="nt">--export</span> <span class="nt">-o</span> yaml | kubectl replace <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> <span class="nt">-f</span> -
</code></pre></div></div>

<h4 id="delete-section-clone-in-cluster-manifest-in-k8s">Delete the “clone” section from the cluster manifest in K8s</h4>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>k edit <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> postgresqls.acid.zalan.do
<span class="nt">---</span>
<span class="c">#  clone: &lt;================================== comment out this section</span>
<span class="c">#    cluster: old-cluster-name</span>
<span class="c">#    s3_access_key_id: access_key_id</span>
<span class="c">#    s3_endpoint: endpoint</span>
<span class="c">#    s3_secret_access_key: secret_access_key</span>
<span class="c">#    s3_wal_path: s3://bucket/wal_for_OLD # bucket OLD cluster, from which we will recover.</span>
<span class="c">#    timestamp: "2021-01-21T23:49:03+03:00" # timezone required (offset relative to UTC, see RFC 3339 section 5.6)</span>
</code></pre></div></div>

<h4 id="restart-all-pods-in-namespace-with-apps">“Restart” all pods in namespace with apps</h4>
<p>This is necessary so that the pods re-read the database secrets. At the moment Kubernetes has no mechanism for this that works without deleting the pods, so make sure you understand what you are doing.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl delete <span class="nt">--all</span> pods <span class="nt">--namespace</span><span class="o">=</span><span class="nv">$NAMESPACE</span>
</code></pre></div></div>
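
<p>If the applications are managed by Deployments, a rolling restart makes the pods re-read the secrets without taking all of them down at once. A sketch, assuming every Deployment in the namespace should be restarted (<code class="language-plaintext highlighter-rouge">my-app</code> is a placeholder name):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Recreate the pods of every Deployment in the namespace, following each rollout strategy
kubectl rollout restart deployment -n $NAMESPACE

# Wait until a particular Deployment is fully back
kubectl rollout status deployment/my-app -n $NAMESPACE
</code></pre></div></div>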

<h4 id="done">Done</h4>

<p>Plan for recovery from an SQL backup. This recovery variant is possible only in a newly deployed cluster (the variant without redeploying the cluster is below). Do not use it for recovery in an existing cluster! You need an SQL backup to recover the DB:</p>

<h4 id="important-briefly">Important, in brief</h4>

<p>The <strong>cluster uid for the operator</strong> = the <strong>K8s uid</strong> from the postgres operator manifest; you can find this field in the metadata of the source cluster:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kg <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> postgresqls.acid.zalan.do <span class="nt">-o</span> yaml
<span class="nt">---</span>
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: acid-test-cluster
  uid: efd12e58-5786-11e8-b5a7-06148230260c &lt;<span class="o">=====================</span> 
</code></pre></div></div>

<h4 id="more-details">More details</h4>

<p><strong>“initialize” for Patroni</strong> = the <strong>“initialize” field</strong> from the endpoint manifest; you can find this field in the annotations of the endpoint:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kg <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> ep <span class="nv">$CLUSTER_NAME</span><span class="nt">-config</span> <span class="nt">-o</span> yaml     
<span class="nt">---</span>
apiVersion: v1
kind: Endpoints
metadata:
  annotations:
    config: ...
    initialize: <span class="s2">"6935500285706907727"</span> &lt;<span class="o">=====================</span> patroni cluster <span class="nb">id</span>
</code></pre></div></div>
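
<p>When only the two identifiers are needed, a JSONPath query avoids scanning the whole YAML. A sketch that reuses the cluster name <code class="language-plaintext highlighter-rouge">acid-test-cluster</code> from the example output above and the <code class="language-plaintext highlighter-rouge">$CLUSTER_NAME-config</code> endpoint:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># K8s uid of the postgresql resource (the "uid cluster for operator")
kubectl get postgresqls.acid.zalan.do acid-test-cluster -n $NAMESPACE -o jsonpath='{.metadata.uid}'

# Patroni cluster id stored in the "initialize" annotation of the config endpoint
kubectl get endpoints $CLUSTER_NAME-config -n $NAMESPACE -o jsonpath='{.metadata.annotations.initialize}'
</code></pre></div></div>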

<p>Show the cluster status:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kg <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> postgresql.acid.zalan.do/<span class="nv">$CLUSTER_NAME</span> <span class="nt">-o</span> yaml
</code></pre></div></div>

<p>We are interested in the section:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>annotations:
   kubectl.kubernetes.io/last-applied-configuration: |
      ...
</code></pre></div></div>
<p>This section shows the current state of the cluster.</p>

<h4 id="documentation-links-that-can-help">Documentation (links) that can help</h4>

<ul>
  <li><a href="https://postgres-operator.readthedocs.io/en/latest/reference/cluster_manifest/">Cluster Manifest</a></li>
  <li><a href="https://github.com/zalando/postgres-operator">Zalando Postgres Operator</a></li>
  <li><a href="https://patroni.readthedocs.io/en/latest/">Patroni Docs</a></li>
  <li><a href="https://patroni.readthedocs.io/en/latest/kubernetes.html">Patroni Kubernetes</a></li>
  <li><a href="https://postgres-operator.readthedocs.io/en/latest/user/#clone-from-s3">Cloning from S3</a></li>
  <li><a href="https://github.com/zalando/postgres-operator/issues/1279#issuecomment-783574620">GitHub Issue 1279</a></li>
  <li><a href="https://github.com/zalando/postgres-operator/issues/1391">GitHub Issue 1391</a></li>
</ul>

<h4 id="️-dangerous-zone-️">⚠️ Dangerous Zone ⚠️</h4>
<blockquote>
  <p>Be very sure of what you are doing, and ask a senior DevOps engineer if in doubt. All responsibility for using these commands rests with you. See the disclaimer.</p>
</blockquote>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>k delete <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> postgresqls.acid.zalan.do <span class="nt">--all</span> <span class="c"># delete all cluster PG</span>
<span class="nt">---</span>
k delete <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> pod <span class="nv">$POD_NAME</span> <span class="c"># delete pod</span>
<span class="nt">---</span>
k delete <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> configmap <span class="nv">$CM_NAME</span> <span class="c"># delete configmap</span>
<span class="nt">---</span>
k delete <span class="nt">-n</span> <span class="nv">$NAMESPACE</span> replicaset <span class="nv">$RS_NAME</span> <span class="c"># delete replicaset. May come in handy if the cluster pods get stuck in a Terminating &lt;-&gt; Init loop</span>
</code></pre></div></div>]]></content><author><name>Evgenii Zhuravlev</name></author><category term="Kubernetes" /><category term="Kubernetes" /><category term="PostgreSQL" /><category term="Helm" /><category term="S3" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Flexible CI/CD pipelines in GitLab. Version control and deployment in different environments using tags.</title><link href="https://matrix-spec.github.io/gitlab/2018/03/10/creating-simple-modular-CI-CD-for-developers.html" rel="alternate" type="text/html" title="Flexible CI/CD pipelines in GitLab. Version control and deployment in different environments using tags." /><published>2018-03-10T00:00:00+00:00</published><updated>2018-03-10T00:00:00+00:00</updated><id>https://matrix-spec.github.io/gitlab/2018/03/10/creating-simple-modular-CI-CD-for-developers</id><content type="html" xml:base="https://matrix-spec.github.io/gitlab/2018/03/10/creating-simple-modular-CI-CD-for-developers.html"><![CDATA[<p><img src="/assets/images/posts/simple-ci-cd/ci-cd.gif" alt="banner" /></p>

<p>The perfect pipeline is unattainable, that’s true. You can add as many steps to your pipeline as you need, for example:</p>

<ul>
  <li>Code security scanning</li>
  <li>Code review / approval</li>
  <li>Linters</li>
  <li>Code Coverage</li>
  <li>Unit tests</li>
  <li>Builds</li>
  <li>Scan packages</li>
  <li>Deploy</li>
  <li>Integration testing</li>
  <li>Performance testing (load/stress testing)</li>
</ul>

<p>Two main questions arise here.</p>

<p>1) What steps should be included in my pipeline? There is no universal answer to this question, because the steps depend on the needs of your workflow.
2) How do I manage my pipeline so that it can deploy different versions of applications to different environments? To answer this question, we can use the GitLab tags approach, which will become clear after reading this note.</p>

<p>Let’s look at the simplest initial modular pipeline using Gitlab as an example to understand the principle, which you can further develop by adding the steps you need.
We’ll add three steps that developers can connect to CI/CD pipelines. These will be linter + build + deploy.
The strength of this approach is that they will be able to replace existing pieces of code in their pipelines with these includes, getting a unified solution.</p>

<p>Let’s assume that the development team currently has a repository <code class="language-plaintext highlighter-rouge">gitlab.example.com/develop</code> with the following <code class="language-plaintext highlighter-rouge">.gitlab-ci.yml</code> file:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>variables:
    ...

stages:
  - build-dev
  - build-prod
  - deploy

build-dev-job:
  interruptible: <span class="nb">true
  </span>stage: build-dev
  allow_failure: <span class="nb">false
  </span>image: docker:stable
  script:
    - docker login ...
    - docker build <span class="k">${</span><span class="nv">tag</span><span class="p">-dev</span><span class="k">}</span> ...
    - docker push <span class="k">${</span><span class="nv">tag</span><span class="p">-dev</span><span class="k">}</span> ...
  only:
    refs:
      - dev

build-prod-job:
  stage: build-prod
  allow_failure: <span class="nb">false
  </span>image: docker:stable
  script:
    - docker login ...
    - docker build <span class="k">${</span><span class="nv">tag</span><span class="p">-prod</span><span class="k">}</span> ...
    - docker push <span class="k">${</span><span class="nv">tag</span><span class="p">-prod</span><span class="k">}</span> ...
  only:
    refs:
      - prod

deploy-job:
  stage: deploy
  script:
    - helm upgrade <span class="nt">-i</span> ...
      <span class="nt">-f</span> ./values.yaml
      <span class="nt">--set</span> <span class="nv">tag</span><span class="o">=</span><span class="k">${</span><span class="nv">tag</span><span class="p">-prod</span><span class="k">}</span> ...
      <span class="nt">--create-namespace</span>
  only:
    refs:
      - prod
  when: manual
</code></pre></div></div>

<p>Each project of each development team may contain a duplicated or slightly different copy of this file: <code class="language-plaintext highlighter-rouge">gitlab.example.com/develop/.gitlab-ci.yml</code>, <code class="language-plaintext highlighter-rouge">gitlab.example.com/develop1/.gitlab-ci.yml</code>, and so on. Our goal is to create modular steps that the teams can connect to their pipelines, so that each <code class="language-plaintext highlighter-rouge">.gitlab-ci.yml</code> no longer has to be maintained separately.
We also want to give them the ability to manage releases using GitLab tags and to deploy the same version of the application code to the testing, staging, and production environments, eliminating code drift.</p>

<h2 id="step-one-create-the-structure">Step One. Create the structure</h2>

<p>We will create a group gitlab.example.com/shared containing the following repositories:</p>

<ul>
  <li>gitlab.example.com/shared/linters</li>
  <li>gitlab.example.com/shared/build</li>
  <li>gitlab.example.com/shared/deploy</li>
</ul>

<p>This structure will allow us to manage shared code and connect it to developer repositories in a modular way.</p>

<blockquote>
  <p>Don’t forget to add the gitlab.example.com/develop group with the Developer role to the gitlab.example.com/shared group (Manage -&gt; Members -&gt; Group members). Otherwise, a develop pipeline that includes the shared code will fail with an access error for the user who launched it!</p>
</blockquote>

<h2 id="step-two-create-linters">Step Two. Create linters</h2>

<p>In the gitlab.example.com/shared/linters repository created earlier, let’s add a directory for each linter:</p>

<ul>
  <li>gitlab.example.com/shared/linters/ansible</li>
  <li>gitlab.example.com/shared/linters/docker</li>
  <li>gitlab.example.com/shared/linters/terraform</li>
  <li>gitlab.example.com/shared/linters/…</li>
</ul>

<p>(This depends on what specifically needs to be linted in your case.)</p>

<p>The code in the files might look like this:</p>

<p><code class="language-plaintext highlighter-rouge">cat gitlab.example.com/shared/linters/ansible/.gitlab-ci.yml</code></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>variables:
  ALPINE_VERSION: 3.20.3
  ANSIBLE_LINT_VERSION_ALPINE_PACK: 24.5.0-r0

stages:
  - ansible_lint

ansible-lint-job:
  stage: ansible_lint
  image: alpine:<span class="k">${</span><span class="nv">ALPINE_VERSION</span><span class="k">}</span>
  script:
    - <span class="nb">echo</span> <span class="s2">"Installing Ansible Lint..."</span>
    - apk update
    - apk upgrade
    - apk add ansible-lint<span class="o">=</span><span class="k">${</span><span class="nv">ANSIBLE_LINT_VERSION_ALPINE_PACK</span><span class="k">}</span>
    - <span class="nb">echo</span> <span class="s2">"Running Ansible Lint..."</span>
    - |
      <span class="nv">FILES</span><span class="o">=</span><span class="si">$(</span>find <span class="nb">.</span> <span class="nt">-type</span> f <span class="se">\(</span> <span class="nt">-name</span> <span class="s1">'*.yml'</span> <span class="nt">-o</span> <span class="nt">-name</span> <span class="s1">'*.yaml'</span> <span class="se">\)</span> <span class="nt">-exec</span> <span class="nb">grep</span> <span class="nt">-El</span> <span class="s1">'(hosts:|tasks:|roles:)'</span> <span class="o">{}</span> +<span class="si">)</span>
      <span class="k">if</span> <span class="o">[</span> <span class="nt">-n</span> <span class="s2">"</span><span class="nv">$FILES</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        for </span>file <span class="k">in</span> <span class="nv">$FILES</span><span class="p">;</span> <span class="k">do
          </span><span class="nb">echo</span> <span class="s2">"Linting </span><span class="nv">$file</span><span class="s2">..."</span>
          ansible-lint <span class="s2">"</span><span class="nv">$file</span><span class="s2">"</span> | <span class="nb">tee</span> <span class="nt">-a</span> ansible_lint.log
        <span class="k">done
      else
        </span><span class="nb">echo</span> <span class="s2">"No Ansible YAML files found, skipping linting."</span>
        <span class="nb">exit </span>0
      <span class="k">fi
  </span>rules:
    - when: always
  artifacts:
    when: always
    paths:
      - ansible_lint.log
    expire_in: 1 week
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">cat gitlab.example.com/shared/linters/docker/.gitlab-ci.yml</code></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>variables:
  ALPINE_VERSION: 3.20.3
  HADOLINT_VERSION: v2.12.0

stages:
  - docker_lint

docker-lint-job:
  stage: docker_lint
  image: alpine:<span class="k">${</span><span class="nv">ALPINE_VERSION</span><span class="k">}</span>
  rules:
    - <span class="k">if</span>: <span class="s1">'$CI_COMMIT_TAG &amp;&amp; $CI_COMMIT_REF_PROTECTED == "true"'</span>
      when: always
    - when: never
  script:
    - <span class="nb">echo</span> <span class="s2">"Installing Hadolint..."</span>
    - apk add <span class="nt">--no-cache</span> curl
    - curl <span class="nt">-sSL</span> https://github.com/hadolint/hadolint/releases/download/<span class="k">${</span><span class="nv">HADOLINT_VERSION</span><span class="k">}</span>/hadolint-Linux-x86_64 <span class="nt">-o</span> /usr/local/bin/hadolint
    - <span class="nb">chmod</span> +x /usr/local/bin/hadolint
    - <span class="nb">echo</span> <span class="s2">"Running Hadolint..."</span>
    - |
      <span class="nv">FILES</span><span class="o">=</span><span class="si">$(</span>find <span class="nb">.</span> <span class="nt">-type</span> f <span class="nt">-name</span> <span class="s1">'*Dockerfile*'</span><span class="si">)</span>
      <span class="k">if</span> <span class="o">[</span> <span class="nt">-n</span> <span class="s2">"</span><span class="nv">$FILES</span><span class="s2">"</span> <span class="o">]</span><span class="p">;</span> <span class="k">then
        for </span>file <span class="k">in</span> <span class="nv">$FILES</span><span class="p">;</span> <span class="k">do
          </span><span class="nb">echo</span> <span class="s2">"Linting </span><span class="nv">$file</span><span class="s2">..."</span>
          hadolint <span class="nt">--no-fail</span> <span class="s2">"</span><span class="nv">$file</span><span class="s2">"</span> | <span class="nb">tee</span> <span class="nt">-a</span> docker_lint.log
        <span class="k">done
      else
        </span><span class="nb">echo</span> <span class="s2">"No files containing 'Dockerfile' found, skipping linting."</span>
        <span class="nb">exit </span>0
      <span class="k">fi
  </span>artifacts:
    when: always
    paths:
      - docker_lint.log
    expire_in: 1 week
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">cat gitlab.example.com/shared/linters/terraform/.gitlab-ci.yml</code></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>variables:
  ALPINE_VERSION: 3.20.3
  TFLINT_VERSION: v0.48.0

stages:
  - terraform_lint

terraform-lint-job:
  stage: terraform_lint
  image: alpine:<span class="k">${</span><span class="nv">ALPINE_VERSION</span><span class="k">}</span>
  script:
    - <span class="nb">echo</span> <span class="s2">"Installing Terraform Linter..."</span>
    - apk update
    - apk upgrade
    - apk add <span class="nt">--no-cache</span> curl
    - curl <span class="nt">-L</span> https://github.com/terraform-linters/tflint/releases/download/<span class="k">${</span><span class="nv">TFLINT_VERSION</span><span class="k">}</span>/tflint_linux_amd64.zip <span class="nt">-o</span> tflint.zip
    - apk add <span class="nt">--no-cache</span> unzip
    - unzip tflint.zip <span class="nt">-d</span> /usr/local/bin
    - <span class="nb">chmod</span> +x /usr/local/bin/tflint
    - <span class="nb">rm </span>tflint.zip
    - <span class="nb">echo</span> <span class="s2">"Running Terraform Linter recursively..."</span>
    - tflint <span class="nt">--recursive</span> | <span class="nb">tee </span>tflint.log <span class="o">||</span> <span class="nb">exit </span>1
  artifacts:
    when: always
    paths:
      - tflint.log
    expire_in: 1 week
  rules:
    - when: always
</code></pre></div></div>

<h2 id="step-three-create-builders">Step Three. Create builders</h2>

<p>In the gitlab.example.com/shared/build repository created earlier, let’s add a directory for each kind of build:</p>

<ul>
  <li>gitlab.example.com/shared/build/service</li>
  <li>gitlab.example.com/shared/build/service-feature</li>
  <li>gitlab.example.com/shared/build/service-feature1</li>
  <li>gitlab.example.com/shared/build/…</li>
</ul>

<p>(This depends on what specifically needs to be built in your case.) What’s important to understand here?
The build can be universal: if all services are built identically, you only need one file, <code class="language-plaintext highlighter-rouge">gitlab.example.com/shared/build/service/.gitlab-ci.yml</code>, which you can connect to all services at once. If builds differ between services and require different workflows, you may need to maintain additional files that account for the specifics of certain services: <code class="language-plaintext highlighter-rouge">gitlab.example.com/shared/build/service-feature/.gitlab-ci.yml</code>, <code class="language-plaintext highlighter-rouge">gitlab.example.com/shared/build/service-feature1/.gitlab-ci.yml</code>, and so on.</p>

<p>The code in the files might look like this:</p>

<p><code class="language-plaintext highlighter-rouge">cat gitlab.example.com/shared/build/service/.gitlab-ci.yml</code></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>stages:
  - build

build-dev-job:
  stage: build
  allow_failure: <span class="nb">false
  </span>when: manual
  except:
    - tags
    - protected
  image:
    ...
  script: |
    docker build <span class="nt">-f</span> <span class="nv">$CI_PROJECT_DIR</span>/Dockerfile <span class="se">\</span>
      <span class="nt">--tag</span> <span class="k">${</span><span class="nv">CI_REGISTRY_IMAGE</span><span class="k">}</span>:<span class="k">${</span><span class="nv">CI_COMMIT_REF_NAME</span><span class="p">////_</span><span class="k">}</span>-<span class="k">${</span><span class="nv">CI_COMMIT_SHORT_SHA</span><span class="k">}</span>
    docker push <span class="k">${</span><span class="nv">CI_REGISTRY_IMAGE</span><span class="k">}</span>:<span class="k">${</span><span class="nv">CI_COMMIT_REF_NAME</span><span class="p">////_</span><span class="k">}</span>-<span class="k">${</span><span class="nv">CI_COMMIT_SHORT_SHA</span><span class="k">}</span>
    <span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-eq</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
      </span><span class="nb">echo</span> <span class="s2">"The image was successfully built: </span><span class="k">${</span><span class="nv">CI_REGISTRY_IMAGE</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">CI_COMMIT_REF_NAME</span><span class="p">////_</span><span class="k">}</span><span class="s2">-</span><span class="k">${</span><span class="nv">CI_COMMIT_SHORT_SHA</span><span class="k">}</span><span class="s2">"</span>
    <span class="k">else
      </span><span class="nb">echo</span> <span class="s2">"Build problem!"</span>
      <span class="nb">exit </span>1
    <span class="k">fi

</span>build-prod-job:
  stage: build
  allow_failure: <span class="nb">false
  </span>rules:
    - <span class="k">if</span>: <span class="s1">'$CI_COMMIT_TAG &amp;&amp; $CI_COMMIT_REF_PROTECTED == "true"'</span>
      when: always
    - when: never
  image:
    ...
  script: |
    docker build <span class="nt">-f</span> <span class="nv">$CI_PROJECT_DIR</span>/Dockerfile <span class="se">\</span>
      <span class="nt">--tag</span> <span class="k">${</span><span class="nv">CI_REGISTRY_IMAGE</span><span class="k">}</span>:<span class="k">${</span><span class="nv">CI_COMMIT_TAG</span><span class="k">}</span>
    docker push <span class="k">${</span><span class="nv">CI_REGISTRY_IMAGE</span><span class="k">}</span>:<span class="k">${</span><span class="nv">CI_COMMIT_TAG</span><span class="k">}</span>
    <span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-eq</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
      </span><span class="nb">echo</span> <span class="s2">"The image was successfully built: </span><span class="k">${</span><span class="nv">CI_REGISTRY_IMAGE</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">CI_COMMIT_TAG</span><span class="k">}</span><span class="s2">"</span>
    <span class="k">else
      </span><span class="nb">echo</span> <span class="s2">"Build problem!"</span>
      <span class="nb">exit </span>1
    <span class="k">fi</span>
</code></pre></div></div>

<p>It’s important to give some explanations here:</p>

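<p>The <code class="language-plaintext highlighter-rouge">except:</code> instructions</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  except:
    - tags
    - protected
</code></pre></div></div>

<p>in <code class="language-plaintext highlighter-rouge">build-dev-job</code> and the <code class="language-plaintext highlighter-rouge">rules:</code> condition in <code class="language-plaintext highlighter-rouge">build-prod-job</code> are configured so that creating a GitLab tag on a stable branch automatically starts the prod build. In all other cases, you can start the dev build manually, for example for testing in a separate environment.</p>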
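
<p>The decision described above can be sketched as a small shell function. This is a simplified model and not something GitLab runs: <code class="language-plaintext highlighter-rouge">which_build_job</code> is a hypothetical name, the two arguments stand in for <code class="language-plaintext highlighter-rouge">CI_COMMIT_TAG</code> and <code class="language-plaintext highlighter-rouge">CI_COMMIT_REF_PROTECTED</code>, and only the tag part of <code class="language-plaintext highlighter-rouge">except:</code> is modeled (the <code class="language-plaintext highlighter-rouge">protected</code> entry is ignored).</p>

```shell
# Simplified model of the build rules (an illustration, not GitLab code).
# $1 = CI_COMMIT_TAG (empty string for branch pipelines)
# $2 = CI_COMMIT_REF_PROTECTED ("true" or "false")
which_build_job() {
  if [ -z "$1" ]; then
    # No tag: build-dev-job is available and waits for a manual start.
    echo "build-dev-job (manual)"
  elif [ "$2" = "true" ]; then
    # Protected tag: the rules of build-prod-job match, so it starts itself.
    echo "build-prod-job (automatic)"
  else
    # Unprotected tag: dev is excluded by "except: tags", prod rules do not match.
    echo "none"
  fi
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">which_build_job "v1.2.0" "true"</code> reports the prod build, while an empty tag reports the manual dev build.</p>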

<h2 id="step-four-create-deployments">Step Four. Create deployments</h2>

<p>In the gitlab.example.com/shared/deploy repository created earlier, let’s add a directory for each kind of deployment:</p>

<ul>
  <li>gitlab.example.com/shared/deploy/service</li>
  <li>gitlab.example.com/shared/deploy/service-feature1</li>
  <li>gitlab.example.com/shared/deploy/service-feature2</li>
  <li>gitlab.example.com/shared/deploy/…</li>
</ul>

<p>(This depends on what specifically needs to be deployed in your case.) What’s important to understand here?
As in the previous step, the deployment can be universal: if all services are deployed identically, you only need one file, <code class="language-plaintext highlighter-rouge">gitlab.example.com/shared/deploy/service/.gitlab-ci.yml</code>, which you can connect to all services at once. If deployments differ between services and require different workflows, you may need to maintain additional files that account for the specifics of certain services: <code class="language-plaintext highlighter-rouge">gitlab.example.com/shared/deploy/service-feature1/.gitlab-ci.yml</code>, <code class="language-plaintext highlighter-rouge">gitlab.example.com/shared/deploy/service-feature2/.gitlab-ci.yml</code>, and so on.</p>

<p>I want to add here that at this step in deployments, it is possible to divide them not only by the features of services, but also by environments, for example:</p>

<ul>
  <li>gitlab.example.com/shared/deploy/service-dev/.gitlab-ci.yml</li>
  <li>gitlab.example.com/shared/deploy/service-stage/.gitlab-ci.yml</li>
  <li>gitlab.example.com/shared/deploy/service-prod/.gitlab-ci.yml</li>
</ul>

<p>The code in the files might look like this:</p>

<p><code class="language-plaintext highlighter-rouge">cat gitlab.example.com/shared/deploy/service-dev/.gitlab-ci.yml</code></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>stages:
  - deploy-dev

deploy-dev-job:
  stage: deploy-dev
  needs:
    - build-dev-job
  except:
    - tags
    - protected
  image:
    ...
  script: |
    docker login ...
    helm upgrade <span class="nt">-i</span> <span class="s2">"</span><span class="k">${</span><span class="nv">CI_PROJECT_NAME</span><span class="k">}</span><span class="s2">-dev"</span> <span class="se">\</span>
      <span class="nt">-n</span> <span class="s2">"</span><span class="k">${</span><span class="nv">CI_PROJECT_NAME</span><span class="k">}</span><span class="s2">-dev"</span> <span class="se">\</span>
      <span class="nt">-f</span> values-dev.yaml <span class="se">\</span>
      <span class="nt">--create-namespace</span> <span class="se">\</span>
      <span class="nt">--set</span> images.repository<span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CI_REGISTRY_IMAGE</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
      <span class="nt">--set</span> images.tag<span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CI_COMMIT_REF_NAME</span><span class="p">////_</span><span class="k">}</span><span class="s2">"</span>-<span class="s2">"</span><span class="k">${</span><span class="nv">CI_COMMIT_SHORT_SHA</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
      ... <span class="se">\</span>
      <span class="nt">--debug</span>
    <span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-eq</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
      </span><span class="nb">echo</span> <span class="s2">"The image was successfully deployed in the dev stand (K8s): </span><span class="k">${</span><span class="nv">CI_REGISTRY_IMAGE</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">CI_COMMIT_REF_NAME</span><span class="p">////_</span><span class="k">}</span><span class="s2">-</span><span class="k">${</span><span class="nv">CI_COMMIT_SHORT_SHA</span><span class="k">}</span><span class="s2">"</span>
    <span class="k">else
      </span><span class="nb">echo</span> <span class="s2">"Deployment problem!"</span>
      <span class="nb">exit </span>1
    <span class="k">fi</span>
    
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">cat gitlab.example.com/shared/deploy/service-prod/.gitlab-ci.yml</code></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>stages:
  - deploy-prod

deploy-prod-job:
  stage: deploy-prod
  allow_failure: <span class="nb">false
  </span>rules:
    - <span class="k">if</span>: <span class="s1">'$CI_COMMIT_TAG &amp;&amp; $CI_COMMIT_REF_PROTECTED == "true"'</span>
      when: manual
    - when: never
  needs:
    - build-prod-job
  image:
    ...
  script: |
    docker login ...
    helm upgrade <span class="nt">-i</span> <span class="s2">"</span><span class="k">${</span><span class="nv">CI_PROJECT_NAME</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
      <span class="nt">-n</span> <span class="s2">"</span><span class="k">${</span><span class="nv">CI_PROJECT_NAME</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
      <span class="nt">-f</span> values-prod.yaml <span class="se">\</span>
      <span class="nt">--create-namespace</span> <span class="se">\</span>
      <span class="nt">--set</span> images.repository<span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">CI_REGISTRY_IMAGE</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
      <span class="nt">--set</span> images.tag<span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">IMAGE_TAG</span><span class="k">}</span><span class="s2">"</span> <span class="se">\</span>
      ... <span class="se">\</span>
      <span class="nt">--debug</span>
    <span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> <span class="nt">-eq</span> 0 <span class="o">]</span><span class="p">;</span> <span class="k">then
      </span><span class="nb">echo</span> <span class="s2">"The image was successfully deployed in the production stand (K8s): </span><span class="k">${</span><span class="nv">CI_REGISTRY_IMAGE</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">IMAGE_TAG</span><span class="k">}</span><span class="s2">"</span>
    <span class="k">else
      </span><span class="nb">echo</span> <span class="s2">"Deployment problem!"</span>
      <span class="nb">exit </span>1
    <span class="k">fi</span>
</code></pre></div></div>

<p>This is the most interesting step in this note. The <code class="language-plaintext highlighter-rouge">needs:</code> instructions refer to the build jobs from the previous step, because a deployment must not run if the corresponding build fails. The prod deployment additionally expects a GitLab tag created on a stable branch and is then launched manually. In all other cases, the deployment goes to the K8s namespace with the <code class="language-plaintext highlighter-rouge">-dev</code> postfix for testing. The flow can also be controlled from a single file (to do this, combine the listings of <code class="language-plaintext highlighter-rouge">gitlab.example.com/shared/deploy/service-dev/.gitlab-ci.yml</code> and <code class="language-plaintext highlighter-rouge">gitlab.example.com/shared/deploy/service-prod/.gitlab-ci.yml</code> into one). Here I use two files to demonstrate the principle clearly.
It doesn’t matter where we deploy at this step. You may not be using K8s at all, but a Linux server, or you may be publishing a package as an artifact: the principle remains unchanged, just replace the deployment target.</p>
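
<p>To make the naming rules from the two listings easier to follow, here is a minimal sketch of how the release name and image tag are chosen. The function names <code class="language-plaintext highlighter-rouge">dev_image_tag</code> and <code class="language-plaintext highlighter-rouge">deploy_target</code> are hypothetical, the arguments stand in for the predefined CI variables, and <code class="language-plaintext highlighter-rouge">tr</code> is used instead of the bash-only substitution so that the sketch also runs in a plain POSIX shell.</p>

```shell
# Image tag used by the dev jobs: the ref name with "/" replaced by "_",
# then "-" and the short commit SHA.
# $1 = CI_COMMIT_REF_NAME, $2 = CI_COMMIT_SHORT_SHA
dev_image_tag() {
  printf '%s-%s\n' "$(printf '%s' "$1" | tr '/' '_')" "$2"
}

# Print "RELEASE_AND_NAMESPACE IMAGE_TAG" for a pipeline.
# $1 = CI_PROJECT_NAME, $2 = CI_COMMIT_REF_NAME,
# $3 = CI_COMMIT_SHORT_SHA, $4 = CI_COMMIT_TAG (empty for branch pipelines)
deploy_target() {
  if [ -n "$4" ]; then
    # Tag pipeline: prod release, image tagged with the Git tag.
    echo "$1 $4"
  else
    # Branch pipeline: "-dev" release and namespace, branch-based image tag.
    echo "$1-dev $(dev_image_tag "$2" "$3")"
  fi
}
```

<p>For example, a hypothetical project <code class="language-plaintext highlighter-rouge">shop</code> on branch <code class="language-plaintext highlighter-rouge">feature/login</code> is deployed as <code class="language-plaintext highlighter-rouge">shop-dev</code>, while the tag <code class="language-plaintext highlighter-rouge">v1.4.0</code> is deployed as <code class="language-plaintext highlighter-rouge">shop</code> with the image tag <code class="language-plaintext highlighter-rouge">v1.4.0</code>.</p>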

<h2 id="step-five-transfer-all-created-steps-to-the-pipeline">Step Five. Transfer all created steps to the pipeline</h2>

<p>At the beginning, we said that gitlab.example.com/develop/.gitlab-ci.yml is maintained locally. Now we can replace its contents with the includes created above and get the following listing:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>variables:
  IMAGE_TAG: <span class="k">${</span><span class="nv">CI_COMMIT_TAG</span><span class="k">}</span>

include:
  - project: <span class="s1">'shared/linters'</span>
    file: <span class="s1">'docker/.gitlab-ci.yml'</span>
    ref: <span class="s1">'main'</span>
  - project: <span class="s1">'shared/linters'</span>
    file: <span class="s1">'ansible/.gitlab-ci.yml'</span>
    ref: <span class="s1">'main'</span>
  - project: <span class="s1">'shared/build'</span>
    file: <span class="s1">'service/.gitlab-ci.yml'</span>
    ref: <span class="s1">'main'</span>
  - project: <span class="s1">'shared/deploy'</span>
    file: <span class="s1">'service-dev/.gitlab-ci.yml'</span>
    ref: <span class="s1">'main'</span>
  - project: <span class="s1">'shared/deploy'</span>
    file: <span class="s1">'service-prod/.gitlab-ci.yml'</span>
    ref: <span class="s1">'main'</span>

stages:
  - docker_lint
  - ansible_lint
  - build
  - deploy-dev
  - deploy-prod

<span class="c"># The end =)</span>
</code></pre></div></div>

<p>It’s important in this step that the <code class="language-plaintext highlighter-rouge">stages:</code> list contains every stage used by the jobs in the included <code class="language-plaintext highlighter-rouge">shared/.../.gitlab-ci.yml</code> files.</p>
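
<p>A mismatch between stage names is easy to miss, and GitLab rejects a configuration in which a job points at a stage that is not declared. The check can be sketched as follows; <code class="language-plaintext highlighter-rouge">missing_stages</code> is a hypothetical helper, and both stage lists are passed in by hand as space-separated strings rather than parsed from the YAML files.</p>

```shell
# Print every stage from the "used" list that is absent from the "declared" list.
# $1 = stages declared in the main .gitlab-ci.yml (space-separated)
# $2 = stages used by jobs in the included files (space-separated)
missing_stages() {
  for used in $2; do
    found=no
    for declared in $1; do
      if [ "$used" = "$declared" ]; then
        found=yes
      fi
    done
    if [ "$found" = "no" ]; then
      echo "$used"
    fi
  done
}
```

<p>For example, if the main file declares <code class="language-plaintext highlighter-rouge">deploy-production</code> while an included job uses <code class="language-plaintext highlighter-rouge">deploy-prod</code>, the function prints <code class="language-plaintext highlighter-rouge">deploy-prod</code>. Empty output means every used stage is declared.</p>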

<p>There are two approaches to delivering changes in the shared code to the pipelines that include it. The <code class="language-plaintext highlighter-rouge">ref:</code> instruction controls this:</p>

<p>1) <code class="language-plaintext highlighter-rouge">ref: 'main'</code> - all changes must be merged into main, after which every pipeline that includes the shared code receives them on its next run.
2) <code class="language-plaintext highlighter-rouge">ref: 'feature-name'</code> - you deliver changes granularly, switching the pipelines to the new version one by one.</p>

<p>Both approaches have strengths and weaknesses. With <code class="language-plaintext highlighter-rouge">ref: 'main'</code>, you can instantly deliver build changes to <code class="language-plaintext highlighter-rouge">gitlab.example.com/develop/.gitlab-ci.yml</code>, <code class="language-plaintext highlighter-rouge">gitlab.example.com/develop1/.gitlab-ci.yml</code>, <code class="language-plaintext highlighter-rouge">gitlab.example.com/develop2/.gitlab-ci.yml</code>, and so on. This is convenient, especially if you have more than a hundred of them, but the risk of delivering an error to all repos at once increases.
With <code class="language-plaintext highlighter-rouge">ref: 'feature-name'</code>, you can monitor errors more strictly and test, for example, only on <code class="language-plaintext highlighter-rouge">gitlab.example.com/develop/.gitlab-ci.yml</code> without delivering the changes to the rest (<code class="language-plaintext highlighter-rouge">gitlab.example.com/develop1/.gitlab-ci.yml</code>, …). But this will be slower.</p>

<p>As always, the choice is yours =)</p>

<h2 id="documentation-links-that-can-help">Documentation (links) that can help</h2>

<ul>
  <li><a href="https://docs.gitlab.com/user/project/repository/tags/">GitLab Tags</a></li>
  <li><a href="https://docs.gitlab.com/ci/yaml/">GitLab CI/CD YAML syntax reference</a></li>
  <li><a href="https://docs.gitlab.com/ci/yaml/includes/">GitLab: how to use CI/CD configuration from other files</a></li>
  <li><a href="https://docs.gitlab.com/ci/variables/predefined_variables/">Predefined CI/CD variables reference</a></li>
</ul>]]></content><author><name>Evgenii Zhuravlev</name></author><category term="Gitlab" /><category term="Gitlab" /><category term="CI" /><category term="CD" /><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">Certbot on two servers with Round-Robin DNS</title><link href="https://matrix-spec.github.io/linux/2018/01/01/certbot-on-two-servers-with-Round-Robin-DNS.html" rel="alternate" type="text/html" title="Certbot on two servers with Round-Robin DNS" /><published>2018-01-01T00:00:00+00:00</published><updated>2018-01-01T00:00:00+00:00</updated><id>https://matrix-spec.github.io/linux/2018/01/01/certbot-on-two-servers-with-Round-Robin-DNS</id><content type="html" xml:base="https://matrix-spec.github.io/linux/2018/01/01/certbot-on-two-servers-with-Round-Robin-DNS.html"><![CDATA[<p><img src="/assets/images/posts/certbot-Round-Robin-dns/scheme.png" alt="banner" /></p>

<p>You may need to use the ACME HTTP-01 challenge to obtain a certificate with Certbot. The nuance is that you cannot use DNS-01 because, for example, the zone does not belong to you and you only serve the site, while the site itself is hosted on several servers. Its address is resolved via Round-Robin DNS, for example, like this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nslookup example.com
1.1.1.1
2.2.2.2
</code></pre></div></div>

<p>This is a real-world case: I ran into it myself a long time ago. Ideally there would be a Load Balancer in front of the servers, but an attentive reader will point out that a Load Balancer does not fix the problem without special configuration. Let’s figure it out.</p>

<h2 id="recall-how-http-01-validation-works-in-acme-lets-encrypt-via-certbot">Recall how HTTP-01 validation works in ACME (Let’s Encrypt) via Certbot</h2>

<p>HTTP-01 is a domain validation method in which Let’s Encrypt requests a special file under <code class="language-plaintext highlighter-rouge">.well-known/acme-challenge/</code> in the web root.</p>

<h3 id="certificate-request">Certificate Request</h3>

<p>Run Certbot, for example:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>certbot certonly <span class="nt">--webroot</span> <span class="nt">-w</span> /var/www/html <span class="nt">-d</span> example.com
</code></pre></div></div>

<h3 id="certbot-creates-a-temporary-file-in-the-directory">Certbot creates a temporary file in the directory</h3>

<p><code class="language-plaintext highlighter-rouge">/var/www/html/.well-known/acme-challenge/&lt;random_string&gt;</code><br />
The file contains a special string unique to your request.</p>

<h3 id="request-from-lets-encrypt">Request from Let’s Encrypt</h3>

<p>The Let’s Encrypt server requests the file from your site over HTTP at
<code class="language-plaintext highlighter-rouge">http://example.com/.well-known/acme-challenge/&lt;random_string&gt;</code>
and checks that the file is available and contains the correct data.</p>

<h3 id="certificate-issuance">Certificate issuance</h3>

<p>If everything was successful:</p>

<p>✅ Let’s Encrypt confirms the domain.</p>

<p>✅ The certificate is issued and stored in /etc/letsencrypt/live/example.com/</p>

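<p>✅ If an installer plugin is used (<code class="language-plaintext highlighter-rouge">certbot --nginx</code> or <code class="language-plaintext highlighter-rouge">certbot --apache</code>), the web server configs are updated automatically.</p>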
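
<p>The file placement and request steps above can be rehearsed by hand before involving Let’s Encrypt. Below is a minimal sketch, not Certbot’s own logic: the function name <code class="language-plaintext highlighter-rouge">make_test_challenge</code> and the sample token are hypothetical, and the file content is a plain test string rather than a real challenge response.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: create a test file where the webroot plugin
# would place the challenge, then print the URL that should serve it.
make_test_challenge() {
    webroot="$1"   # e.g. /var/www/html
    domain="$2"    # e.g. example.com
    token="$3"     # any file name; a real token is chosen by the CA
    dir="$webroot/.well-known/acme-challenge"
    mkdir -p "$dir" || return 1
    printf '%s\n' "test-$token" &gt; "$dir/$token" || return 1
    printf 'http://%s/.well-known/acme-challenge/%s\n' "$domain" "$token"
}
</code></pre></div></div>

<p>For example, <code class="language-plaintext highlighter-rouge">make_test_challenge /var/www/html example.com probe123</code> prints a URL that can then be fetched from outside with <code class="language-plaintext highlighter-rouge">curl</code>. Remove the test file afterwards.</p>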

<h2 id="what-happens-if-there-are-several-servers">What happens if there are several servers?</h2>

<p>Certbot creates the temporary file on one server, but the Let’s Encrypt validation server may get the address of the other server from Round-Robin DNS. When it performs the check, it does not find the file there, because the file was never created on that server. How can this be avoided?
Below is a solution that lets you obtain and renew the certificate correctly:</p>

<h3 id="cron-on-01-server">Cron on 01-SERVER</h3>

<p>Add the following on 01-SERVER (IP = 1.1.1.1). Certbot renewal runs on the first server only and is NOT performed on the second, so that the key directories stay consistent when they are synchronized:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> /etc/cron.d/certbot
0 <span class="k">*</span>/12 <span class="k">*</span> <span class="k">*</span> <span class="k">*</span> root python <span class="nt">-c</span> <span class="s1">'import random; import time; time.sleep(random.random() * 3600)'</span> <span class="o">&amp;&amp;</span> certbot <span class="nt">-q</span> renew <span class="nt">--allow-subset-of-names</span> <span class="nt">--renew-hook</span> <span class="s2">"systemctl reload nginx"</span>
</code></pre></div></div>

<p>and</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> /etc/letsencrypt/renewal/SITE.conf
....
<span class="o">[</span>renewalparams]

<span class="c">#custom:</span>
autorenew <span class="o">=</span> False
</code></pre></div></div>

<p>React to changes in the certificate directory (this is the rule registered in incrontab):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>incrontab <span class="nt">-l</span>
/etc/letsencrypt/archive/www.SITE.com-0001/ IN_MODIFY,IN_CREATE,IN_DELETE,IN_CLOSE_WRITE /usr/local/bin/rsync_using_incron_for_certbot.sh
</code></pre></div></div>

<p>When a change is detected, the script overwrites the certificate and the symlinks to it on the second server. If the nginx configuration test passes, nginx is reloaded; otherwise an alert is sent to monitoring:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> /usr/local/bin/rsync_using_incron_for_certbot.sh

<span class="c">#!/bin/bash</span>

<span class="c"># VARs:</span>
<span class="nv">RESULT_READ_SYMLINC_FULLCHAIN</span><span class="o">=</span><span class="si">$(</span><span class="nb">readlink</span> <span class="nt">-f</span> /etc/nginx/ssl/SITE.ru-443-l2.crt | <span class="nb">grep</span> <span class="nt">-oE</span> <span class="s1">'fullchain[0-9]+\.pem'</span><span class="si">)</span>
<span class="nv">RESULT_FOR_PUSH_IN_FULLCHAIN</span><span class="o">=</span><span class="s2">"/etc/letsencrypt/archive/www.SITE.ru/</span><span class="nv">$RESULT_READ_SYMLINC_FULLCHAIN</span><span class="s2">"</span>
<span class="nv">RESULT_READ_SYMLINC_PRIVKEY</span><span class="o">=</span><span class="si">$(</span><span class="nb">readlink</span> <span class="nt">-f</span> /etc/nginx/ssl/SITE.ru-443-l2.key | <span class="nb">grep</span> <span class="nt">-oE</span> <span class="s1">'privkey[0-9]+\.pem'</span><span class="si">)</span>
<span class="nv">RESULT_FOR_PUSH_IN_PRIVKEY</span><span class="o">=</span><span class="s2">"/etc/letsencrypt/archive/www.SITE.ru/</span><span class="nv">$RESULT_READ_SYMLINC_PRIVKEY</span><span class="s2">"</span>

<span class="c"># Start:</span>
<span class="nb">sleep </span>15m
rsync <span class="nt">-avx</span> <span class="nt">--numeric-ids</span> <span class="nt">--delete</span> <span class="nt">--progress</span> /etc/letsencrypt/archive/www.SITE.ru-0001/ <span class="nt">-e</span> ssh root@2.2.2.2:/etc/letsencrypt/archive/www.SITE.ru/
ssh root@2.2.2.2 <span class="nb">rm</span> <span class="nt">-f</span> /etc/letsencrypt/live/www.SITE.ru/fullchain.pem
ssh root@2.2.2.2 <span class="nb">ln</span> <span class="nt">-s</span> <span class="nv">$RESULT_FOR_PUSH_IN_FULLCHAIN</span> /etc/letsencrypt/live/www.SITE.ru/fullchain.pem
ssh root@2.2.2.2 <span class="nb">rm</span> <span class="nt">-f</span> /etc/letsencrypt/live/www.SITE.ru/privkey.pem
ssh root@2.2.2.2 <span class="nb">ln</span> <span class="nt">-s</span> <span class="nv">$RESULT_FOR_PUSH_IN_PRIVKEY</span> /etc/letsencrypt/live/www.SITE.ru/privkey.pem
<span class="c"># nginx -t writes its result to stderr, so merge it into stdout for grep</span>
ssh root@2.2.2.2 nginx <span class="nt">-t</span> 2&gt;&amp;1 | <span class="nb">grep</span> <span class="s2">"test is successful"</span>

<span class="c"># Monitoring: $? holds the exit status of the grep above</span>
<span class="k">if </span><span class="nb">let</span> <span class="s2">"</span><span class="nv">$?</span><span class="s2">==0"</span>
<span class="k">then
</span>ssh root@2.2.2.2 nginx <span class="nt">-s</span> reload
<span class="c"># MONITORING ALERT:</span>
/usr/local/bin/alert_telegraf_for_certbot.py 0
<span class="k">else</span>
/usr/local/bin/alert_telegraf_for_certbot.py 1
<span class="k">fi</span>
</code></pre></div></div>
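
<p>The script above takes the archive file name from the target of the nginx symlink. A possible alternative, shown here only as a sketch and not as part of the original setup, is to pick the highest-numbered file in the archive directory, since Certbot numbers the files on each renewal (<code class="language-plaintext highlighter-rouge">fullchain1.pem</code>, <code class="language-plaintext highlighter-rouge">fullchain2.pem</code>, …). The function name <code class="language-plaintext highlighter-rouge">latest_pem</code> is hypothetical.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: print the path of the highest-numbered
# PREFIX&lt;N&gt;.pem file in a directory; return 1 if there is none.
latest_pem() {
    dir="$1"      # e.g. /etc/letsencrypt/archive/www.SITE.ru
    prefix="$2"   # e.g. fullchain or privkey
    best=0
    for f in "$dir"/"$prefix"[0-9]*.pem; do
        [ -e "$f" ] || continue
        base="${f##*/}"          # strip the directory part
        n="${base#"$prefix"}"    # strip the prefix
        n="${n%.pem}"            # strip the extension
        case "$n" in *[!0-9]*) continue ;; esac   # keep digits only
        if [ "$n" -gt "$best" ]; then best="$n"; fi
    done
    if [ "$best" -gt 0 ]; then
        printf '%s\n' "$dir/$prefix$best.pem"
    else
        return 1
    fi
}
</code></pre></div></div>

<p>Compared with the single-character <code class="language-plaintext highlighter-rouge">grep</code> pattern, this also works once the counter reaches two digits, for example <code class="language-plaintext highlighter-rouge">fullchain10.pem</code>.</p>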

<h2 id="-common-mistakes-and-their-solutions">🔴 Common mistakes and their solutions</h2>

<p>🔴 403 Forbidden
Make sure that the web server allows access to .well-known/acme-challenge/</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">ls</span> <span class="nt">-l</span> /var/www/html/.well-known/acme-challenge/
<span class="c"># If you use Nginx, add this to the config:</span>
location /.well-known/acme-challenge/ <span class="o">{</span>
    root /var/www/html<span class="p">;</span>
<span class="o">}</span>
</code></pre></div></div>

<p>🔴 404 Not Found
Make sure that Certbot creates the file in the correct folder.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>certbot certonly <span class="nt">--webroot</span> <span class="nt">-w</span> /var/www/html <span class="nt">-d</span> example.com
</code></pre></div></div>

<p>Make sure that you have DocumentRoot (Apache) or root (Nginx) configured correctly.</p>
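
<p>With Round-Robin DNS it is not obvious which server produced the 403 or 404. Below is a minimal sketch for querying each address separately, assuming <code class="language-plaintext highlighter-rouge">curl</code> is installed; the function name <code class="language-plaintext highlighter-rouge">check_challenge_on</code> and the sample token are hypothetical.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: request the challenge path from one specific
# server, bypassing DNS, and print the HTTP status code it returns.
check_challenge_on() {
    domain="$1"   # e.g. example.com
    ip="$2"       # e.g. 1.1.1.1
    token="$3"    # name of a file under .well-known/acme-challenge/
    curl -s -o /dev/null --max-time 10 \
        --resolve "$domain:80:$ip" \
        -w '%{http_code}\n' \
        "http://$domain/.well-known/acme-challenge/$token"
}

# Example: compare both servers from the nslookup output
# for ip in 1.1.1.1 2.2.2.2; do check_challenge_on example.com "$ip" probe123; done
</code></pre></div></div>

<p>If one address answers 200 and the other 404, the file exists on only one of the servers, which is exactly the multi-server situation described earlier.</p>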

<p>🔴 Port 80 is unavailable
Keep in mind that the Let’s Encrypt validation server starts the HTTP-01 check on port 80. It does not connect to port 443 first, because the certificate has not been issued yet, so port 80 must be reachable from the internet.</p>]]></content><author><name>Evgenii Zhuravlev</name></author><category term="Linux" /><category term="Linux" /><category term="Certbot" /><category term="SSL" /><summary type="html"><![CDATA[]]></summary></entry></feed>