(control) Add warnings about domain data contamination

2024-01-25 18:26:15 +01:00 · 182c0cf28e
commit 182c0cf28e
parent 0b105b5986
3 changed files with 13 additions and 1 deletions
--- a/code/services-core/control-service/src/main/resources/templates/control/node/actions/partial-download-sample-data.hdb
+++ b/code/services-core/control-service/src/main/resources/templates/control/node/actions/partial-download-sample-data.hdb
@@ -1,8 +1,15 @@
 <h1 class="my-3">Download Sample Data</h1>

 <div class="my-3 p-3 border bg-light">
-This will download sample crawl data from <a href="https://downloads.marginalia.nu">downloads.marginalia.nu</a> onto Node {{node.id}}.
+<p>This will download sample crawl data from <a href="https://downloads.marginalia.nu">downloads.marginalia.nu</a> onto Node {{node.id}}.
 This is a sample of real crawl data.  It is intended for demo, testing and development purposes.  Several sets are available.
+</p>
+
+<p>
+    <span class="text-danger">Warning</span> When the sample data is processed, the domains associated with it are loaded
+    into the domain database.  This means that if you run the re-crawl action on this machine, the domains in the sample
+    data will be crawled, regardless of which crawl data is specified!
+</p>
 </div>

 <form method="post" action="actions/download-sample-data">
--- a/code/services-core/control-service/src/main/resources/templates/control/node/actions/partial-new-crawl-specs.hdb
+++ b/code/services-core/control-service/src/main/resources/templates/control/node/actions/partial-new-crawl-specs.hdb
@@ -6,6 +6,9 @@
        If you are just looking to test the software, feel free to use <a href="https://downloads.marginalia.nu/domain-list-test.txt">this
        short list of marginalia-related websites</a>, that are safe to crawl repeatedly without causing any problems.
    </p>
+
+    <p><span class="text-danger">Warning</span> Ensure <a href="?view=download-sample-data">downloaded sample data</a> has not been loaded onto this instance
+        before performing this action; otherwise, those domains will also be crawled during future re-crawls!</p>
 </div>

 <form method="post" action="actions/new-crawl-specs">
--- a/code/services-core/control-service/src/main/resources/templates/control/node/actions/partial-recrawl.hdb
+++ b/code/services-core/control-service/src/main/resources/templates/control/node/actions/partial-recrawl.hdb
@@ -18,6 +18,8 @@
    crawl spec.  If the document has changed, it will be re-crawled.  If it has not changed, it will be skipped,
    and the previous data will be retained.  This is both faster and easier on the target server.
    </p>
+    <p><span class="text-danger">Warning</span> Ensure <a href="?view=download-sample-data">downloaded sample data</a>
+        has not been loaded onto this instance before performing this action; otherwise, those domains will also be crawled!</p>
 </div>

 <form method="post" action="actions/recrawl">