{"id":1165,"date":"2023-06-27T13:50:55","date_gmt":"2023-06-27T13:50:55","guid":{"rendered":"https:\/\/www.the-analytics.club\/?p=1165"},"modified":"2023-06-27T15:58:03","modified_gmt":"2023-06-27T15:58:03","slug":"best-practices-of-downloading-files-from-the-web-using-python","status":"publish","type":"post","link":"https:\/\/www.the-analytics.club\/best-practices-of-downloading-files-from-the-web-using-python\/","title":{"rendered":"Best Practices of Downloading Files From the Web Using Python."},"content":{"rendered":"\n\n\n<p>Downloading files from the internet programmatically is a task frequently encountered in Python applications. <\/p>\n\n\n\n<p>I do it a few times a year. Sometimes, the number of files we need to download from an internet archive is large enough to spend a few weeks. With programmatic access, we can bring it down to less than a day. <\/p>\n\n\n\n<p>I <a href=\"https:\/\/www.the-analytics.club\/download-youtube-videos-in-python\/\" title=\"How to Download YouTube Videos With Python?\">download files using Python<\/a> a few times a year. Also, I&#8217;ve built <a href=\"https:\/\/www.the-analytics.club\/how-to-speed-up-python-data-pipelines-up-to-91x\/\" title=\"How to Speed up Python Data Pipelines up to 91X?\">data pipelines with Python<\/a>, automatically downloading files from the web. <\/p>\n\n\n\n<p>This article delves into the realm of optimal file-downloading techniques using Python, shedding light on crucial aspects like exception handling, employing suitable libraries, and incorporating advanced functionalities such as resumable downloads and stream processing. <\/p>\n\n\n\n<p>Join me as we embark on this journey of exploring the top best practices for Python-based file downloads.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-layout-1 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<div class=\"wp-block-group is-nowrap is-layout-flex wp-container-core-group-layout-1 wp-block-group-is-layout-flex\">\n<p>Grab your aromatic <a href=\"https:\/\/amzn.to\/3CMvTF1\" target=\"_blank\" rel=\"sponsored noreferrer noopener nofollow\">coffee <\/a>(<a href=\"https:\/\/amzn.to\/42YqYeT\" target=\"_blank\" rel=\"sponsored noreferrer noopener nofollow\">or tea<\/a>) and get ready&#8230;!<\/p>\n<\/div>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-uagb-advanced-heading uagb-block-51fdb326\"><h2 class=\"uagb-heading-text\">Downloading files using the requests library<\/h2><\/div>\n\n\n\n<p>Python ecosystem is so rich. We have a package for almost every need. <\/p>\n\n\n\n<p>Requests is one such Python library that helps us make HTTP requests programmatically. And this could download files as well. Sometimes, this isn&#8217;t the best way to download files. If you can&#8217;t directly access the file with the URL, you might have to use Selenium to download the file. <\/p>\n\n\n\n<p>But for this post&#8217;s purpose, we stick with the request library. <\/p>\n\n\n\n<p>Let&#8217;s take a look at a simple example that demonstrates how to download a file using the <code>requests<\/code> library:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:16px;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:20px\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"import requests\n\nurl = &quot;https:\/\/example.com\/myfile.txt&quot;\nresponse = requests.get(url)\n\nif response.status_code == 200:\n    with open(&quot;myfile.txt&quot;, &quot;wb&quot;) as file:\n        file.write(response.content)\n        print(&quot;File downloaded successfully!&quot;)\nelse:\n    print(f&quot;Failed to download file. Status code: {response.status_code}&quot;)\n\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">url <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">https:\/\/example.com\/myfile.txt<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">response <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">get<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">url<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">if<\/span><span style=\"color: #D8DEE9FF\"> response<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">status_code <\/span><span style=\"color: #81A1C1\">==<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #B48EAD\">200<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">with<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">open<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">myfile.txt<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">wb<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">as<\/span><span style=\"color: #D8DEE9FF\"> file<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        file<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">write<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">response<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">content<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #88C0D0\">print<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">File downloaded successfully!<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">else<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #88C0D0\">print<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #81A1C1\">f<\/span><span style=\"color: #A3BE8C\">&quot;Failed to download file. Status code: <\/span><span style=\"color: #EBCB8B\">{<\/span><span style=\"color: #D8DEE9FF\">response<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">status_code<\/span><span style=\"color: #EBCB8B\">}<\/span><span style=\"color: #A3BE8C\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><span style=\"display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#2e3440ff;color:#c8d0e0;font-size:12px;line-height:1;position:relative\">Python<\/span><\/div>\n\n\n\n<p>This code uses the requests library to send a GET request to a URL. If the response status code is 200 (indicating a successful request), it opens a file named. <code>\"myfile.txt\"<\/code> in write binary mode. Then, it writes the content of the response to the file. If the status code is not 200, it prints a failure message along with the actual status code.<\/p>\n\n\n\n<div class=\"wp-block-uagb-advanced-heading uagb-block-dddae7b5\"><h2 class=\"uagb-heading-text\">Handle exceptions proactively when programmatically downloading files.<\/h2><\/div>\n\n\n\n<p>No matter what, the internet is an ocean of unknowns. <\/p>\n\n\n\n<p>Errors can occur at various stages when working with network requests. The server may be temporarily down, and the new <a href=\"https:\/\/www.the-analytics.club\/python-monitor-file-changes\/\" title=\"How to Create File System Triggers in\u00a0Python\">change in the source system<\/a> is not compatible with your script&#8217;s expectation, the connection may be unstable, and many more. <\/p>\n\n\n\n<p>Thus we must proactively address possible issues. Thus enclose your code inside a try-except block and handle it appropriately. <\/p>\n\n\n\n<p>Let me provide you with an example that demonstrates how to handle exceptions gracefully when downloading a file.<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:16px;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:20px\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"import requests\n\nurl = &quot;https:\/\/example.com\/myfile.txt&quot;\n\ntry:\n    response = requests.get(url)\n    response.raise_for_status()\n    with open(&quot;myfile.txt&quot;, &quot;wb&quot;) as file:\n        file.write(response.content)\n        print(&quot;File downloaded successfully!&quot;)\nexcept requests.exceptions.RequestException as e:\n    print(f&quot;Failed to download file: {e}&quot;)\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">url <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">https:\/\/example.com\/myfile.txt<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">try<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    response <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">get<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">url<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    response<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">raise_for_status<\/span><span style=\"color: #ECEFF4\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">with<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">open<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">myfile.txt<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">wb<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">as<\/span><span style=\"color: #D8DEE9FF\"> file<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        file<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">write<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">response<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">content<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #88C0D0\">print<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">File downloaded successfully!<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">except<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">exceptions<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">RequestException <\/span><span style=\"color: #81A1C1\">as<\/span><span style=\"color: #D8DEE9FF\"> e<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #88C0D0\">print<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #81A1C1\">f<\/span><span style=\"color: #A3BE8C\">&quot;Failed to download file: <\/span><span style=\"color: #EBCB8B\">{<\/span><span style=\"color: #D8DEE9FF\">e<\/span><span style=\"color: #EBCB8B\">}<\/span><span style=\"color: #A3BE8C\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span><\/code><\/pre><span style=\"display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#2e3440ff;color:#c8d0e0;font-size:12px;line-height:1;position:relative\">Python<\/span><\/div>\n\n\n\n<p>The above code encloses the main code inside a try-except block. It handles all the e<a href=\"https:\/\/requests.readthedocs.io\/en\/latest\/_modules\/requests\/exceptions\/\" target=\"_blank\" rel=\"noopener\">xceptions type included in the requests module<\/a>. <\/p>\n\n\n\n<p>Instead, you can handle different types of errors differently. For example, handle all connection issues by retrying after 5 minutes. And it could send an email for an invalid URL error. <\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:16px;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:20px\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"import requests\nimport time\nimport smtplib\n\nurl = &quot;https:\/\/example.com\/myfile.txt&quot;\n\nretry_count = 0\nmax_retries = 3\n\nwhile retry_count &lt; max_retries:\n    try:\n        response = requests.get(url)\n        response.raise_for_status()\n        with open(&quot;myfile.txt&quot;, &quot;wb&quot;) as file:\n            file.write(response.content)\n            print(&quot;File downloaded successfully!&quot;)\n        break\n    except requests.exceptions.RequestException as e:\n        retry_count += 1\n        if retry_count == max_retries:\n            send_email(&quot;Invalid URL Error&quot;, f&quot;Failed to download file: {e}&quot;)\n            print(f&quot;Failed to download file: {e}&quot;)\n        else:\n            print(f&quot;Connection error, retrying in 5 minutes...&quot;)\n            time.sleep(300)\n\ndef send_email(subject, message):\n    # Code to send an email using an email sending service or library\n    pass\n\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> time<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> smtplib<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">url <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">https:\/\/example.com\/myfile.txt<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">retry_count <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #B48EAD\">0<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">max_retries <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #B48EAD\">3<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">while<\/span><span style=\"color: #D8DEE9FF\"> retry_count <\/span><span style=\"color: #81A1C1\">&lt;<\/span><span style=\"color: #D8DEE9FF\"> max_retries<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">try<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        response <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">get<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">url<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        response<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">raise_for_status<\/span><span style=\"color: #ECEFF4\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">with<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">open<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">myfile.txt<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">wb<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">as<\/span><span style=\"color: #D8DEE9FF\"> file<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            file<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">write<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">response<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">content<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            <\/span><span style=\"color: #88C0D0\">print<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">File downloaded successfully!<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">break<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">except<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">exceptions<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">RequestException <\/span><span style=\"color: #81A1C1\">as<\/span><span style=\"color: #D8DEE9FF\"> e<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        retry_count <\/span><span style=\"color: #81A1C1\">+=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #B48EAD\">1<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">if<\/span><span style=\"color: #D8DEE9FF\"> retry_count <\/span><span style=\"color: #81A1C1\">==<\/span><span style=\"color: #D8DEE9FF\"> max_retries<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            <\/span><span style=\"color: #88C0D0\">send_email<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">Invalid URL Error<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">f<\/span><span style=\"color: #A3BE8C\">&quot;Failed to download file: <\/span><span style=\"color: #EBCB8B\">{<\/span><span style=\"color: #D8DEE9FF\">e<\/span><span style=\"color: #EBCB8B\">}<\/span><span style=\"color: #A3BE8C\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            <\/span><span style=\"color: #88C0D0\">print<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #81A1C1\">f<\/span><span style=\"color: #A3BE8C\">&quot;Failed to download file: <\/span><span style=\"color: #EBCB8B\">{<\/span><span style=\"color: #D8DEE9FF\">e<\/span><span style=\"color: #EBCB8B\">}<\/span><span style=\"color: #A3BE8C\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">else<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            <\/span><span style=\"color: #88C0D0\">print<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #81A1C1\">f<\/span><span style=\"color: #A3BE8C\">&quot;Connection error, retrying in 5 minutes...&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            time<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">sleep<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #B48EAD\">300<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">def<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">send_email<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9\">subject<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">message<\/span><span style=\"color: #ECEFF4\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #616E88\"># Code to send an email using an email sending service or library<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">pass<\/span><\/span>\n<span class=\"line\"><\/span><\/code><\/pre><span style=\"display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#2e3440ff;color:#c8d0e0;font-size:12px;line-height:1;position:relative\">Python<\/span><\/div>\n\n\n\n<p>You&#8217;d have realized how useful handling exceptions is. It makes life easy when fixing issues in <a href=\"https:\/\/www.the-analytics.club\/data-challenges-in-production-ml-systems\/\" title=\"Data challenges in Production ML Systems.\">production systems<\/a>. <\/p>\n\n\n\n<div class=\"wp-block-uagb-advanced-heading uagb-block-302c78cc\"><h2 class=\"uagb-heading-text\">Implement download resumption for large files.<\/h2><\/div>\n\n\n\n<p>Have you noticed when you&#8217;re downloading other network activities get jammed? It&#8217;s more common if you try to <a href=\"https:\/\/www.the-analytics.club\/python-web-scraping-tutorial\/\" title=\"3 Techniques to Scrape Any Websites Using Python\">download files in bulk using Python<\/a> (or any other programming language)<\/p>\n\n\n\n<p>For large files or unstable network connections, implementing <a href=\"https:\/\/developer.mozilla.org\/en-US\/docs\/Web\/HTTP\/Headers\/Range\" target=\"_blank\" rel=\"noopener\">download resumption<\/a> is a valuable feature. It allows you to resume a download from where it left off rather than starting from scratch. <\/p>\n\n\n\n<p>To implement download resumption, you can utilize the <code>Range<\/code> header in your requests. Here&#8217;s an example:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:16px;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:20px\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"import requests\n\nurl = &quot;https:\/\/example.com\/largefile.zip&quot;\nheaders = {&quot;Range&quot;: &quot;bytes=500-&quot;}\n\nresponse = requests.get(url, headers=headers)\n\n# Append the downloaded content to the existing file\nwith open(&quot;largefile.zip&quot;, &quot;ab&quot;) as file:\n    file.write(response.content)\n\n# Continue appending subsequent partial downloads until complete\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">url <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">https:\/\/example.com\/largefile.zip<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">headers <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">{<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">Range<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">bytes=500-<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">}<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">response <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">get<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">url<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">headers<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\">headers<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\"># Append the downloaded content to the existing file<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">with<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">open<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">largefile.zip<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #A3BE8C\">ab<\/span><span style=\"color: #ECEFF4\">&quot;<\/span><span style=\"color: #ECEFF4\">)<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">as<\/span><span style=\"color: #D8DEE9FF\"> file<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    file<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">write<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">response<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #D8DEE9FF\">content<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #616E88\"># Continue appending subsequent partial downloads until complete<\/span><\/span><\/code><\/pre><span style=\"display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#2e3440ff;color:#c8d0e0;font-size:12px;line-height:1;position:relative\">Python<\/span><\/div>\n\n\n\n<p>Recemption is definitely useful when downloading large files. You don&#8217;t have to throttle your network bandwith is my primary goal of doing it. <\/p>\n\n\n\n<div class=\"wp-block-uagb-advanced-heading uagb-block-f8ef21f3\"><h2 class=\"uagb-heading-text\"><strong>Set user-agent headers<\/strong>.<\/h2><\/div>\n\n\n\n<p>Sending user-agent headers is not a must in most cases. <\/p>\n\n\n\n<p>But, some websites may require a user-agent header to be set in the request to simulate a web browser. You can set the user-agent header in the request headers to make your request appear more like a regular browser request. <\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:16px;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:20px\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"headers = {'User-Agent': 'Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/58.0.3029.110 Safari\/537.3'}\nresponse = requests.get(url, headers=headers)\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #D8DEE9FF\">headers <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">{<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">User-Agent<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">Mozilla\/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/58.0.3029.110 Safari\/537.3<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #ECEFF4\">}<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">response <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">get<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">url<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">headers<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\">headers<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span><\/code><\/pre><span style=\"display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#2e3440ff;color:#c8d0e0;font-size:12px;line-height:1;position:relative\">Python<\/span><\/div>\n\n\n\n<p>As you see it&#8217;s pretty straightforward to send User-Agent headers. Here&#8217;s a <a href=\"https:\/\/deviceatlas.com\/blog\/list-of-user-agent-strings\" target=\"_blank\" rel=\"noopener\">list of popular user agents <\/a>for your reference. <\/p>\n\n\n\n<div class=\"wp-block-uagb-advanced-heading uagb-block-dea55054\"><h2 class=\"uagb-heading-text\">Iterate over the response.<\/h2><\/div>\n\n\n\n<p>Large files may have come through the network, but you&#8217;d have to process it without harming your memory. <\/p>\n\n\n\n<p>When downloading large files, it&#8217;s advisable to stream the response instead of loading the entire file into memory. Streaming allows you to download and save the file in smaller chunks, reducing memory consumption. Here&#8217;s an example:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:16px;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:20px\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"response = requests.get(url, stream=True)\nwith open('file.txt', 'wb') as file:\n    for chunk in response.iter_content(chunk_size=8192):\n        if chunk:\n            file.write(chunk)\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #D8DEE9FF\">response <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> requests<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">get<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">url<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #D8DEE9\">stream<\/span><span style=\"color: #81A1C1\">=True<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">with<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">open<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">file.txt<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">wb<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #ECEFF4\">)<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">as<\/span><span style=\"color: #D8DEE9FF\"> file<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">for<\/span><span style=\"color: #D8DEE9FF\"> chunk <\/span><span style=\"color: #81A1C1\">in<\/span><span style=\"color: #D8DEE9FF\"> response<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">iter_content<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9\">chunk_size<\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #B48EAD\">8192<\/span><span style=\"color: #ECEFF4\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">if<\/span><span style=\"color: #D8DEE9FF\"> chunk<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            file<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">write<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">chunk<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span><\/code><\/pre><span style=\"display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#2e3440ff;color:#c8d0e0;font-size:12px;line-height:1;position:relative\">Python<\/span><\/div>\n\n\n\n<p>If you&#8217;re chunking the <a href=\"https:\/\/www.the-analytics.club\/download-files-using-python\/\" title=\"How to Download Files From a URL Using Python?\">downloaded file in Python<\/a> like as shown in the example, you can process the file with limited resources. This is a must if you&#8217;re developing data pipelines. In such a system, you&#8217;d never know when there&#8217;s a overusage in the system. <\/p>\n\n\n\n<div class=\"wp-block-uagb-advanced-heading uagb-block-1d108438\"><h2 class=\"uagb-heading-text\">Verify the integrity of your downloaded file.<\/h2><\/div>\n\n\n\n<p>After downloading a file, it&#8217;s a good practice to verify its integrity using a hash algorithm. This ensures that the file hasn&#8217;t been corrupted during the download process. Here&#8217;s an example using the <code>hashlib<\/code> module to calculate the MD5 hash of a file:<\/p>\n\n\n\n<div class=\"wp-block-kevinbatdorf-code-block-pro padding-bottom-disabled\" data-code-block-pro-font-family=\"Code-Pro-JetBrains-Mono\" style=\"font-size:16px;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:20px\"><span style=\"display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"54\" height=\"14\" viewBox=\"0 0 54 14\"><g fill=\"none\" fill-rule=\"evenodd\" transform=\"translate(1 1)\"><circle cx=\"6\" cy=\"6\" r=\"6\" fill=\"#FF5F56\" stroke=\"#E0443E\" stroke-width=\".5\"><\/circle><circle cx=\"26\" cy=\"6\" r=\"6\" fill=\"#FFBD2E\" stroke=\"#DEA123\" stroke-width=\".5\"><\/circle><circle cx=\"46\" cy=\"6\" r=\"6\" fill=\"#27C93F\" stroke=\"#1AAB29\" stroke-width=\".5\"><\/circle><\/g><\/svg><\/span><span role=\"button\" tabindex=\"0\" data-code=\"import hashlib\n\ndef calculate_md5(file_path):\n    hash_md5 = hashlib.md5()\n    with open(file_path, 'rb') as file:\n        for chunk in iter(lambda: file.read(4096), b''):\n            hash_md5.update(chunk)\n    return hash_md5.hexdigest()\n\ndownloaded_file = 'file.txt'\nexpected_md5 = '...'\nif calculate_md5(downloaded_file) == expected_md5:\n    print('File integrity verified!')\nelse:\n    print('File integrity check failed!')\" style=\"color:#d8dee9ff;display:none\" aria-label=\"Copy\" class=\"code-block-pro-copy-button\"><svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" style=\"width:24px;height:24px\" fill=\"none\" viewBox=\"0 0 24 24\" stroke=\"currentColor\" stroke-width=\"2\"><path class=\"with-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4\"><\/path><path class=\"without-check\" stroke-linecap=\"round\" stroke-linejoin=\"round\" d=\"M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2\"><\/path><\/svg><\/span><pre class=\"shiki nord\" style=\"background-color: #2e3440ff\" tabindex=\"0\"><code><span class=\"line\"><span style=\"color: #81A1C1\">import<\/span><span style=\"color: #D8DEE9FF\"> hashlib<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">def<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">calculate_md5<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9\">file_path<\/span><span style=\"color: #ECEFF4\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    hash_md5 <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> hashlib<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">md5<\/span><span style=\"color: #ECEFF4\">()<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">with<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">open<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">file_path<\/span><span style=\"color: #ECEFF4\">,<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">rb<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #ECEFF4\">)<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">as<\/span><span style=\"color: #D8DEE9FF\"> file<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">        <\/span><span style=\"color: #81A1C1\">for<\/span><span style=\"color: #D8DEE9FF\"> chunk <\/span><span style=\"color: #81A1C1\">in<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">iter<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #81A1C1\">lambda<\/span><span style=\"color: #ECEFF4\">:<\/span><span style=\"color: #D8DEE9FF\"> file<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">read<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #B48EAD\">4096<\/span><span style=\"color: #ECEFF4\">),<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">b<\/span><span style=\"color: #ECEFF4\">&#39;&#39;<\/span><span style=\"color: #ECEFF4\">):<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">            hash_md5<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">update<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">chunk<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #81A1C1\">return<\/span><span style=\"color: #D8DEE9FF\"> hash_md5<\/span><span style=\"color: #ECEFF4\">.<\/span><span style=\"color: #88C0D0\">hexdigest<\/span><span style=\"color: #ECEFF4\">()<\/span><\/span>\n<span class=\"line\"><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">downloaded_file <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">file.txt<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">expected_md5 <\/span><span style=\"color: #81A1C1\">=<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">...<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">if<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #88C0D0\">calculate_md5<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #D8DEE9FF\">downloaded_file<\/span><span style=\"color: #ECEFF4\">)<\/span><span style=\"color: #D8DEE9FF\"> <\/span><span style=\"color: #81A1C1\">==<\/span><span style=\"color: #D8DEE9FF\"> expected_md5<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #88C0D0\">print<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">File integrity verified!<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span>\n<span class=\"line\"><span style=\"color: #81A1C1\">else<\/span><span style=\"color: #ECEFF4\">:<\/span><\/span>\n<span class=\"line\"><span style=\"color: #D8DEE9FF\">    <\/span><span style=\"color: #88C0D0\">print<\/span><span style=\"color: #ECEFF4\">(<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #A3BE8C\">File integrity check failed!<\/span><span style=\"color: #ECEFF4\">&#39;<\/span><span style=\"color: #ECEFF4\">)<\/span><\/span><\/code><\/pre><span style=\"display:flex;align-items:flex-end;padding:10px;width:100%;justify-content:flex-end;background-color:#2e3440ff;color:#c8d0e0;font-size:12px;line-height:1;position:relative\">Python<\/span><\/div>\n\n\n\n<p>There we go, you&#8217;re file&#8217;s integrity is verified. <\/p>\n\n\n\n<div class=\"wp-block-uagb-advanced-heading uagb-block-926c3fe9\"><h2 class=\"uagb-heading-text\">Conclusion<\/h2><\/div>\n\n\n\n<p>We don&#8217;t normally download files programmatically. But there are situations where it&#8217;s needed. <\/p>\n\n\n\n<p>We usually don&#8217;t care about anything other than the file being on our hard disks. But truly, there are many other things we need to care for. <\/p>\n\n\n\n<p>This post has discussed some of my suggestions when downloading files from the internet using Python. I&#8217;ve been doing this a few times in a year. <\/p>\n\n\n\n<p>Hope it helps.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dots\"\/>\n\n\n\n<blockquote class=\"wp-block-quote\">\n<p>Thanks for the read, friend. It seems you and I have lots of common interests. Say Hi to me on <a href=\"https:\/\/www.linkedin.com\/in\/thuwarakesh\/\" target=\"_blank\" rel=\"nofollow noopener\"><strong>LinkedIn<\/strong><\/a>, <a href=\"https:\/\/twitter.com\/Thuwarakesh\" target=\"_blank\" rel=\"nofollow noopener\"><strong>Twitter<\/strong><\/a>, and <a href=\"https:\/\/thuwarakesh.medium.com\/subscribe\" target=\"_blank\" rel=\"nofollow noopener\"><strong>Medium<\/strong><\/a>. <\/p>\n\n\n\n<p>Not a Medium member yet? Please use this link to <a href=\"https:\/\/thuwarakesh.medium.com\/membership\" target=\"_blank\" rel=\"nofollow noopener\"><strong>become a member<\/strong><\/a> because I earn a commission for referring at no extra cost for you.<\/p>\n<\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Downloading files from the internet programmatically is a task frequently encountered in Python applications. I do it a&#8230;<\/p>\n","protected":false},"author":2,"featured_media":1323,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_blocks_custom_css":"","_kad_blocks_head_custom_js":"","_kad_blocks_body_custom_js":"","_kad_blocks_footer_custom_js":"","_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"footnotes":""},"categories":[3,5],"tags":[24,33,27],"taxonomy_info":{"category":[{"value":3,"label":"Python"},{"value":5,"label":"Programming"}],"post_tag":[{"value":24,"label":"automation"},{"value":33,"label":"best practice"},{"value":27,"label":"python"}]},"featured_image_src_large":["https:\/\/www.the-analytics.club\/wp-content\/uploads\/2023\/06\/Digital-Intelligence-Images-1-1024x576.jpg",1024,576,true],"author_info":{"display_name":"Thuwarakesh","author_link":"https:\/\/www.the-analytics.club\/author\/thuwarakesh\/"},"comment_info":1,"category_info":[{"term_id":3,"name":"Python","slug":"python","term_group":0,"term_taxonomy_id":3,"taxonomy":"category","description":"","parent":5,"count":52,"filter":"raw","cat_ID":3,"category_count":52,"category_description":"","cat_name":"Python","category_nicename":"python","category_parent":5},{"term_id":5,"name":"Programming","slug":"programming","term_group":0,"term_taxonomy_id":5,"taxonomy":"category","description":"","parent":0,"count":43,"filter":"raw","cat_ID":5,"category_count":43,"category_description":"","cat_name":"Programming","category_nicename":"programming","category_parent":0}],"tag_info":[{"term_id":24,"name":"automation","slug":"automation","term_group":0,"term_taxonomy_id":24,"taxonomy":"post_tag","description":"","parent":0,"count":5,"filter":"raw"},{"term_id":33,"name":"best practice","slug":"best-practice","term_group":0,"term_taxonomy_id":33,"taxonomy":"post_tag","description":"","parent":0,"count":2,"filter":"raw"},{"term_id":27,"name":"python","slug":"python","term_group":0,"term_taxonomy_id":27,"taxonomy":"post_tag","description":"","parent":0,"count":9,"filter":"raw"}],"_links":{"self":[{"href":"https:\/\/www.the-analytics.club\/wp-json\/wp\/v2\/posts\/1165"}],"collection":[{"href":"https:\/\/www.the-analytics.club\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.the-analytics.club\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.the-analytics.club\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.the-analytics.club\/wp-json\/wp\/v2\/comments?post=1165"}],"version-history":[{"count":11,"href":"https:\/\/www.the-analytics.club\/wp-json\/wp\/v2\/posts\/1165\/revisions"}],"predecessor-version":[{"id":1353,"href":"https:\/\/www.the-analytics.club\/wp-json\/wp\/v2\/posts\/1165\/revisions\/1353"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.the-analytics.club\/wp-json\/wp\/v2\/media\/1323"}],"wp:attachment":[{"href":"https:\/\/www.the-analytics.club\/wp-json\/wp\/v2\/media?parent=1165"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.the-analytics.club\/wp-json\/wp\/v2\/categories?post=1165"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.the-analytics.club\/wp-json\/wp\/v2\/tags?post=1165"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}