python-day1 — API Endpoint Extractor 1.0

This post was last updated on the morning of October 23, 2024.

Preface: this series starts from zero to build up Python scripting skills, writing small practical scripts to solve problems encountered while bug hunting. The longer-term goal is to independently build a decent security tool.

Design Approach

Take a target URL from the command line, match URL-path-like strings in the site's JS files, print them, and write the API endpoints to an output file.

  1. Request the target site with the requests module
  2. Match JS file links with the re module
  3. Match URL paths with the re module
  4. Print the results and write them to a file
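As a quick sanity check on steps 2 and 3, the matching can be tried against a static snippet first (the snippet and patterns here are illustrative, not the exact ones in the final script):

```python
import re
from urllib.parse import urljoin

html = ('<script src="/static/app.js"></script>'
        '<script src="https://cdn.example.com/vendor.js"></script>')

# Step 2: pull .js references out of the HTML (quote-delimited src values)
js_links = re.findall(r'src=["\']([^"\']+\.js)', html)
print(js_links)  # ['/static/app.js', 'https://cdn.example.com/vendor.js']

# urljoin resolves relative links against the page URL and
# leaves absolute links untouched
resolved = [urljoin('https://example.com/index.html', j) for j in js_links]
print(resolved)

js_body = 'fetch("/api/v1/users"); axios.get("/api/v1/orders?page=1");'

# Step 3: quoted strings that look like URL paths
paths = re.findall(r'["\'](/[\w/\-?.=]+)["\']', js_body)
print(sorted(set(paths)))  # ['/api/v1/orders?page=1', '/api/v1/users']
```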

Code

import requests
import re
import warnings
import argparse
from urllib.parse import urljoin
from urllib3.exceptions import InsecureRequestWarning

# Suppress InsecureRequestWarning (we deliberately skip SSL verification)
warnings.simplefilter('ignore', InsecureRequestWarning)

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
}

def extract_js_files(url):
    try:
        response = requests.get(url, headers=headers, verify=False)  # skip SSL certificate verification
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return []
    # Match attribute values ending in .js; excluding quote characters from
    # the capture keeps the extracted path clean
    js_files = re.findall(r'["\'=]([^"\'=<>\s]+\.js)', response.text)
    return [urljoin(url, js) for js in set(js_files)]

def extract_api_paths(js_url):
    try:
        response = requests.get(js_url, headers=headers, verify=False)  # skip SSL certificate verification
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Error fetching {js_url}: {e}")
        return []

    # Match quoted strings that look like URL paths
    paths = re.findall(r'["\'](/[\w/\-?.]+)["\']', response.text)
    return list(set(paths))  # deduplicate

def main(target_url, output_file):
    js_urls = extract_js_files(target_url)
    if not js_urls:
        print("No JavaScript files found.")
        return

    all_paths = []
    for js_url in js_urls:
        print(f'Extracting paths from {js_url}')
        all_paths.extend(extract_api_paths(js_url))

    if not all_paths:
        print("No API paths found.")
        return

    with open(output_file, 'w') as f:
        for path in sorted(set(all_paths)):
            f.write(path + '\n')

    print(f'API paths have been saved to {output_file}')

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Extract API paths from JS files on a given URL.')
    parser.add_argument('url', help='Target URL to scan for JS files.')
    parser.add_argument('output', help='Output file to save extracted API paths.')
    args = parser.parse_args()
    main(args.url, args.output)
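Assuming the script above is saved as api_extractor.py (the filename is my choice, not from the original post), it is invoked with a target URL and an output file:

```shell
python api_extractor.py https://example.com api_paths.txt
```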

Test Run

Limitations

The regex matching could be better, the debug output is messy, error handling is too coarse, there is no concurrency so it runs slowly, and dynamically loaded JS cannot be crawled…
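On the concurrency point, one possible direction (a sketch, not a commitment for the next version) is wrapping the per-JS-file fetch in a ThreadPoolExecutor. Here the network call is replaced by a stand-in function so the pattern itself is clear:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for extract_api_paths: in the real script this would do the
# requests.get + regex work for one JS URL
def fetch_paths(js_url):
    return [f'/api/from/{js_url}']

js_urls = ['a.js', 'b.js', 'c.js']

# Fetch all JS files in parallel; executor.map preserves input order
with ThreadPoolExecutor(max_workers=10) as executor:
    results = executor.map(fetch_paths, js_urls)

# Flatten the per-file lists into one list of paths
all_paths = [p for paths in results for p in paths]
print(all_paths)  # ['/api/from/a.js', '/api/from/b.js', '/api/from/c.js']
```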

Later versions will address these.