mirror of
https://github.com/J3rome/py-requirements-guesser.git
synced 2024-11-23 18:29:32 +01:00
* Refactoring
* Added setup.py
* Updated README
* Should have been committed

This commit is contained in:
parent 0f6a41a7de
commit 9a04d4ce0e
.gitignore (vendored, new file): 6 lines added
@@ -0,0 +1,6 @@
repositories
.idea/
__pycache__
*.egg-info
/dist/
/build/
README.md: 76 lines changed
@@ -1,2 +1,74 @@
-# python-requirements-finder
-Package to infer requirements package version based on git history
# Python-Requirements-Guesser

> ⚠️ This is alpha quality software. Work in progress

Attempts to guess `requirements.txt` module versions based on Git history.

## What is the problem?
Have you ever cloned a repo with Python code that didn't specify library versions in a `requirements.txt` file?
Or even worse: a repo without a `requirements.txt` at all...

Reproducing results is hard; it's even harder when you have mismatched library versions.

## Solution
There is a fair chance that the owner of the repo you just cloned installed most of its packages using
```bash
pip install <package name>
```
This would have installed the latest version available at the time the command was run.

Based on this, we look at the git commit history to find out when a package was first imported in the code or when it was first added to the `requirements.txt` file.

We then query `PyPI` to retrieve the version that was available at that commit date.
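The core idea can be sketched in a few lines of Python. This is an illustrative snippet only (the helper name `version_at_date` is hypothetical), but it uses the same PyPI JSON endpoint and `releases` field that the tool queries:

```python
# Illustrative sketch (not part of the tool): pick the version that was current at a given date
import json
from datetime import datetime
from urllib.request import urlopen

def version_at_date(package_name, date):
    # Fetch the release history of the package from the PyPI JSON API
    with urlopen(f"https://pypi.org/pypi/{package_name}/json") as resp:
        releases = json.load(resp)["releases"]

    candidates = []
    for version, uploads in releases.items():
        if not uploads:  # some versions have no uploaded files
            continue
        upload_day = uploads[0]["upload_time"].split("T")[0]
        released = datetime.strptime(upload_day, "%Y-%m-%d")
        if released <= date:
            candidates.append((released, version))

    # Newest version that already existed at `date`, or None if the package is younger than `date`
    return max(candidates)[1] if candidates else None

# Example: which numpy version would `pip install numpy` have picked on 2019-06-01?
print(version_at_date("numpy", datetime(2019, 6, 1)))
```

In practice the tool also filters out pre-releases and lets you choose between the first-import date and the `requirements.txt` date, but the lookup above is the gist of it.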
## Usage
`Py-Requirements-Guesser` should be run inside a git repository.
```bash
py-requirements-guesser --write {requirements.txt path}
```
You will be prompted with a series of choices to orient the guessing process.

See the video demo (asciinema recording embedded in the original README).

## Installation
This package doesn't have any dependencies.
To install `Py-Requirements-Guesser`:
```bash
git clone https://github.com/J3rome/py-requirements-guesser
python3 setup.py install
```

## Package name mapping - Pipreqs
There can be mismatches between the name of a package on `PyPI` and the name used to `import` it (e.g. `pip install PyYAML` but `import yaml`).
There doesn't seem to be a straightforward way to map between the `PyPI` name and the `import` name.

The great [PipReqs](https://github.com/bndr/pipreqs) package (which was an inspiration for this package) manually maintains a mapping file between `PyPI` names and `import` names.
They also maintain a list of the standard library module names.

For now, we grab the [mapping](https://github.com/bndr/pipreqs/blob/master/pipreqs/mapping) and [stdlib](https://github.com/bndr/pipreqs/blob/master/pipreqs/stdlib) files at commit `90102acdbb23c09574d27df8bd1f568d34e0cfd3`.

**Thanks guys!**

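For reference, the mapping file is plain text with one `import_name:package_name` pair per line, and the guesser simply loads it into two dictionaries. A minimal sketch of that parse (the cache path `/tmp/.py-reqs-guesser/mapping` is the location the tool downloads to; `yaml:PyYAML` is only an example of the entry format):

```python
# Illustrative parse of the pipreqs mapping file ("import_name:package_name" per line)
import_to_package = {}
package_to_import = {}

with open("/tmp/.py-reqs-guesser/mapping") as f:
    for line in f:
        import_name, package_name = line.strip().split(":")
        import_to_package[import_name] = package_name
        package_to_import[package_name] = import_name

print(import_to_package.get("yaml"))  # e.g. 'PyYAML' if that entry is present
```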
## Additional arguments
`Py-Requirements-Guesser` accepts two additional parameters:

`--keep_unused_packages`: By default, unused packages are ignored. This parameter forces version guessing for packages listed in `requirements.txt` that are never `import`ed anywhere in the code.

`--force_guess {package1},{package2},..`: By default, if your code contains a module named `yaml.py`, `import yaml` statements won't be analyzed. Use this argument when local modules have names that conflict with `PyPI` packages, to force version guessing for them.

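For example, a combined invocation might look like this (illustrative; `yaml` here stands for a local `yaml.py` that shadows the PyPI package of the same name):

```bash
py-requirements-guesser --write requirements.txt --keep_unused_packages --force_guess yaml
```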
## TODO
- Guess/pin the dependency tree of each package (e.g. the Torch package will also install numpy, etc.)
- Poetry support?
- Jupyter notebook support
- Add a guessing choice where the user can pick a version between the time the package was first imported and the date of the last commit on a Python file
- Detect Python & OS versions. Some package versions might not be available for certain OS or Python versions
- Better output/UX

## License
GNU GPLv3, see [License](LICENSE)

## Contributing
Pull requests are welcome!
File an issue if you encounter any problem!

main.py: 286 lines removed (file deleted)
@@ -1,286 +0,0 @@
import re
import os
import argparse
import subprocess
import json
from datetime import datetime
from urllib.request import urlopen

from utils import load_packages_from_requirements, get_mapping_files_from_pipreqs, user_response_multi_choices
from utils import get_date_last_modified_python_file, get_local_modules, validate_cwd_is_git_repo, user_response_yes_no

# TODO : Pin also the dependencies tree of the packages Ex : Torch package might install numpy, etc
# TODO : Detect python & os versions. Some package versions might not be available for certain os or python versions
# TODO : Add more guesses based on other dates :
#        - When the project was first created
#        - Last commit (That wasn't on an .md file)
# TODO : Poetry support ?
# TODO : Add jupyter notebook support

EXTRACT_DATE_REGEX = re.compile(r'date\s-\s(\d+)')
LETTER_REGEX = re.compile(r'[a-zA-Z]')

parser = argparse.ArgumentParser("Python Requirements Version Guesser")
parser.add_argument('--write', type=str, default=None, required=False, nargs='?', const='')
parser.add_argument('--force_guess', type=str, default=None, required=False)
parser.add_argument('--keep_unused_packages', action='store_true', required=False)


def get_pypi_history(package_name, ignore_release_candidat=True):
    """
    Retrieve version release dates via Pypi JSON api
    """
    try:
        resp = urlopen(f"https://pypi.org/pypi/{package_name}/json")
    except Exception as e:
        if hasattr(e, 'getcode') and e.getcode() == 404:
            return None
        else:
            print("[ERROR] Internet access is required to fetch package history from Pypi")
            exit(1)

    resp = json.loads(resp.read())

    versions = []
    for version, release_info_per_os in resp['releases'].items():
        # Just taking the first platform upload date for now..
        # Is it really different for other platforms ? Need to validate
        # TODO : Give appropriate version based on os and python Versions resp['info']['requires_dist'] # ['require_python']
        if len(release_info_per_os) == 0:
            continue

        if ignore_release_candidat and LETTER_REGEX.search(version):
            continue

        release_info = release_info_per_os[0]
        release_date = datetime.strptime(release_info['upload_time'].split("T")[0], '%Y-%m-%d')
        versions.append((version, release_date))

    # FIXME : Do we really need to sort ? Versions should already be sorted
    return sorted(versions, key=lambda x:x[1], reverse=True)


def find_version_at_date(available_versions, date):
    last_version = available_versions[0][0]

    # FIXME : Do binary search
    for candidate_version, candidate_date in available_versions:
        if date >= candidate_date:
            return candidate_version
        else:
            last_version = candidate_version

    # Date is older than available versions... Fallback on the oldest available version
    return last_version


def get_all_imports(stdlib_list=None):
    cmd = f'grep -PRoh --include="*.py" "(?<=^import )\\w*|(?<=^from )\\w*" . | sort | uniq'

    try:
        grep_out = subprocess.check_output(cmd, shell=True).decode().strip()
    except:
        grep_out = ""

    if len(grep_out) == 0:
        raise Exception(f"[ERROR] couldn't find any import statement")

    imports = [l.strip() for l in grep_out.split("\n")]

    if stdlib_list:
        return [l for l in imports if l not in stdlib_list]

    return imports


def get_date_when_package_committed(package_name, via_requirements=False, latest_addition=False):
    if not via_requirements:
        search_pattern = f"^import {package_name}|^from {package_name}"
        filename = ""
    else:
        search_pattern = f"{package_name}$"
        filename = "requirements.txt"

    # We grep for 'date' | '+ search pattern' so that we keep only commits that insert lines (+)
    cmd = f"git log -i -G '{search_pattern}' --pretty='format:date - %at' --date unix -p {filename} | grep -i '^date - \\|\\+.*{package_name}'"

    try:
        blame_out = subprocess.check_output(cmd, shell=True).decode().strip()
    except:
        blame_out = ""

    if len(blame_out) == 0:
        #return []
        if not via_requirements:
            msg = f"'{package_name}' is defined in requirements.txt but not used, ignoring"
        else:
            msg = f"'{package_name}' was not found in requirements.txt"

        f"[INFO] {msg}"
        return None

    # Remove commit that are not directly followed by '+ import' (We grepped for this in cmd)
    # This is ugly.. TODO: figure out a better way in the grep command
    dates = []
    got_plus = False
    for line in blame_out.split('\n')[::-1]:
        if line[0] == "+":
            got_plus = True
        elif got_plus:
            got_plus = False

            matches = EXTRACT_DATE_REGEX.search(line)
            if matches:
                dates.append(datetime.fromtimestamp(int(matches.group(1))))
            else:
                raise Exception("[ERROR] while parsing git-log")

    # Get first date where the line was added
    return sorted(dates, reverse=not latest_addition)[0]


def guess_package_versions(package_list, from_import_to_package_mapping, from_package_to_import_mapping, packages_in_requirements, keep_unused_packages=False):
    packages = []
    for package_name, version in all_packages.items():
        print("\n" + "-"*40)
        print(f"PACKAGE : {package_name}")
        if version is None:
            # Reset variables
            choice = None
            date_added_via_import_str = None
            date_added_via_req_str = None
            import_version = None
            req_version = None

            # Pypi package to import mapping
            import_name = from_package_to_import_mapping.get(package_name, package_name)
            pypi_package_name = from_import_to_package_mapping.get(package_name, package_name)

            # Get available versions from Pypi
            available_versions = get_pypi_history(pypi_package_name, ignore_release_candidat=True)

            if available_versions is None:
                print(f"[INFO] Couldn't find Pypi releases for package '{package_name}', ignoring")
                continue

            # Retrieve candidate version based on the first time the package was imported in *.py
            date_added_via_import = get_date_when_package_committed(import_name, via_requirements=False)
            if date_added_via_import is None:
                print(f" [INFO] Package '{package_name}' is defined in requirements.txt but not used (Or committed), ")
                if keep_unused_packages:
                    print(" will use the requirements version since --keep_unused_packages set")
                    choice = 2
                else:
                    print(f"[INFO] Ignoring package '{package_name}' (Use --keep_unused_packages if you want to keep it)")
                    continue
            else:
                date_added_via_import_str = date_added_via_import.strftime("%Y-%m-%d")
                import_version = find_version_at_date(available_versions, date_added_via_import)

            # Retrieve candidate version based on the first time the package was added to requirements.txt
            if pypi_package_name.lower() in packages_in_requirements:
                date_added_via_req = get_date_when_package_committed(pypi_package_name, via_requirements=True)
                if date_added_via_req is not None:
                    req_version = find_version_at_date(available_versions, date_added_via_req)
                    date_added_via_req_str = date_added_via_req.strftime("%Y-%m-%d")
                else:
                    print(f" [INFO] Package '{package_name}' was not in requirements.txt, using date of first import (Version {import_version} / {date_added_via_import_str})")
                    choice = 1

                if choice is None:
                    if req_version != import_version:
                        # Ask user to choose version based on either first import date or first added to requirements.txt date
                        choice = user_response_multi_choices(f"Choose guessing strategy for package '{package_name}'", [
                            f'{"First time the package was imported".ljust(50)} (Version {import_version} / {date_added_via_import_str})',
                            f'{"When the package was added to requirements.txt".ljust(50)} (Version {req_version} / {date_added_via_req_str})'
                        ])
                    else:
                        # Both requirements.txt and first import resolve to the same version
                        choice = 1
            else:
                print(f" [INFO] Package '{package_name}' was not found in requirements.txt, using date of first import (Version {import_version} / {date_added_via_import_str})")
                choice = 1

            if choice == 2:
                version = req_version
            else:
                version = import_version

            if version is not None:
                print(f"[INFO] Package '{package_name}' was attributed version {version}")
            else:
                print(f"[ERROR] Couldn't attribute version to package '{package_name}'. Are you sure you commited the changes ?")
                continue

        else:
            print(f"[INFO] Package '{package_name}' version is specified in requirements.txt (Version {version})")

        packages.append((package_name, version))

    return packages


if __name__ == "__main__":
    print("="*60)
    print("Python requirements guesser")
    print("="*60)
    print(f"Guessing package versions for project '{os.getcwd()}'")

    if not validate_cwd_is_git_repo():
        print("[ERROR] py-reqs-guesser must be runned inside a git repository")
        exit(1)

    print("Follow the steps to guess package versions based on when they were added to git.")

    args = parser.parse_args()

    # Retrive mapping files from https://github.com/bndr/pipreqs
    stdlib_list, from_import_to_package_mapping, from_package_to_import_mapping = get_mapping_files_from_pipreqs()

    # Get local packages
    if args.force_guess:
        args.force_guess = set(args.force_guess.strip().split(","))

    local_packages = get_local_modules(print_modules=True, force_guess=args.force_guess)

    # Remove local_packages from the list of imports
    stdlib_list.update(local_packages)

    # Retrieve all imported packages in project
    all_imported_packages = set(get_all_imports(stdlib_list))

    # Retrieve packages in requirements.txt
    packages_in_requirements_version_map = load_packages_from_requirements('requirements.txt')
    packages_in_requirements = set(packages_in_requirements_version_map.keys())

    # Merge packages in requirements.txt and imports
    all_packages = packages_in_requirements_version_map
    extra_packages = all_imported_packages - packages_in_requirements
    for extra_package in extra_packages:
        all_packages[extra_package] = None

    # Interactive guessing of packages versions
    packages = guess_package_versions(all_packages, from_import_to_package_mapping, from_package_to_import_mapping, packages_in_requirements, keep_unused_packages=args.keep_unused_packages)

    new_requirements_txt = ""
    for package_name, version in sorted(packages, key=lambda x:x[0]):
        new_requirements_txt += f"{package_name}=={version}\n"

    print("\n" + "="*60 + "\n")
    print("Requirements.txt :")
    print(new_requirements_txt)
    if args.write is None:
        print("Use the --write {path} parameter to write the new requirements file")
    else:
        if len(args.write) == 0:
            args.write = "requirements.txt"

        print(f"Writing requirements to file {args.write}")

        if os.path.exists(args.write) and \
                not user_response_yes_no(f"File {args.write} already exist, are you sure you want to overwrite it ?"):
            exit(0)

        with open(args.write, 'w') as f:
            f.write(new_requirements_txt)
py_requirements_guesser/__init__.py: new empty file (0 lines)

py_requirements_guesser/cli.py: new file, 54 lines
@@ -0,0 +1,54 @@
import os
import argparse

from .guesser import Guesser
from .utils import validate_cwd_is_git_repo, user_response_yes_no, get_requirements_txt_lines, write_requirements_file


__VERSION__ = "0.0.1"

parser = argparse.ArgumentParser("Python Requirements Version Guesser")
parser.add_argument('--write', type=str, default=None, required=False, nargs='?', const='')
parser.add_argument('--force_guess', type=str, default=None, required=False)
parser.add_argument('--keep_unused_packages', action='store_true', required=False)


def run():
    print("="*60)
    print(f"Python requirements guesser v{__VERSION__}")
    print("="*60)
    print(f"Guessing package versions for project '{os.getcwd()}'")

    args = parser.parse_args()

    if not validate_cwd_is_git_repo():
        print("[ERROR] py-reqs-guesser must be run inside a git repository")
        exit(1)

    print("Follow the steps to guess package versions based on when they were added to git.")

    # Initialisation
    guesser = Guesser(args.force_guess, args.keep_unused_packages)

    # Interactive guessing of packages versions
    packages = guesser.guess_package_versions()

    # Create requirements.txt
    updated_requirements_txt_lines = get_requirements_txt_lines(packages)

    print("\n" + "="*60 + "\n")
    print("Requirements.txt :")
    print(updated_requirements_txt_lines)

    if args.write is None:
        print("Use the --write {path} parameter to write the new requirements file")
    else:
        if len(args.write) == 0:
            # Default location if --write toggled without {path}
            args.write = "requirements.txt"

        write_requirements_file(updated_requirements_txt_lines, args.write)


if __name__ == "__main__":
    run()
py_requirements_guesser/guesser.py: new file, 122 lines
@@ -0,0 +1,122 @@
import os

from .utils import get_pypi_history, get_all_imports, get_date_when_package_committed, find_version_at_date
from .utils import get_mapping_files_from_pipreqs, get_local_modules, get_packages_from_requirements, user_response_multi_choices


class Guesser:

    def __init__(self, force_guess=None, keep_unused_packages=False):
        # Retrieve mapping files from https://github.com/bndr/pipreqs
        self.stdlib_list, self.import_to_package_mapping, self.package_to_import_mapping = get_mapping_files_from_pipreqs()

        # Get local packages
        if force_guess:
            force_guess = set(force_guess.strip().split(","))

        local_packages = get_local_modules(print_modules=True, force_guess=force_guess)

        # Remove local_packages from the list of imports
        self.stdlib_list.update(local_packages)

        # Retrieve all imported packages in project
        all_imported_packages = set(get_all_imports(self.stdlib_list))

        # Retrieve packages in requirements.txt
        if os.path.exists('requirements.txt'):
            packages_in_requirements_version_map = get_packages_from_requirements('requirements.txt')
            self.packages_in_requirements = set(packages_in_requirements_version_map.keys())
        else:
            packages_in_requirements_version_map = {}
            self.packages_in_requirements = set()

        # Merge packages in requirements.txt and imports
        self.all_packages = packages_in_requirements_version_map
        extra_packages = all_imported_packages - self.packages_in_requirements
        for extra_package in extra_packages:
            self.all_packages[extra_package] = None

        self.keep_unused_packages = keep_unused_packages


    def guess_package_versions(self):
        packages = []
        for package_name, version in self.all_packages.items():
            print("\n" + "-"*40)
            print(f"PACKAGE : {package_name}")
            if version is None:
                # Reset variables
                choice = None
                date_added_via_import_str = None
                date_added_via_req_str = None
                date = None
                import_version = None
                req_version = None

                # Pypi package to import mapping
                import_name = self.package_to_import_mapping.get(package_name, package_name)
                pypi_package_name = self.import_to_package_mapping.get(package_name, package_name)

                # Get available versions from Pypi
                available_versions = get_pypi_history(pypi_package_name, ignore_release_candidat=True)

                if available_versions is None:
                    print(f"[INFO] Couldn't find Pypi releases for package '{package_name}', ignoring")
                    continue

                # Retrieve candidate version based on the first time the package was imported in *.py
                date_added_via_import = get_date_when_package_committed(import_name, via_requirements=False)
                if date_added_via_import is None:
                    print(f" [INFO] Package '{package_name}' is defined in requirements.txt but not used (or committed)")
                    if self.keep_unused_packages:
                        print(" will attempt guessing the version anyway since --keep_unused_packages is set")
                        choice = 2
                    else:
                        print(f"[INFO] Ignoring package '{package_name}' (Use --keep_unused_packages if you want to keep it)")
                        continue
                else:
                    date_added_via_import_str = date_added_via_import.strftime("%Y-%m-%d")
                    import_version = find_version_at_date(available_versions, date_added_via_import)

                # Retrieve candidate version based on the first time the package was added to requirements.txt
                if pypi_package_name.lower() in self.packages_in_requirements:
                    date_added_via_req = get_date_when_package_committed(pypi_package_name, via_requirements=True)
                    if date_added_via_req is not None:
                        req_version = find_version_at_date(available_versions, date_added_via_req)
                        date_added_via_req_str = date_added_via_req.strftime("%Y-%m-%d")
                    else:
                        print(f" [INFO] Package '{package_name}' was not in requirements.txt, using date of first import (Version {import_version} / {date_added_via_import_str})")
                        choice = 1

                    if choice is None:
                        if req_version != import_version:
                            # Ask user to choose version based on either first import date or first added to requirements.txt date
                            choice = user_response_multi_choices(f"Choose guessing strategy for package '{package_name}'", [
                                f'{"First time the package was imported".ljust(50)} (Version {import_version} / {date_added_via_import_str})',
                                f'{"When the package was added to requirements.txt".ljust(50)} (Version {req_version} / {date_added_via_req_str})'
                            ])
                        else:
                            # Both requirements.txt and first import resolve to the same version
                            choice = 1
                else:
                    print(f" [INFO] Package '{package_name}' was not found in requirements.txt, using date of first import (Version {import_version} / {date_added_via_import_str})")
                    choice = 1

                if choice == 2:
                    version = req_version
                    date = date_added_via_req_str
                else:
                    version = import_version
                    date = date_added_via_import_str

                if version is not None:
                    print(f"[INFO] Package '{package_name}' was first committed on {date} and was attributed version {version}")
                else:
                    print(f"[ERROR] Couldn't attribute version to package '{package_name}'. Are you sure you committed the changes ?")
                    continue

            else:
                print(f"[INFO] Package '{package_name}' version is specified in requirements.txt (Version {version})")

            packages.append((package_name, version))

        return packages
py_requirements_guesser/utils.py: new file, 332 lines
@@ -0,0 +1,332 @@
import re
import os
import json
import subprocess
from datetime import datetime
from urllib.request import urlretrieve
from urllib.request import urlopen


EXTRACT_DATE_REGEX = re.compile(r'date\s-\s(\d+)')
LETTER_REGEX = re.compile(r'[a-zA-Z]')


def get_pypi_history(package_name, ignore_release_candidat=True):
    """
    Retrieve version release dates via Pypi JSON api
    """
    try:
        resp = urlopen(f"https://pypi.org/pypi/{package_name}/json", timeout=20)
    except Exception as e:
        if hasattr(e, 'getcode') and e.getcode() == 404:
            return None
        else:
            print("[ERROR] Internet access is required to fetch package history from Pypi")
            exit(1)

    resp = json.loads(resp.read())

    versions = []
    for version, release_info_per_os in resp['releases'].items():
        # Just taking the first platform upload date for now..
        # Is it really different for other platforms ? Need to validate
        # TODO : Give appropriate version based on os and python Versions resp['info']['requires_dist'] # ['require_python']
        if len(release_info_per_os) == 0:
            continue

        if ignore_release_candidat and LETTER_REGEX.search(version):
            continue

        release_info = release_info_per_os[0]
        release_date = datetime.strptime(release_info['upload_time'].split("T")[0], '%Y-%m-%d')
        versions.append((version, release_date))

    # FIXME : Do we really need to sort ? Versions should already be sorted
    return sorted(versions, key=lambda x:x[1], reverse=True)


def get_all_imports(ignore_list=None):
    """
    Retrieve all the 'import XXX' and 'from XXX' statements in the local repo
    The ignore_list parameter is used to ignore local packages
    """
    cmd = f'grep -PRoh --include="*.py" "(?<=^import )\\w*|(?<=^from )\\w*" . | sort | uniq'

    try:
        grep_out = subprocess.check_output(cmd, shell=True).decode().strip()
    except:
        grep_out = ""

    if len(grep_out) == 0:
        raise Exception(f"[ERROR] couldn't find any import statement")

    imports = [l.strip() for l in grep_out.split("\n")]

    if ignore_list:
        return [l for l in imports if l not in ignore_list]

    return imports


def get_date_when_package_committed(package_name, via_requirements=False, first_occurence=True):
    """
    Use git log to retrieve the date at which the package was first imported or added to the requirements.txt file (Based on commit date)
    """
    if not via_requirements:
        search_pattern = f"^import {package_name}|^from {package_name}"
        filename = ""
    else:
        search_pattern = f"{package_name}$"
        filename = "requirements.txt"

    # We grep for 'date' | '+ search pattern' so that we keep only commits that insert lines (+)
    cmd = f"git log -i -G '{search_pattern}' --pretty='format:date - %at' --date unix -p {filename} | grep -i '^date - \\|\\+.*{package_name}'"

    try:
        blame_out = subprocess.check_output(cmd, shell=True).decode().strip()
    except:
        blame_out = ""

    if len(blame_out) == 0:
        #return []
        if not via_requirements:
            msg = f"'{package_name}' is defined in requirements.txt but not used, ignoring"
        else:
            msg = f"'{package_name}' was not found in requirements.txt"

        print(f"[INFO] {msg}")
        return None

    # Remove commits that are not directly followed by '+ import' (We grepped for this in cmd)
    # This is ugly.. TODO: figure out a better way in the grep command
    dates = []
    got_plus = False
    for line in blame_out.split('\n')[::-1]:
        if line[0] == "+":
            got_plus = True
        elif got_plus:
            got_plus = False

            matches = EXTRACT_DATE_REGEX.search(line)
            if matches:
                dates.append(datetime.fromtimestamp(int(matches.group(1))))
            else:
                raise Exception("[ERROR] while parsing git-log")

    # Get first date where the line was added
    return sorted(dates, reverse=first_occurence)[0]


def find_version_at_date(available_versions, date):
    """
    Return version available at {date} given {available_versions}
    """
    last_version = available_versions[0][0]

    # FIXME : Do binary search
    for candidate_version, candidate_date in available_versions:
        if date >= candidate_date:
            return candidate_version
        else:
            last_version = candidate_version

    # Date is older than available versions... Fallback on the oldest available version
    return last_version


def get_mapping_files_from_pipreqs(tmp_path="/tmp/.py-reqs-guesser"):
    """
    Retrieve 'import -> package' name mapping and standard lib module list
    These files come from https://github.com/bndr/pipreqs
    """

    skip_download = False

    if not os.path.exists(tmp_path):
        os.mkdir(tmp_path)

    mapping_filepath = f"{tmp_path}/mapping"
    stdlib_filepath = f"{tmp_path}/stdlib"

    if os.path.exists(mapping_filepath) and os.path.exists(stdlib_filepath):
        # Files have already been downloaded
        skip_download = True

    if not skip_download:
        msg = "We will download a mapping file from https://github.com/bndr/pipreqs\n" \
              "Thanks to the maintainers of Pipreqs for keeping the mapping file " \
              "and the STDlib module list up to date\n" \
              f"Do you agree to downloading these files in '{tmp_path}' ?"

        if not user_response_yes_no(msg):
            print("\n\n[ERROR] Pipreqs mapping files are required, I encourage you to inspect the code to make sure everything is safe and rerun this")
            exit(0)

        print("")
        # FIXME : This is not really scalable...
        mapping_url = "https://raw.githubusercontent.com/bndr/pipreqs/90102acdbb23c09574d27df8bd1f568d34e0cfd3/pipreqs/mapping"
        stdlib_url = "https://raw.githubusercontent.com/bndr/pipreqs/90102acdbb23c09574d27df8bd1f568d34e0cfd3/pipreqs/stdlib"

        try:
            urlretrieve(mapping_url, mapping_filepath)
            urlretrieve(stdlib_url, stdlib_filepath)
        except:
            print("[ERROR] Internet access is required to fetch mapping files from https://github.com/bndr/pipreqs")
            exit(1)


    from_import_to_package_mapping = {}
    from_package_to_import_mapping = {}
    with open(mapping_filepath, 'r') as f:
        for line in f.readlines():
            import_name, package_name = line.strip().split(":")

            from_import_to_package_mapping[import_name] = package_name
            from_package_to_import_mapping[package_name] = import_name

    with open(stdlib_filepath, 'r') as f:
        stdlib = set([l.strip() for l in f.readlines()])

    return stdlib, from_import_to_package_mapping, from_package_to_import_mapping


def get_packages_from_requirements(filepath):
    """
    Retrieve package list from 'requirements.txt'
    """
    # TODO : Handle multiple version conditions
    # TODO : Handle greater than (>). If version contains >, should take the greatest available version at that date.
    with open(filepath, 'r') as f:
        lines = f.readlines()

    split_reg = re.compile(r'==|<=|>=|<|>')

    packages = {}

    for line in lines:
        splitted = re.split(split_reg, line.strip())
        if len(splitted) > 1:
            version = splitted[-1]
        else:
            version = None

        packages[splitted[0].lower()] = version

    return packages


def get_local_modules(print_modules=False, force_guess=None):
    """
    Gather list of the local python modules so we don't query pypi for those modules
    Lets say we have the following file structure :
        /project
            - main.py
            - logger.py
            /utils
                - common.py
    common.py will be imported in main.py using 'from utils import common'
    We therefore need to include the folder 'utils' in our exclusion list
    In this example, the exclusion list is [main, logger, utils]

    print_modules: Control console printing
    force_guess: In case of conflict (Import packageX and local file named packageX.py), this list is used to force version guessing
    """
    if force_guess is None:
        force_guess = set()

    file_paths = subprocess.check_output('find . -name "*.py" -printf "%P\\n"', shell=True).decode().strip().split("\n")

    modules = set()

    for file_path in file_paths:
        module = file_path.split('/')[0]
        if '.py' in module:
            module = module[:-3]

        if module not in force_guess:
            modules.add(module)

    if print_modules:
        print("\nWe detected the following local project modules :")
        for module in modules:
            print(" " + module)
        print("We won't attempt to guess version for these packages (local files)")
        print("In case of conflict, this can be overriden using --force_guess {package1},{package2},...")

    return modules


def validate_cwd_is_git_repo():
    """
    Verify that the current working directory is inside a git repository
    """
    try:
        subprocess.check_output("git rev-parse --is-inside-work-tree 2>/dev/null", shell=True)
    except:
        # git rev-parse returns a non-zero exit code if not in a repo
        return False

    return True


def user_response_multi_choices(message, choices):
    """
    Multiple choice menu prompt
    """
    print(message)
    for i, choice in enumerate(choices):
        print(f' {i+1}. {choice}')

    nb_choices = len(choices)
    resp = input(f'Choose option [1-{nb_choices}] : ')

    if not resp.isdigit() or int(resp) not in range(1,nb_choices+1):
        print("")
        return user_response_multi_choices(message, choices)

    return int(resp)


def user_response_yes_no(message):
    """
    Yes/No menu prompt
    """
    resp = input(message + ' [Y/n] : ').lower()

    if resp not in ['y', 'n']:
        print("")
        return user_response_yes_no(message)

    return resp == 'y'


def get_date_last_modified_python_file():
    """
    Use git log to retrieve the last time a change to a .py file was committed to the repo
    """
    timestamp = subprocess.check_output('git log -n 1 --all --pretty="format:%ct" -- "*.py"', shell=True).decode()

    if len(timestamp) == 0:
        return None
    else:
        return datetime.fromtimestamp(int(timestamp))


def get_requirements_txt_lines(packages):
    requirements_txt = ""
    for package_name, version in sorted(packages, key=lambda x:x[0]):
        requirements_txt += f"{package_name}=={version}\n"

    return requirements_txt


def write_requirements_file(package_lines, filepath):
    print(f"Writing requirements to file {filepath}")

    if os.path.exists(filepath) and \
            not user_response_yes_no(f"File {filepath} already exist, are you sure you want to overwrite it ?"):
        exit(0)

    with open(filepath, 'w') as f:
        f.write(package_lines)
setup.py: new file, 27 lines
@@ -0,0 +1,27 @@
import setuptools

with open("README.md", "r") as fh:
    long_description = fh.read()

setuptools.setup(
    name="py-requirements-guesser",  # This is the name of the package
    version="0.0.1",  # The initial release version
    author="Jerome Abdelnour",  # Full name of the author
    description="Guess requirements.txt versions based on Git history",
    long_description=long_description,  # Long description read from the readme file
    long_description_content_type="text/markdown",
    url="https://github.com/j3rome/py-requirements-guesser",
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
        "Operating System :: OS Independent",
    ],  # Information to filter the project on PyPi website
    python_requires='>=3.6',  # Minimum version requirement of the package
    py_modules=['py_requirements_guesser'],
    packages=['py_requirements_guesser'],
    entry_points={
        'console_scripts': [
            'py-requirements-guesser=py_requirements_guesser.cli:run'
        ]
    }
)
utils.py: 172 lines removed (file deleted)
@@ -1,172 +0,0 @@
import re
import os
import subprocess
from datetime import datetime
from urllib.request import urlretrieve

def user_response_multi_choices(message, choices):
    print(message)
    for i, choice in enumerate(choices):
        print(f' {i+1}. {choice}')

    nb_choices = len(choices)
    resp = input(f'Choose option [1-{nb_choices}] : ')

    if not resp.isdigit() or int(resp) not in range(1,nb_choices+1):
        print("")
        return user_response_multi_choices(message, choices)

    return int(resp)


def user_response_yes_no(message):
    resp = input(message + ' [Y/n] : ').lower()

    if resp not in ['y', 'n']:
        print("")
        return user_response_yes_no(message)

    return resp == 'y'


def get_mapping_files_from_pipreqs(tmp_path="/tmp/.py-req-guesser"):
    """
    Retrieve import to package name mapping file and standard lib module list
    This list comes from https://github.com/bndr/pipreqs
    """

    skip_download = False

    if not os.path.exists(tmp_path):
        os.mkdir(tmp_path)

    mapping_filepath = f"{tmp_path}/mapping"
    stdlib_filepath = f"{tmp_path}/stdlib"

    if os.path.exists(mapping_filepath) and os.path.exists(stdlib_filepath):
        # File have already been downloaded
        skip_download = True

    if not skip_download:
        msg = "We will download a mapping file from https://github.com/bndr/pipreqs\n" \
              "Thanks to the maintainers of Pipreqs for keeping the mapping file "\
              "and the STDlib module list up to date\n" \
              f"Do you agree to downloading these files in '{tmp_path}' ?"

        if not user_response_yes_no(msg):
            print("\n\n[ERROR]Pipreqs mapping files are required, I encourage you to inspect the code to make sure everything is safe and rerun this")
            exit(0)

        print("")
        # FIXME : This is not really scalable...
        mapping_url = "https://raw.githubusercontent.com/bndr/pipreqs/90102acdbb23c09574d27df8bd1f568d34e0cfd3/pipreqs/mapping"
        stdlib_url = "https://raw.githubusercontent.com/bndr/pipreqs/90102acdbb23c09574d27df8bd1f568d34e0cfd3/pipreqs/stdlib"

        try:
            urlretrieve(mapping_url, mapping_filepath)
            urlretrieve(stdlib_url, stdlib_filepath)
        except:
            print("[ERROR] Internet access is required to fetch mapping files from https://github.com/bndr/pipreqs")
            exit(1)


    from_import_to_package_mapping = {}
    from_package_to_import_mapping = {}
    with open(mapping_filepath, 'r') as f:
        for line in f.readlines():
            import_name, package_name = line.strip().split(":")

            from_import_to_package_mapping[import_name] = package_name
            from_package_to_import_mapping[package_name] = import_name

    with open(stdlib_filepath, 'r') as f:
        stdlib = set([l.strip() for l in f.readlines()])

    return stdlib, from_import_to_package_mapping, from_package_to_import_mapping


def load_packages_from_requirements(filepath):
    # TODO : Handle when multiple version conditions
    # TODO : Handle greater than (>). If version contains >, should take the greatest available version at the date. Should fit with minor versions ?
    with open(filepath, 'r') as f:
        lines = f.readlines()

    split_reg = re.compile(r'==|<=|>=|<|>')

    packages = {}

    for line in lines:
        splitted = re.split(split_reg, line.strip())
        if len(splitted) > 1:
            version = splitted[-1]
        else:
            version = None

        packages[splitted[0].lower()] = version

    return packages


def get_local_modules(print_modules=False, force_guess=None):
    """
    Gather list of the local python modules so we don't query pypi for those modules
    Lets say we have the following file structure :
        /project
            - main.py
            /utils
                - common.py
    common.py will be imported in main.py using 'from utils import common'
    We therefore need to include the folder 'utils' in our exclusion list
    """
    if force_guess is None:
        force_guess = set()

    file_paths = subprocess.check_output('find . -name "*.py" -printf "%P\\n"', shell=True).decode().strip().split("\n")

    modules = set()

    for file_path in file_paths:
        module = file_path.split('/')[0]
        if '.py' in module:
            module = module[:-3]

        if module not in force_guess:
            modules.add(module)

    if print_modules:
        print("\nWe detected the following local project modules :")
        for module in modules:
            print(" " + module)
        print("We won't attempt to guess version for these packages (local files)")
        print("In case of conflict, this can be overriden using --force_guess {package1},{package2},...")

    return modules


def get_date_last_modified_python_file():
    timestamp = subprocess.check_output('git log -n 1 --all --pretty="format:%ct" -- "*.py"', shell=True).decode()

    if len(timestamp) == 0:
        return None
    else:
        return datetime.fromtimestamp(int(timestamp))


def validate_cwd_is_git_repo():
    try:
        subprocess.check_output("git rev-parse --is-inside-work-tree 2>/dev/null", shell=True)
    except:
        # git rev-parse return non-zero exit code if not in repo
        return False

    return True


def detect_os():
    pass


def get_python_version():
    pass