apps.bazel_parser.cli
Parse bazel query outputs
A larger system description:
- inputs:
  - bazel query //... --output proto > query_result.pb
    - the full dependency tree
  - bazel test //... --build_event_binary_file=test_all.pb
    - bazel run //utils:bep_reader < test_all.pb
    - the execution time related to each test target
  - git_utils.get_file_commit_map_from_follow
    - how files have changed over time; can be used to generate probabilities of files changing in the future
- intermediates:
  - representation for source files and bazel together
- outputs:
  - test targets:
    - likelihood of executing
    - expected value of runtime
  - source files:
    - cost in execution time of modification
    - expected cost of file change (based on probability of change * cost)
  - graph with the values above, so that we could take any set of file inputs and describe the cost
  - graph from which we could identify overly depended-upon things
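The "expected cost of file change" output can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the dependency graph is a plain dict mapping each node to the nodes it depends on, and the names `transitive_dependents`, `expected_costs`, and all sample labels and numbers are hypothetical.

```python
from collections import defaultdict


def transitive_dependents(deps: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each node to every node that depends on it, directly or transitively."""
    # deps maps node -> what it depends on; invert the edges first.
    rdeps: dict[str, set[str]] = defaultdict(set)
    for node, targets in deps.items():
        for target in targets:
            rdeps[target].add(node)

    result: dict[str, set[str]] = {}
    for start in set(deps) | set(rdeps):
        seen: set[str] = set()
        stack = list(rdeps.get(start, ()))
        while stack:  # iterative depth-first walk over reverse edges
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(rdeps.get(current, ()))
        result[start] = seen
    return result


def expected_costs(
    deps: dict[str, set[str]],
    test_runtime_s: dict[str, float],
    change_probability: dict[str, float],
) -> dict[str, tuple[float, float]]:
    """Return {source file: (cost, expected cost)} in seconds of test runtime."""
    dependents = transitive_dependents(deps)
    costs: dict[str, tuple[float, float]] = {}
    for source, probability in change_probability.items():
        # Cost of modifying a file: runtime of every test that depends on it.
        cost = sum(test_runtime_s.get(n, 0.0) for n in dependents.get(source, ()))
        costs[source] = (cost, probability * cost)
    return costs


# Hypothetical sample data: two libraries, two tests, two source files.
deps = {
    "//lib:a_test": {"//lib:a"},
    "//lib:b_test": {"//lib:b"},
    "//lib:a": {"lib/a.py", "//lib:b"},
    "//lib:b": {"lib/b.py"},
}
test_runtime_s = {"//lib:a_test": 2.0, "//lib:b_test": 3.0}
change_probability = {"lib/a.py": 0.5, "lib/b.py": 0.25}
costs = expected_costs(deps, test_runtime_s, change_probability)
```

The size of each set returned by `transitive_dependents` is one possible signal for "overly depended-upon" nodes.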
- XXX:
  - bazel query --keep_going --noimplicit_deps --output proto "deps(//...)" is much bigger than "//..." alone; compare what the differences are
  - git log --since="10 years ago" --name-only --pretty=format: | sort | uniq -c | sort -nr
    - this is much faster
  - could identify renames via:
    - git log --since="1 month ago" --name-status --pretty=format: | grep -P 'R[0-9]*\t' | awk '{print $2, "->", $3}'
    - then correct for them
  - can get commit association via:
    - git log --since="1 month ago" --name-status --pretty=format:"%H"
    - statuses are A, M, D, Rddd (R followed by a similarity index)
```python
import re

# Made-up sample input; the pattern expects renames written as "old -> new".
git_log_output = "M\tsrc/a.py\nR100\tsrc/old.py -> src/new.py"

# Regex pattern to match the git log output
pattern = (
    r"^([AMD])\s+(.+?)(\s*->\s*(.+))?$"
    r"|^R(\d+)\s+(.+?)\s*->\s*(.+)$"
)

# Parse each line using the regex
for line in git_log_output.strip().split("\n"):
    match = re.match(pattern, line.strip())
    if match:
        if match.group(1):  # For A, M, D statuses
            change_type = match.group(1)
            old_file = match.group(2)
            new_file = match.group(4) if match.group(4) else None
            print(f"Change type: {change_type}, Old file: {old_file}, "
                  f"New file: {new_file}")
        elif match.group(5):  # For R status (renames)
            change_type = "R"
            similarity_index = match.group(5)
            old_file = match.group(6)
            new_file = match.group(7)
            print(f"Change type: {change_type}, Similarity index:"
                  f" {similarity_index}, Old file: {old_file}, New file:"
                  f" {new_file}")
```
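The commit-association idea from the notes can also be sketched directly against the raw tab-separated output of `git log --name-status --pretty=format:"%H"`. This is a hedged sketch, not what `git_utils` does: it assumes each commit appears as a hash line followed by `STATUS<TAB>path` lines (renames as `Rddd<TAB>old<TAB>new`), newest commit first, and the function name `file_commit_map` is hypothetical.

```python
import re
from collections import defaultdict

# A, M, D, or R followed by a similarity index, then tab-separated path(s).
_STATUS_LINE = re.compile(r"^(?P<status>[AMD]|R\d+)\t(?P<paths>.+)$")


def file_commit_map(git_log_output: str) -> dict[str, list[str]]:
    """Group commit hashes by file, folding renamed files into their newest name."""
    commits_by_file: dict[str, list[str]] = defaultdict(list)
    renamed_to: dict[str, str] = {}  # old path -> newest known path
    current_commit = None

    def resolve(path: str) -> str:
        # Follow the rename chain to the newest name.
        while path in renamed_to:
            path = renamed_to[path]
        return path

    for raw_line in git_log_output.splitlines():
        line = raw_line.rstrip("\r\n")
        if not line.strip():
            continue  # blank separator between commits
        match = _STATUS_LINE.match(line)
        if match is None:
            current_commit = line.strip()  # assumed to be a commit hash line
            continue
        if current_commit is None:
            continue  # status line before any hash; nothing to attribute it to
        paths = match.group("paths").split("\t")
        newest_name = resolve(paths[-1])
        if match.group("status").startswith("R") and len(paths) == 2:
            old = paths[0]
            # The log is newest-first, so older commits will mention `old`.
            if old != newest_name and old not in renamed_to:
                renamed_to[old] = newest_name
        commits_by_file[newest_name].append(current_commit)
    return dict(commits_by_file)
```

The length of each list gives a per-file change count, comparable to the `sort | uniq -c` pipeline above. A path that is deleted and later reused for an unrelated file would be merged into one history, which this sketch does not try to detect.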
Example Script:

    repo_dir=`pwd`
    file_commit_pb=$repo_dir/file_commit.pb
    query_pb=$repo_dir/s_result.pb
    bep_pb=$repo_dir/test_all.pb
    out_gml=$repo_dir/my.gml
    out_csv=$repo_dir/my.csv
    out_html=$repo_dir/my.html

    # Prepare data
    bazel query "//... - //docs/... - //third_party/bazel/..." --output proto \
      > $query_pb
    bazel test //... --build_event_binary_file=$bep_pb
    bazel run //apps/bazel_parser --output_groups=-mypy -- git-capture --repo-dir \
      $repo_dir --days-ago 400 --file-commit-pb $file_commit_pb

    # Separate step if we want build timing data
    bazel clean
    bazel build --noremote_accept_cached \
      --experimental_execution_log_compact_file=exec_log.pb.zst \
      --generate_json_trace_profile --profile=example_profile_new.json //...
    # Would then need to process the exec_log.pb.zst file to get timing from it and
    # then add to the other timing information

    # Process and visualize the data
    bazel run //apps/bazel_parser --output_groups=-mypy -- process \
      --file-commit-pb $file_commit_pb --query-pb $query_pb --bep-pb $bep_pb \
      --out-gml $out_gml --out-csv $out_csv
    bazel run //apps/bazel_parser --output_groups=-mypy -- visualize \
      --gml $out_gml --out-html $out_html
Module Contents
- apps.bazel_parser.cli.load_config(config_yaml_path: pathlib.Path, overrides: dict) -> Config
- Parameters:
config_yaml_path (pathlib.Path)
overrides (dict)
- Return type:
Config
- apps.bazel_parser.cli.get_config(config_file: pathlib.Path | None, days_ago: int | None) -> Config
- Parameters:
config_file (pathlib.Path | None)
days_ago (int | None)
- Return type:
Config
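The signatures of `load_config` and `get_config` suggest that optional command-line values are layered over values from a config file. The page does not show how that is done, so the following is only a sketch of one common pattern. `SketchConfig`, its single field, and `apply_overrides` are hypothetical stand-ins, and YAML parsing is left out.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for Config; the real fields are not documented here."""

    days_ago: int


def apply_overrides(base: SketchConfig, overrides: dict[str, Any]) -> SketchConfig:
    """Return a copy of base with every non-None override applied."""
    known = {f.name for f in fields(base)}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    # A None value means "not given on the command line": keep the file value.
    effective = {key: value for key, value in overrides.items() if value is not None}
    return replace(base, **effective)


# Usage: the base value plays the role of what was read from the config file.
base = SketchConfig(days_ago=30)
unchanged = apply_overrides(base, {"days_ago": None})
overridden = apply_overrides(base, {"days_ago": 7})
```

Treating `None` as "not provided" matches the `int | None` and `pathlib.Path | None` parameter types, though the actual precedence rules may differ.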
- apps.bazel_parser.cli.git_capture(repo_dir: pathlib.Path, days_ago: int, file_commit_pb: pathlib.Path) -> None
- Parameters:
repo_dir (pathlib.Path)
days_ago (int)
file_commit_pb (pathlib.Path)
- Return type:
None
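How `git_capture` collects history is not shown on this page. As a rough sketch only, the `repo_dir` and `days_ago` parameters could be turned into the `git log` invocation from the notes above like this. The helper names `build_git_log_cmd` and `run_git_log` are hypothetical, and writing the result to `file_commit_pb` is not covered.

```python
import subprocess
from pathlib import Path


def build_git_log_cmd(repo_dir: Path, days_ago: int) -> list[str]:
    """Build a git log command listing commit hashes and per-file statuses."""
    return [
        "git",
        "-C", str(repo_dir),            # run as if started in repo_dir
        "log",
        f"--since={days_ago} days ago",  # limit history to the requested window
        "--name-status",                # A/M/D/Rddd status per changed file
        "--pretty=format:%H",           # one full commit hash per commit
    ]


def run_git_log(repo_dir: Path, days_ago: int) -> str:
    """Run the command and return its text output; raises if git fails."""
    completed = subprocess.run(
        build_git_log_cmd(repo_dir, days_ago),
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with the `--since` value, which contains spaces.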
- apps.bazel_parser.cli.process(query_pb: pathlib.Path, bep_pb: pathlib.Path, file_commit_pb: pathlib.Path, out_gml: pathlib.Path, out_csv: pathlib.Path, config_file: pathlib.Path | None) -> None
- Parameters:
query_pb (pathlib.Path)
bep_pb (pathlib.Path)
file_commit_pb (pathlib.Path)
out_gml (pathlib.Path)
out_csv (pathlib.Path)
config_file (pathlib.Path | None)
- Return type:
None
- apps.bazel_parser.cli.full(repo_dir: pathlib.Path, days_ago: int | None, config_file: pathlib.Path | None, out_gml: pathlib.Path | None, out_csv: pathlib.Path | None) -> None
- Parameters:
repo_dir (pathlib.Path)
days_ago (int | None)
config_file (pathlib.Path | None)
out_gml (pathlib.Path | None)
out_csv (pathlib.Path | None)
- Return type:
None