解决Python KeyError: 'abundance'错误

2025/08/05·StepFun can make mistakes, check important info.

Traceback (most recent call last): File "D:\computer learning\.venv\lib\site-packages\pandas\core\indexes\base.py", line 3812, in get_loc return self._engine.get_loc(casted_key) File "pandas/_libs/index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 7096, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 'abundance' The above exception was the direct cause of the following exception: Traceback (most recent call last): File "d:\computer learning\.venv\ms_analysis.py", line 69, in <module> result = find_molecular_ion("D:/computer learning/.venv/mass_data.xlsx") File "d:\computer learning\.venv\ms_analysis.py", line 10, in find_molecular_ion abundance_dict = df.set_index('mz_int')['abundance'].to_dict() File "D:\computer learning\.venv\lib\site-packages\pandas\core\frame.py", line 4107, in __getitem__ indexer = self.columns.get_loc(key) File "D:\computer learning\.venv\lib\site-packages\pandas\core\indexes\base.py", line 3819, in get_loc raise KeyError(key) from err KeyError: 'abundance'

Reason

思考过程

We are getting a KeyError: 'abundance'. This means that the column named 'abundance' is not found in the Excel file.
Let's check the possible reasons:

The column name in the Excel file might be different (case sensitivity? spaces?).
The Excel file might have a header row that is not read as header.

Steps to fix:

Verify the column names in the Excel file. They should exactly match (case-sensitive) the strings 'm/z' and 'abundance'.
If the column names are different, we can adjust the code to match or rename the columns in the DataFrame.

Alternatively, we can make the code more robust by:
a) Printing the column names to see what we have.
b) Adjusting the column names in the code to match the actual headers.

Let's modify the code to:

Print the column names if an error occurs.
Use a flexible approach to handle possible variations (like capital letters, extra spaces).

However, since the error is about 'abundance', let's focus on that.

Proposed fix:

Check the actual column names in the Excel file.
If the column is named differently (e.g., 'Abundance', 'Abundance (%)', etc.), we need to adjust.

Alternatively, we can try to make the code more resilient by:

Using case-insensitive matching for column names.
Allowing for common variations (like '%' in the abundance column).

But note: the problem says the Excel has A列 (column A) as m/z and B列 as abundance.

So, if the Excel file has exactly two columns with headers 'm/z' and 'abundance', then it should work.

How about we try:

Instead of assuming the column names, we can try to use the first two columns and assign the headers we expect.

But the error says KeyError: 'abundance'. So let's assume the column might be named differently.

We can do:

Option 1: Change the code to use the column index (0 for first column, 1 for second) if the header is not as expected.

Option 2: Let the user specify the column names or check the actual headers.

Since we are the developer, we can adjust the code to be more flexible.

Let me adjust the code:

Step 1: Read the Excel file without assuming headers.
Step 2: Check the headers in the file.

We can do:

df = pd.read_excel(excel_path, header=0) # assuming first row is header

Then, we can print the columns to see.

But we are in an error situation, so we can add a try-except and print the columns.

Alternatively, we can make the code more robust by:

Using a function to normalize column names (lowercase, remove spaces and special characters) and then match.

However, for simplicity and since this is a specific case, let's assume the abundance column might have a different name.

Common variations: 'abundance', 'Abundance', 'Abundance (%)', 'Intensity', etc.

We can try to search for a column that contains the string 'abund' or 'intensity' in a case-insensitive way.

But note: the problem states the table has two columns: A (m/z) and B (abundance). So we can also assume the second column is abundance.

How about we set:

df.columns = ['m/z', 'abundance'] # if we know the order

But what if the Excel file has more columns? Then we cannot assume.

Alternatively, we can do:

if 'abundance' not in df.columns:
# try to find a column that matches case-insensitively
for col in df.columns:
if col.lower().find('abund') >= 0 or col.lower().find('intensity') >=0:
# use this column as abundance
df.rename(columns={col: 'abundance'}, inplace=True)
break
else:
# if no such column, then use the second column?
if len(df.columns) >= 2:
df.columns = ['m/z', 'abundance']
else:
raise Exception("Could not find an 'abundance' column")

But this might be overkill.

Since the problem says the Excel file has two columns: A and B, we can assume:

Column A: m/z
Column B: abundance

So we can set the column names by order:

df = pd.read_excel(excel_path, header=0)
df.columns = ['m/z', 'abundance'] # if there are exactly two columns

But if there are more than two columns, we need to know which is which.

Let's change the approach: we will read the Excel file and then try to use the first two columns, renaming them to 'm/z' and 'abundance'.

However, the error might be because the Excel file has more than two columns? We don't know.

Let's modify the code to:

Read the Excel file.
Print the column names for debugging.
Then, use the first two columns and rename them to 'm/z' and 'abundance'.

But we are in a production scenario? We don't want to print for every run.

Alternatively, we can have a configuration step.

However, to fix the immediate issue, I will change the code to:

df = pd.read_excel(excel_path)

If the column names are not as expected, we try to use the first two columns and rename

if len(df.columns) < 2:
raise Exception("Excel file must have at least two columns")

Rename the first two columns

df = df.rename(columns={df.columns[0]: 'm/z', df.columns[1]: 'abundance'})

Then, proceed as before.

Let's update the function accordingly.

Revised code for the function:

python

But note: the original code also used a column called 'm/z'. So we are renaming the first column to 'm/z' and the second to 'abundance'.

This should work if the data is in the first two columns.

Let me update the entire function accordingly.

Alternatively, we can also handle the case where the user might have more than two columns and we want to use the first two.

We'll do:

df = pd.read_excel(excel_path)

Use only the first two columns

if len(df.columns) > 2:
df = df.iloc[:, 0:2] # take first two columns
df.columns = ['m/z', 'abundance']

This way, we ignore extra columns.

But the problem says the Excel has A and B columns, so we can assume the first two.

Let's go with this approach.

Revised code:

python

This should fix the KeyError because we are forcing the column names to be 'm/z' and 'abundance'.

Let me rewrite the entire function with this fix and the rest of the code.

But note: the original code had an error at the line:
abundance_dict = df.set_index('mz_int')['abundance'].to_dict()

By renaming the second column to 'abundance', we ensure that the column exists.

We'll also need to create the 'mz_int' column, which we are doing.

Let's run it again.

If the Excel file has the first column as m/z and the second as abundance (even if the header is different), we are renaming them to the standard names.

This should work.

But note: what if the m/z column has non-integer values? We are rounding to integer. That is acceptable because the m/z in the problem are integers (like M+1, M+23, etc.).

We are rounding to the nearest integer and converting to int.

This is consistent with the problem.

So the fixed code for the function is:

python

This version should be more robust to variations in the Excel file column names and structure.

Let me know if you have any further issues.

这个错误表明你的 Excel 文件中缺少代码预期的列名 "abundance"。根据错误信息，问题出在以下几行：

python

解决方案：

方法 1：检查并修正 Excel 列名（推荐）

打开你的 Excel 文件
确认第二列（B列）的标题名称为 "abundance"（注意大小